spooq.transformer.mapper_transformations.to_json_string

to_json_string(source_column=None, name=None, **kwargs: Any) → partial

Returns a column as a JSON-compatible string. Nested hierarchies are supported. Unlike Spark’s built-in to_json, this function also accepts NULL and strings as input. If an error occurs, the unicode representation of the column is returned instead.
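The behavior described above can be modelled in plain Python. This is only a minimal sketch and not Spooq’s implementation: the helper name to_json_string_sketch is hypothetical, it works on ordinary Python values rather than Spark columns, and the pass-through of strings is an assumption based on the statement that strings are accepted as input.

```python
import json


def to_json_string_sketch(value):
    """Hypothetical pure-Python model of the documented behavior (not Spooq's code)."""
    if value is None:
        # NULL input stays NULL instead of raising an error
        return None
    if isinstance(value, str):
        # Assumption: strings are accepted and returned unchanged
        return value
    try:
        # Nested lists and dicts are serialized recursively
        return json.dumps(value)
    except (TypeError, ValueError):
        # On error, fall back to the unicode (string) representation
        return str(value)


friends = [{"first_name": "Gianni", "id": 3993}]
print(to_json_string_sketch(friends))
print(to_json_string_sketch({1}))  # a set is not JSON serializable, so str() is used
```

The fallback branch is the point of interest: a value that cannot be serialized yields its string representation rather than an exception.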

Parameters
  • source_column (str or Column) – Input column. Can be a column name, a PySpark column, or a PySpark function.

  • name (str, default -> derived from input column) – Name of the output column, applied via .alias(name).

Keyword Arguments
  • alt_src_cols (str, default -> no coalescing, only source_column) – Coalesces source_column with the columns given in this parameter.

  • cast (T.DataType(), default -> no casting, same return data type as input data type) – Applies the provided data type to the output column via .cast(cast).
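The two keyword arguments can be pictured at the level of a single row. The following is a rough sketch under stated assumptions, not Spooq’s code: resolve_value is a hypothetical helper, a row is modelled as a plain dict, and a Python callable such as int stands in for a Spark data type.

```python
def resolve_value(row, source_column, alt_src_cols=None, cast=None):
    """Hypothetical row-level model of alt_src_cols and cast (not Spooq's code)."""
    if alt_src_cols is None:
        candidates = [source_column]
    elif isinstance(alt_src_cols, str):
        candidates = [source_column, alt_src_cols]
    else:
        candidates = [source_column, *alt_src_cols]
    # Coalesce: take the first candidate column that holds a non-null value
    value = next((row[c] for c in candidates if row.get(c) is not None), None)
    # Cast: convert only when a target type was requested and a value exists
    return cast(value) if cast is not None and value is not None else value


row = {"friends": None, "buddies": "42"}
print(resolve_value(row, "friends"))
print(resolve_value(row, "friends", alt_src_cols="buddies"))
print(resolve_value(row, "friends", alt_src_cols="buddies", cast=int))
```

Without alt_src_cols only source_column is consulted, which mirrors the documented default of no coalescing.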

Examples

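>>> from pyspark.sql import Row
>>> from spooq.transformer import Mapper
>>> from spooq.transformer import mapper_transformations as spq
>>>
>>> input_df = spark.createDataFrame([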
...     Row(friends=[Row(first_name="Gianni", id=3993, last_name="Weber"),
...                  Row(first_name="Arielle", id=17484, last_name="Greaves")]),
... ])
>>>
>>> input_df.select(spq.to_json_string("friends")).show(truncate=False)
+----------------------------------------------------------------------------------------------------------------------------+
|[{"first_name": "Gianni", "id": 3993, "last_name": "Weber"}, {"first_name": "Arielle", "id": 17484, "last_name": "Greaves"}]|
+----------------------------------------------------------------------------------------------------------------------------+
>>>
>>> mapping = [("friends_json", "friends", spq.to_json_string)]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.show(truncate=False)
+----------------------------------------------------------------------------------------------------------------------------+
|friends_json                                                                                                                |
+----------------------------------------------------------------------------------------------------------------------------+
|[{"first_name": "Gianni", "id": 3993, "last_name": "Weber"}, {"first_name": "Arielle", "id": 17484, "last_name": "Greaves"}]|
+----------------------------------------------------------------------------------------------------------------------------+
>>> input_df.select(spq.to_json_string("friends.first_name")).show(truncate=False)
+---------------------+
|first_name           |
+---------------------+
|['Gianni', 'Arielle']|
+---------------------+

Returns

This method returns a suitable type depending on how it is called. This ensures compatibility with Spooq’s mapper transformer, with or without explicit parameters, as well as with direct calls via select, withColumn, where, …

Return type

partial or Column
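The "partial or Column" return type can be illustrated with functools.partial. This is a simplified sketch of the general pattern, assuming that a call without source_column is deferred; the names _apply, transformation_sketch and prefix are hypothetical, and a string stands in for a Spark Column.

```python
from functools import partial


def _apply(source_column, name, prefix=""):
    # Stand-in for building a Column expression; here it only returns a labelled string
    return f"{prefix}{source_column} AS {name or source_column}"


def transformation_sketch(source_column=None, name=None, **kwargs):
    """Hypothetical transformation returning either a partial or a finished result."""
    if source_column is None:
        # Called without a column (e.g. inside a mapping): defer evaluation
        return partial(_apply, **kwargs)
    # Called with a column (e.g. inside select): evaluate immediately
    return _apply(source_column, name, **kwargs)


direct = transformation_sketch("friends", name="friends_json")
deferred = transformation_sketch(prefix="json:")
later = deferred(source_column="friends", name="friends_json")
print(direct)
print(later)
```

In this model, a mapper-like caller would be the one to supply source_column and name to the deferred partial, while a direct call receives its result right away.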