spooq.transformer.mapper_transformations.str_to_array

str_to_array(source_column=None, name=None, **kwargs: Any) -> partial

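Splits a comma-separated string into an array (ArrayType). As the examples below show, enclosing square brackets are dropped and whitespace around each element is trimmed, while brackets inside an element (e.g. it[e]m2) are kept.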
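The splitting behaviour visible in the examples can be approximated in plain Python. This is a minimal sketch for illustration only: the helper name split_like_str_to_array is hypothetical, the rules (drop one pair of enclosing brackets, split on commas, trim each element, keep null as null) are inferred from the example output, and Spooq's real implementation builds a PySpark Column expression instead.

```python
from typing import List, Optional


def split_like_str_to_array(value: Optional[str]) -> Optional[List[str]]:
    """Hypothetical emulation of the splitting shown in the examples.

    Not Spooq's implementation; the rules are inferred from example output.
    """
    if value is None:
        # Assumption: a null input stays null
        return None
    trimmed = value.strip()
    # Assumption: one pair of enclosing square brackets is removed
    if trimmed.startswith("[") and trimmed.endswith("]"):
        trimmed = trimmed[1:-1]
    # Split on commas and trim whitespace around every element
    return [element.strip() for element in trimmed.split(",")]


for raw in ["[item1,item2,3]", "item1,it[e]m2,it em3", "    item1,   item2    ,   item3"]:
    print(split_like_str_to_array(raw))
```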

Parameters
  • source_column (str or Column) – Input column. Can be a column name, a PySpark Column, or a PySpark function.

  • name (str, default -> derived from input column) – Name of the output column (applied via .alias(name)).

Keyword Arguments
  • alt_src_cols (str, default -> no coalescing, only source_column) – Columns to coalesce with source_column; the first non-null value is used.

  • cast (T.DataType(), default -> T.StringType()) – Data type applied to the elements of the output array (.cast(T.ArrayType(cast))).
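The two keyword arguments can be pictured with a plain-Python sketch. The helper names coalesce and cast_elements are hypothetical; the sketch assumes that source_column takes precedence over alt_src_cols, that coalescing happens before splitting, and that element casting follows Spark's non-ANSI behaviour, where a value that cannot be converted becomes null instead of raising an error.

```python
from typing import Callable, List, Optional, Sequence


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first non-null value, like a SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def cast_elements(elements: Sequence[str], caster: Callable[[str], object]) -> List[object]:
    """Apply `caster` to every element; unconvertible elements become None.

    Assumption: mirrors a non-ANSI Spark cast, which yields null on failure.
    """
    result: List[object] = []
    for element in elements:
        try:
            result.append(caster(element))
        except ValueError:
            result.append(None)
    return result


# Hypothetical row: the source column is null, an alternative column is set
source_value, alt_value = None, "1,2,x"
chosen = coalesce(source_value, alt_value)
print(cast_elements(chosen.split(","), int))  # [1, 2, None]
```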

Examples

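>>> from pyspark.sql import Row
>>> from spooq.transformer import Mapper
>>> from spooq.transformer import mapper_transformations as spq
>>> input_df = spark.createDataFrame(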
...     [
...         Row(input_key="[item1,item2,3]"),
...         Row(input_key="item1,it[e]m2,it em3"),
...         Row(input_key="    item1,   item2    ,   item3")
...     ], schema="input_key string"
... )
>>>
>>> input_df.select(spq.str_to_array("input_key")).show(truncate=False)
+------------------------+
|input_key               |
+------------------------+
|[item1, item2, 3]       |
|[item1, it[e]m2, it em3]|
|[item1, item2, item3]   |
+------------------------+
>>>
>>> mapping = [
...     ("original_value",    "input_key", spq.as_is),
...     ("transformed_value", "input_key", spq.str_to_array)
... ]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.printSchema()
root
 |-- original_value: string (nullable = true)
 |-- transformed_value: array (nullable = true)
 |    |-- element: string (containsNull = true)
>>> output_df.show(truncate=False)
+-------------------------------+------------------------+
|original_value                 |transformed_value       |
+-------------------------------+------------------------+
|[item1,item2,3]                |[item1, item2, 3]       |
|item1,it[e]m2,it em3           |[item1, it[e]m2, it em3]|
|    item1,   item2    ,   item3|[item1, item2, item3]   |
+-------------------------------+------------------------+
Returns

This method returns a suitable type depending on how it was called. This ensures compatibility with Spooq’s mapper transformer (with or without explicit parameters) as well as with direct calls via select, withColumn, where, …

Return type

partial or Column
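The dual calling convention can be illustrated with a plain-Python analogue. This is a sketch of the general pattern only: str_to_list is a hypothetical stand-in that returns a Python list where the real function returns a PySpark Column, and Spooq's actual dispatch logic may differ. Called without a source, it returns a functools.partial that remembers its keyword arguments so the source can be supplied later, as a Mapper mapping does; called with a source, it transforms immediately.

```python
import functools
from typing import Any, Callable, Optional


def str_to_list(source: Optional[str] = None, cast: Callable[[str], Any] = str, **kwargs: Any):
    """Hypothetical analogue of a transformation usable with or without arguments."""
    if source is None:
        # Mapper-style use: bind the parameters now, receive the source later
        return functools.partial(str_to_list, cast=cast, **kwargs)
    # Direct use: transform right away
    return [cast(element.strip()) for element in source.split(",")]


deferred = str_to_list(cast=int)
print(isinstance(deferred, functools.partial))  # True
print(deferred("1, 2, 3"))                      # [1, 2, 3]
print(str_to_list("a,b"))                       # ['a', 'b']
```

Returning a partial keeps one function usable in both places: a mapping entry can hold either the bare function or a pre-configured partial, and a direct call yields the result at once.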