spooq.transformer.mapper_transformations.str_to_array
- str_to_array(source_column=None, name=None, **kwargs: Any) -> partial
Splits a string into a list (ArrayType).
- Parameters
- Keyword Arguments
alt_src_cols (str, default -> no coalescing, only source_column) – Coalesce with source_column and columns from this parameter.
cast (T.DataType(), default -> T.StringType()) – Applies the provided datatype to the elements of the output array (.cast(T.ArrayType(cast))).
Examples
>>> input_df = spark.createDataFrame(
...     [
...         Row(input_key="[item1,item2,3]"),
...         Row(input_key="item1,it[e]m2,it em3"),
...         Row(input_key=" item1, item2 , item3")
...     ], schema="input_key string"
... )
>>>
>>> input_df.select(spq.str_to_array("input_key")).show(truncate=False)
+------------------------+
|[item1, item2, 3]       |
|[item1, it[e]m2, it em3]|
|[item1, item2, item3]   |
+------------------------+
>>>
>>> mapping = [
...     ("original_value", "input_key", spq.as_is),
...     ("transformed_value", "input_key", spq.str_to_array)
... ]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.printSchema()
root
 |-- original_value: string (nullable = true)
 |-- transformed_value: array (nullable = true)
 |    |-- element: string (containsNull = true)
>>> output_df.show(truncate=False)
+-------------------------------+------------------------+
|original_value                 |transformed_value       |
+-------------------------------+------------------------+
|[item1,item2,3]                |[item1, item2, 3]       |
|item1,it[e]m2,it em3           |[item1, it[e]m2, it em3]|
| item1, item2 , item3          |[item1, item2, item3]   |
+-------------------------------+------------------------+
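The splitting rules visible in the examples above (one pair of enclosing brackets dropped, elements split on commas, surrounding whitespace trimmed, optional cast of each element) can be approximated for a single Python string. This is only an illustrative sketch: split_to_list and its cast argument are hypothetical names, the function works on plain strings rather than Spark columns, and it is not Spooq's implementation.

```python
def split_to_list(value, cast=str):
    """Approximate the documented examples for one plain Python string.

    Hypothetical helper for illustration; Spooq itself operates on Spark columns.
    """
    text = value.strip()
    # Drop one pair of enclosing brackets, as in "[item1,item2,3]".
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # Split on commas, trim whitespace around each element, then cast it.
    return [cast(item.strip()) for item in text.split(",")]


print(split_to_list("[item1,item2,3]"))        # ['item1', 'item2', '3']
print(split_to_list("item1,it[e]m2,it em3"))   # ['item1', 'it[e]m2', 'it em3']
print(split_to_list(" item1, item2 , item3"))  # ['item1', 'item2', 'item3']
print(split_to_list("[1,2,3]", cast=int))      # [1, 2, 3]
```

Brackets inside an element (it[e]m2) and inner spaces (it em3) are left untouched, matching the second example row.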
- Returns
This method returns a suitable type depending on how it was called. This ensures compatibility with Spooq’s mapper transformer - with or without explicit parameters - as well as direct calls via select, withColumn, where, …
- Return type
partial or Column
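The "partial or Column" return type can be pictured with a small plain-Python sketch of the general pattern: return a deferred functools.partial when no source is given, otherwise evaluate immediately. The names to_list, split_items and sep are hypothetical, lists stand in for Spark columns, and this is an assumption about the pattern, not Spooq's actual code.

```python
from functools import partial


def split_items(value, name=None, sep=","):
    # Hypothetical worker; stands in for building the Spark column expression.
    items = [item.strip() for item in value.split(sep)]
    return {name: items} if name is not None else items


def to_list(source=None, name=None, **kwargs):
    # Hypothetical dispatcher illustrating the "partial or result" pattern.
    if source is None:
        # No source yet (e.g. pre-configured for a mapping):
        # defer the work so the caller can supply source and name later.
        return partial(split_items, **kwargs)
    # Called directly with a source: evaluate immediately.
    return split_items(source, name, **kwargs)


direct = to_list("a, b")            # evaluated right away
deferred = to_list(sep=";")         # a functools.partial, evaluated later
print(direct)                       # ['a', 'b']
print(deferred("a; b", "col"))      # {'col': ['a', 'b']}
```

In this sketch the deferred form carries only the keyword arguments, so whoever consumes it (in the documentation's case, presumably the mapper) decides which source and output name to pass.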