Exploder
- class Exploder(path_to_array='included', exploded_elem_name='elem', drop_rows_with_empty_array=True)
Explodes an array within a DataFrame and drops the column containing the source array.
Examples
>>> transformer = Exploder(
...     path_to_array="attributes.friends",
...     exploded_elem_name="friend",
... )
- Parameters
path_to_array (str, defaults to 'included') – Column name or path of the array to explode. Nested columns can be exploded, but dropping them is not supported.
exploded_elem_name (str, defaults to 'elem') – Name of the column that will hold the exploded elements. You need this name to access the field afterwards. Writing to nested columns is not supported; the output column has to be a first-level column.
drop_rows_with_empty_array (bool, defaults to True) – By default, Spark (and Spooq) drops rows whose array to be exploded contains no elements. To keep those rows, set drop_rows_with_empty_array to False.
Warning
Support for nested columns:
- path_to_array:
PySpark cannot drop a field within a struct. This means the specific field can be referenced and therefore exploded, but not dropped.
- exploded_elem_name:
If you (re)name a column using dot notation, it creates a first-level column that simply has a dot in its name. To create a struct with the column as a field, you have to redefine the structure or use a UDF.
Note
The explode() or explode_outer() functions of Spark are used internally, depending on the drop_rows_with_empty_array parameter.
Note
The number of rows in the resulting DataFrame is not guaranteed to equal the number of rows in the input DataFrame!
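The effect of drop_rows_with_empty_array on the row count can be illustrated with a small plain-Python model. This is only a sketch of the row semantics, not Spooq's implementation: the helper explode_rows and the sample data are hypothetical, rows are modelled as dicts instead of a Spark DataFrame, and only first-level columns are handled.

```python
def explode_rows(rows, array_col, elem_name="elem", drop_rows_with_empty_array=True):
    """Plain-Python model of exploding one array column (hypothetical helper)."""
    exploded = []
    for row in rows:
        # Copy every column except the source array, which is dropped.
        rest = {k: v for k, v in row.items() if k != array_col}
        elements = row.get(array_col) or []  # treat a missing/None array as empty
        if not elements:
            if not drop_rows_with_empty_array:
                # explode_outer-like behaviour: keep the row with a None element.
                exploded.append({**rest, elem_name: None})
            continue
        # explode-like behaviour: one output row per array element.
        for element in elements:
            exploded.append({**rest, elem_name: element})
    return exploded


rows = [
    {"id": 1, "friends": ["ann", "bob"]},
    {"id": 2, "friends": []},
]

dropped = explode_rows(rows, "friends", "friend", drop_rows_with_empty_array=True)
kept = explode_rows(rows, "friends", "friend", drop_rows_with_empty_array=False)
```

With two input rows, dropped contains two rows (both from id 1) and kept contains three, which is why the output row count cannot be inferred from the input row count alone.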
- transform(input_df)
Performs a transformation on a DataFrame.
- Parameters
input_df (DataFrame) – Input DataFrame
- Returns
Transformed DataFrame.
- Return type
DataFrame
Note
This method takes only the input DataFrame as a parameter. Any other required parameters are defined when the Transformer object is initialized.
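The split between configuration at initialization and a single-argument transform() can be sketched as follows. SketchExploder is a hypothetical stand-in written for illustration, not the Spooq class: it works on a list of dicts rather than a DataFrame and assumes path_to_array names a first-level key.

```python
class SketchExploder:
    """Hypothetical stand-in for the constructor/transform split (not Spooq's class)."""

    def __init__(self, path_to_array="included", exploded_elem_name="elem",
                 drop_rows_with_empty_array=True):
        # All configuration is captured once, at construction time.
        self.path_to_array = path_to_array
        self.exploded_elem_name = exploded_elem_name
        self.drop_rows_with_empty_array = drop_rows_with_empty_array

    def transform(self, input_rows):
        # Assumption: rows are plain dicts and path_to_array is a top-level key.
        output_rows = []
        for row in input_rows:
            rest = {k: v for k, v in row.items() if k != self.path_to_array}
            elements = row.get(self.path_to_array) or []
            if not elements and not self.drop_rows_with_empty_array:
                elements = [None]  # keep the row, with an empty element
            for element in elements:
                output_rows.append({**rest, self.exploded_elem_name: element})
        return output_rows


transformer = SketchExploder(path_to_array="friends", exploded_elem_name="friend")
result = transformer.transform(
    [{"id": 1, "friends": ["ann", "bob"]}, {"id": 2, "friends": []}]
)
```

Because the object carries its own settings, the same transformer can be applied to any number of inputs without repeating the parameters.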