Transformers¶
Transformers take a pyspark.sql.DataFrame as input, transform it accordingly, and return a PySpark DataFrame.
Each Transformer class must implement a transform method that takes no arguments and returns a PySpark DataFrame.
Possible transformations include selecting the most up-to-date record by id, exploding an array, filtering (on an exploded array), applying basic threshold cleansing, or mapping the incoming DataFrame to a provided structure.
- Exploder
- Sieve (Filter)
- Mapper
- Class
- Activity Diagram
- Available Custom Mapping Methods
- as_is / keep / no_change / without_casting (aliases)
- unix_timestamp_ms_to_spark_timestamp
- extended_string_to_int
- extended_string_to_long
- extended_string_to_float
- extended_string_to_double
- extended_string_to_boolean
- extended_string_to_timestamp
- extended_string_to_date
- extended_string_unix_timestamp_ms_to_timestamp
- extended_string_unix_timestamp_ms_to_date
- meters_to_cm
- has_value
- json_string
- timestamp_ms_to_ms
- timestamp_ms_to_s
- timestamp_s_to_ms
- timestamp_s_to_s
- StringNull
- IntNull
- StringBoolean
- IntBoolean
- TimestampMonth
- Custom Mapping Methods Details
- Threshold-based Cleaner
- Newest by Group (Most current record per ID)
Class Diagram of Transformer Subpackage¶
Create your own Transformer¶
Please see the Create your own Transformer section for further details.