Sieve (Filter)
class Sieve(filter_expression)
Bases: spooq2.transformer.transformer.Transformer
Filters rows depending on provided filter expression. Only records complying with filter condition are kept.
Examples
>>> transformer = T.Sieve(filter_expression=""" attributes.last_name rlike "^.{7}$" """)
>>> transformer = T.Sieve(filter_expression=""" lower(gender) = "f" """)
Parameters: filter_expression (str) – A valid PySpark SQL expression which returns a boolean
Raises: exceptions.ValueError – filter_expression has to be a valid (Spark)SQL expression provided as a string
Note
The filter() method is used internally.
Note
The size of the resulting DataFrame is not guaranteed to be equal to that of the input DataFrame!
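The behaviour described above can be illustrated with a minimal sketch. It is not the spooq2 source code: SieveSketch and demo_with_spark are hypothetical names, and the sketch assumes that the ValueError is raised by a plain type check in the constructor and that the expression is passed unchanged to the DataFrame's filter() method.

```python
class SieveSketch:
    """Hypothetical stand-in for T.Sieve; not the spooq2 implementation."""

    def __init__(self, filter_expression):
        # Assumption: a non-string expression is rejected with a ValueError,
        # mirroring the "Raises" entry in the documentation.
        if not isinstance(filter_expression, str):
            raise ValueError(
                "filter_expression has to be a valid (Spark)SQL expression "
                "provided as a string"
            )
        self.filter_expression = filter_expression

    def transform(self, input_df):
        # pyspark.sql.DataFrame.filter() accepts a SQL expression string and
        # keeps only the rows for which it evaluates to true.
        return input_df.filter(self.filter_expression)


def demo_with_spark():
    """Optional demo; needs a local PySpark installation to run."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [("Ann", "F"), ("Bob", "m"), ("Eve", "f")], ["name", "gender"]
    )
    # Rows whose gender is not "f" or "F" are dropped, so the result can
    # have fewer rows than the input.
    SieveSketch("lower(gender) = 'f'").transform(df).show()
    spark.stop()
```

Calling demo_with_spark() is optional; the class itself only relies on the object passed to transform() having a filter() method.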
transform(input_df)
Performs a transformation on a DataFrame.
Parameters: input_df (pyspark.sql.DataFrame) – Input DataFrame
Returns: Transformed DataFrame.
Return type: pyspark.sql.DataFrame
Note
This method takes only the input DataFrame as a parameter. All other required parameters are defined when the Transformer object is initialized.
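One consequence of this design is that already-configured transformers share the same call shape, so they can be applied one after another by generic code. The following is a rough sketch under that assumption; apply_transformers is a hypothetical helper written for illustration and is not part of spooq2.

```python
from functools import reduce


def apply_transformers(input_df, transformers):
    """Pass input_df through each transformer's transform() in order.

    Hypothetical helper: every element of `transformers` only needs a
    transform(input_df) method, because all other settings were fixed
    when the transformer object was created.
    """
    return reduce(lambda df, t: t.transform(df), transformers, input_df)


# Possible usage with the transformers from the examples above
# (assumes `df` is an existing pyspark.sql.DataFrame):
# result = apply_transformers(df, [
#     T.Sieve(filter_expression=""" lower(gender) = "f" """),
#     T.Sieve(filter_expression=""" attributes.last_name rlike "^.{7}$" """),
# ])
```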