Sieve (Filter)
class Sieve(filter_expression)
Bases: spooq2.transformer.transformer.Transformer
Filters rows depending on the provided filter expression. Only records complying with the filter condition are kept.
Examples
>>> transformer = T.Sieve(filter_expression=""" attributes.last_name rlike "^.{7}$" """)
>>> transformer = T.Sieve(filter_expression=""" lower(gender) = "f" """)
Parameters: filter_expression (str) – A valid PySpark SQL expression which returns a boolean
Raises: exceptions.ValueError – filter_expression has to be a valid (Spark)SQL expression provided as a string
Note
The filter() method is used internally.
Note
The size of the resulting DataFrame is not guaranteed to be equal to that of the input DataFrame!
transform(input_df)
Performs a transformation on a DataFrame.
Parameters: input_df (pyspark.sql.DataFrame) – Input DataFrame
Returns: Transformed DataFrame.
Return type: pyspark.sql.DataFrame
Note
This method takes only the input DataFrame as a parameter. All other required parameters are defined when the Transformer object is initialized.
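The split between initialization and transform() described above can be illustrated with a minimal sketch. SieveSketch is a hypothetical stand-in written for this page, not spooq2's actual source; it assumes that only the type of filter_expression is checked up front and that the validity of the SQL itself is left to Spark when filter() runs.

```python
class SieveSketch:
    """Hypothetical stand-in for Sieve; not the library's implementation."""

    def __init__(self, filter_expression):
        # Assumption: only the type is validated here. Whether the string is
        # valid (Spark)SQL is decided by Spark once filter() is called.
        if not isinstance(filter_expression, str):
            raise ValueError(
                "filter_expression has to be a valid (Spark)SQL "
                "expression provided as a string"
            )
        # All configuration is stored at initialization time.
        self.filter_expression = filter_expression

    def transform(self, input_df):
        # transform() receives only the DataFrame and delegates to its
        # filter() method, which accepts a SQL expression string.
        return input_df.filter(self.filter_expression)
```

With a real pyspark.sql.DataFrame named df, calling SieveSketch(""" lower(gender) = "f" """).transform(df) would behave like df.filter(""" lower(gender) = "f" """), which is why the row count of the result can differ from that of the input.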