Sieve (Filter)

class Sieve(filter_expression)

Bases: spooq2.transformer.transformer.Transformer

Filters rows based on the provided filter expression. Only records that satisfy the filter condition are kept.


>>> transformer = T.Sieve(filter_expression=""" attributes.last_name rlike "^.{7}$" """)
>>> transformer = T.Sieve(filter_expression=""" lower(gender) = "f" """)
Parameters: filter_expression (str) – A valid PySpark SQL expression which returns a boolean
Raises: exceptions.ValueError – filter_expression has to be a valid (Spark)SQL expression provided as a string


The filter() method is used internally.
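
As a rough illustration of the note above, a Sieve-like transformer can be sketched as a thin wrapper that hands the stored expression to the DataFrame's filter() method. The class name SieveSketch, the method name transform, and the string type check are assumptions made for this sketch and are not taken from the Spooq2 source; the check covers only the "provided as a string" part of the documented ValueError.

```python
class SieveSketch:
    """Hypothetical stand-in for Sieve; not the library's implementation."""

    def __init__(self, filter_expression):
        # Assumption: only the "provided as a string" requirement is checked here.
        if not isinstance(filter_expression, str):
            raise ValueError(
                "filter_expression has to be a valid (Spark)SQL expression "
                "provided as a string"
            )
        self.filter_expression = filter_expression

    def transform(self, input_df):
        # pyspark.sql.DataFrame.filter accepts a SQL expression string and
        # returns a new DataFrame containing only the matching rows.
        return input_df.filter(self.filter_expression)
```

With a real pyspark.sql.DataFrame named df, SieveSketch(' lower(gender) = "f" ').transform(df) would be expected to return the same rows as df.filter(' lower(gender) = "f" ').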


The size of the resulting DataFrame is not guaranteed to equal that of the input DataFrame!


Performs a transformation on a DataFrame.

Parameters: input_df (pyspark.sql.DataFrame) – Input DataFrame
Returns: Transformed DataFrame.
Return type: pyspark.sql.DataFrame


This method takes only the input DataFrame as a parameter. All other required parameters are defined when the Transformer object is initialized.
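
Because configuration is supplied at initialization and the transformation method receives only a DataFrame, several transformers can be applied one after another in a uniform way. The following is a minimal sketch, assuming each transformer exposes a method named transform(input_df); the helper apply_transformers is hypothetical and not part of Spooq2.

```python
from functools import reduce


def apply_transformers(input_df, transformers):
    """Feed the output of each transformer into the next one (hypothetical helper)."""
    # Each transformer was configured when it was constructed, so the only
    # argument passed here is the DataFrame itself.
    return reduce(lambda df, t: t.transform(df), transformers, input_df)
```

An empty list of transformers returns the input unchanged, which keeps the helper safe to call with an optional set of steps.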