Sieve (Filter)

class Sieve(filter_expression)

Filters rows according to the provided filter expression. Only records that satisfy the filter condition are kept.

Examples

>>> transformer = T.Sieve(filter_expression=""" attributes.last_name rlike "^.{7}$" """)
>>> transformer = T.Sieve(filter_expression=""" lower(gender) = "f" """)

Parameters

filter_expression (str) – A valid Spark SQL expression that evaluates to a boolean

Raises

exceptions.ValueError – filter_expression has to be a valid (Spark)SQL expression provided as a string

Note

The filter() method is used internally.

Note

The number of rows in the resulting DataFrame is not guaranteed to equal the number of rows in the input DataFrame!

transform(input_df)

Performs a transformation on a DataFrame.

Parameters

input_df (DataFrame) – Input DataFrame

Returns

Transformed DataFrame.

Return type

DataFrame

Note

This method takes only the input DataFrame as a parameter. Any other required parameters are defined when the Transformer object is initialized.
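Example

The following is a minimal sketch of how a transformer of this kind could wrap DataFrame.filter(), assuming nothing beyond what is documented above. SieveSketch and demo_with_spark are hypothetical names used for illustration only; this is not the library's actual implementation, and the type check in the constructor is an assumption about when the ValueError is raised.

```python
class SieveSketch:
    """Hypothetical stand-in for T.Sieve; not the library's actual code."""

    def __init__(self, filter_expression):
        # Assumption: anything that is not a string is rejected up front.
        if not isinstance(filter_expression, str):
            raise ValueError(
                "filter_expression has to be a valid (Spark)SQL expression "
                "provided as a string"
            )
        self.filter_expression = filter_expression

    def transform(self, input_df):
        # DataFrame.filter accepts a SQL expression string and keeps only
        # the rows for which the expression evaluates to true.
        return input_df.filter(self.filter_expression)


def demo_with_spark():
    """Runs the sketch on a real DataFrame; requires a local PySpark install."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    input_df = spark.createDataFrame(
        [("Anna", "F"), ("Ben", "m"), ("Cleo", "f")],
        ["first_name", "gender"],
    )
    output_df = SieveSketch(' lower(gender) = "f" ').transform(input_df)
    # Row counts before and after filtering may differ.
    return input_df.count(), output_df.count()
```

Because all configuration is passed to the constructor, transform() needs only the input DataFrame, which matches the note above.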