Sieve (Filter)
- class Sieve(filter_expression)
Filters rows according to the provided filter expression. Only records that satisfy the filter condition are kept.
Examples
>>> transformer = T.Sieve(filter_expression=""" attributes.last_name rlike "^.{7}$" """)
>>> transformer = T.Sieve(filter_expression=""" lower(gender) = "f" """)
- Parameters
filter_expression (str) – A valid PySpark SQL expression which returns a boolean.
- Raises
exceptions.ValueError – filter_expression has to be a valid (Spark)SQL expression provided as a string.
Note
The filter() method is used internally.
Note
The size of the resulting DataFrame is not guaranteed to be equal to that of the input DataFrame!
- transform(input_df)
Performs a transformation on a DataFrame.
- Parameters
input_df (DataFrame) – Input DataFrame
- Returns
Transformed DataFrame.
- Return type
DataFrame
Note
This method takes only the input DataFrame as a parameter. Any other required parameters are defined when the Transformer object is initialized.
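Example
The behaviour described above can be approximated with a minimal sketch. This is not the library's source: SieveSketch and demo are hypothetical names, and checking the expression type at construction time is an assumption about when the ValueError is raised. The sketch only illustrates that the expression is fixed at initialization, that transform() receives nothing but the input DataFrame, and that the work is delegated to filter(). The demo function assumes a local PySpark installation.

```python
class SieveSketch:
    """Hypothetical stand-in for Sieve, for illustration only."""

    def __init__(self, filter_expression):
        # Assumption: a non-string expression is rejected at construction time.
        if not isinstance(filter_expression, str):
            raise ValueError(
                "filter_expression has to be a valid (Spark)SQL expression "
                "provided as a string"
            )
        self.filter_expression = filter_expression

    def transform(self, input_df):
        # Delegates to DataFrame.filter(); rows that do not match are dropped,
        # so the result may contain fewer rows than the input.
        return input_df.filter(self.filter_expression)


def demo():
    # Imported lazily so the class above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("sieve-sketch").getOrCreate()
    )
    input_df = spark.createDataFrame(
        [("Anna", "F"), ("Ben", "m"), ("Cara", "f")],
        ["first_name", "gender"],
    )
    output_df = SieveSketch("lower(gender) = 'f'").transform(input_df)
    # Compare row counts before and after filtering.
    print(input_df.count(), output_df.count())
    spark.stop()
```

Calling demo() builds a three-row DataFrame and prints the row counts before and after filtering. Because the filter expression is stored on the object, the same instance can be applied to any number of DataFrames that share the referenced columns.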