spooq.transformer.mapper_transformations.map_values
- map_values(source_column=None, name=None, **kwargs: Any) partial [source]
Maps input values to specified output values.
- Parameters
- Keyword Arguments
mapping (dict) – Dictionary containing lookup / substitute value pairs.
default ([str, Column, Any], default -> "source_column") – Defines what will be returned if no matching lookup value was found.
ignore_case (bool, default -> True) – Only relevant for “equals” and “sql_like” comparison operators.
pattern_type (str, default -> "equals") – Comparison used between the input value and each mapping key. One of "equals", "regex" or "sql_like".
alt_src_cols (str, default -> no coalescing, only source_column) – Coalesces source_column with the columns given in this parameter.
cast (T.DataType(), default -> T.StringType()) – Applies the provided data type to the output column (.cast(cast)).
Hint
The following table shows the Spark logic that is applied internally for each mode:
lookup | substitute | mode | Internal Spark Logic
whitelist | allowlist | "equals" | F.when(F.col("input_column") == "whitelist", F.lit("allowlist")).otherwise(F.col("input_column"))
%whitelist% | allowlist | "sql_like" | F.when(F.col("input_column").like("%whitelist%"), F.lit("allowlist")).otherwise(F.col("input_column"))
.*whitelist.* | allowlist | "regex" | F.when(F.col("input_column").rlike(".*whitelist.*"), F.lit("allowlist")).otherwise(F.col("input_column"))
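The three comparison modes can also be modelled without Spark. The following is a minimal plain-Python sketch, not Spooq's implementation: map_value_sketch, sql_like_to_regex and _KEEP_INPUT are hypothetical names, the mapping is assumed to be checked in dictionary order with the first match winning, and Python's re module stands in for Spark's like and rlike (whose Java regex dialect differs in some details).

```python
import re

# Hypothetical sentinel standing in for the documented default ("source_column"):
# when nothing matches, the input value itself is returned.
_KEEP_INPUT = object()


def sql_like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regex: % -> .*, _ -> ., rest literal."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")
        elif char == "_":
            parts.append(".")
        else:
            parts.append(re.escape(char))
    return "".join(parts)


def map_value_sketch(value, mapping, pattern_type="equals", ignore_case=True,
                     default=_KEEP_INPUT):
    """Return the substitute of the first matching lookup, else the default."""
    if value is not None:
        for lookup, substitute in mapping.items():
            if pattern_type == "equals":
                if ignore_case:
                    matched = value.lower() == lookup.lower()
                else:
                    matched = value == lookup
            elif pattern_type == "sql_like":
                # LIKE has to match the whole value, hence fullmatch.
                flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
                matched = re.fullmatch(sql_like_to_regex(lookup), value, flags) is not None
            elif pattern_type == "regex":
                # rlike matches anywhere in the value; ignore_case is not applied here,
                # following the parameter description above.
                matched = re.search(lookup, value) is not None
            else:
                raise ValueError(f"Unknown pattern_type: {pattern_type!r}")
            if matched:
                return substitute
    return value if default is _KEEP_INPUT else default
```

With this model, "WhiteList" maps to "allowlist" in "equals" mode because ignore_case defaults to True, whereas the case-sensitive "regex" pattern .*whitelist.* leaves it unchanged.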
Examples
>>> from spooq.transformer import Mapper
>>> from spooq.transformer import mapper_transformations as spq
>>>
>>> input_df = spark.createDataFrame(
...     [
...         ("allowlist", ),
...         ("WhiteList", ),
...         ("blocklist", ),
...         ("blacklist", ),
...         ("Blacklist", ),
...         ("Shoppinglist", ),
...     ], schema="input_key string"
... )
>>> substitute_mapping = {"whitelist": "allowlist", "blacklist": "blocklist"}
>>>
>>> input_df.select(spq.map_values("input_key", mapping=substitute_mapping)).show(truncate=False)
+------------+
|allowlist   |
|allowlist   |
|blocklist   |
|blocklist   |
|blocklist   |
|Shoppinglist|
+------------+
>>>
>>> mapping = [
...     ("original_value", "input_key", spq.as_is),
...     ("transformed_value", "input_key", spq.map_values(mapping=substitute_mapping))
... ]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.show(truncate=False)
+--------------+-----------------+
|original_value|transformed_value|
+--------------+-----------------+
|allowlist     |allowlist        |
|WhiteList     |allowlist        |
|blocklist     |blocklist        |
|blacklist     |blocklist        |
|Blacklist     |blocklist        |
|Shoppinglist  |Shoppinglist     |
+--------------+-----------------+
- Returns
The return type depends on how the function was called. This ensures compatibility with Spooq’s Mapper transformer (with or without explicit parameters) as well as with direct calls via select, withColumn, where, …
- Return type
partial or Column
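The "partial or Column" behaviour can be illustrated with a small stand-alone sketch. This is an assumption about the general pattern, not Spooq's actual code: upper_values is a hypothetical transformation, its suffix keyword is invented for the example, and plain strings stand in for Spark Column objects.

```python
from functools import partial


def upper_values(source_column=None, name=None, **kwargs):
    """Hypothetical transformation that mimics the partial-or-result pattern."""
    if source_column is None:
        # No column given yet (e.g. used inside a mapping definition):
        # return a partial so the caller can supply source_column and name later.
        return partial(upper_values, **kwargs)

    # Column given (e.g. called directly in a select): build the result now.
    suffix = kwargs.get("suffix", "")
    expression = f"upper({source_column}){suffix}"
    return f"{expression} AS {name}" if name else expression


# Direct call: the result is produced immediately.
direct = upper_values("input_key")

# Deferred call: only keyword arguments are fixed, the column comes later.
deferred = upper_values(suffix="!")
completed = deferred(source_column="input_key", name="output_key")
```

In this sketch, direct is a finished expression, while deferred is a functools.partial that a mapper could complete by passing source_column and name.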