spooq.transformer.mapper_transformations.to_bool

to_bool(source_column=None, name=None, **kwargs: Any) → partial

A more robust conversion to BooleanType. Compared to Spark's implicit conversion, this method additionally handles:

  • Preceding and/or trailing whitespace

  • User-defined strings for additional true/false values ("on"/"off" and "enabled"/"disabled" are included by default)
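As a rough mental model, and explicitly not Spooq's actual implementation, the handling above can be pictured in plain Python: strip the value, check the additional true/false strings (case-insensitively, as with the default case_sensitive=False), then fall back to the string tokens Spark's own cast accepts (t, true, y, yes, 1 and f, false, n, no, 0). The function name to_bool_model is hypothetical.

```python
from typing import Optional

# Tokens accepted by Spark's own string -> boolean cast (case-insensitive).
SPARK_TRUE = {"t", "true", "y", "yes", "1"}
SPARK_FALSE = {"f", "false", "n", "no", "0"}

# Spooq's documented default additional lookup values.
DEFAULT_TRUE = ["on", "enabled"]
DEFAULT_FALSE = ["off", "disabled"]


def to_bool_model(value: Optional[str]) -> Optional[bool]:
    """Hypothetical pure-Python model of the conversion (not Spooq's code)."""
    if value is None:
        return None
    # Remove surrounding whitespace and compare case-insensitively.
    token = value.strip().lower()
    if token in DEFAULT_TRUE:
        return True
    if token in DEFAULT_FALSE:
        return False
    # Fall back to the tokens Spark's cast understands; anything else is NULL.
    if token in SPARK_TRUE:
        return True
    if token in SPARK_FALSE:
        return False
    return None


for raw in ["  false ", "123", "1", "Enabled", "?", "n"]:
    print(repr(raw), "->", to_bool_model(raw))
```

Applied to the inputs of the example further down, this model yields False, None, True, True, None, False; "?" only maps to False once it is passed via false_values.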

Parameters
  • source_column (str or Column) – Input column. Can be a column name, a PySpark Column, or a PySpark function

  • name (str, default -> derived from input column) – Name of the output column. (.alias(name))

Keyword Arguments
  • case_sensitive (bool, default -> False) – Defines whether the additional true/false lookup values are matched case-sensitively

  • true_values (list, default -> ["on", "enabled"]) – A list of values that should result in a True value if they are found in the source column

  • false_values (list, default -> ["off", "disabled"]) – A list of values that should result in a False value if they are found in the source column

  • replace_default_values (bool, default -> False) – Defines whether the provided true/false values replace or extend the default lists

  • alt_src_cols (str, default -> no coalescing, only source_column) – Additional column(s) that are coalesced with source_column

  • cast (T.DataType(), default -> T.BooleanType()) – Applies provided datatype on output column (.cast(cast))
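The sketch below shows one plausible way the keyword arguments could combine into the final lookup sets, plus the first-non-null semantics of coalescing with alt_src_cols. It is plain Python with hypothetical helper names (build_lookups, coalesce). In particular, that replace_default_values=True replaces both default lists, and that source_column takes precedence over alt_src_cols, are assumptions rather than documented behavior.

```python
from typing import Iterable, Optional, Set, Tuple

# Documented default additional lookup values.
DEFAULT_TRUE = ["on", "enabled"]
DEFAULT_FALSE = ["off", "disabled"]


def build_lookups(
    true_values: Iterable[str] = (),
    false_values: Iterable[str] = (),
    replace_default_values: bool = False,
    case_sensitive: bool = False,
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: merge user values with (or swap them in for) the defaults."""
    if replace_default_values:
        # Assumption: both default lists are replaced, even if only one list is given.
        trues, falses = list(true_values), list(false_values)
    else:
        trues = DEFAULT_TRUE + list(true_values)
        falses = DEFAULT_FALSE + list(false_values)
    if not case_sensitive:
        # Case-insensitive lookup: normalize every candidate to lower case.
        trues = [v.lower() for v in trues]
        falses = [v.lower() for v in falses]
    return set(trues), set(falses)


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first non-None value, like SQL COALESCE over source + alt columns."""
    return next((v for v in values if v is not None), None)


trues, falses = build_lookups(false_values=["?"])
print(sorted(trues), sorted(falses))
print(coalesce(None, "yes"))
```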

Warning

Spark (and therefore Spooq) converts numbers to booleans differently depending on the input datatype! See this table for clarification:

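+-------------+----------------+-------------------------+---------------------+
| Input value | Input datatype | Result: cast to Boolean | Result: spq.to_bool |
+-------------+----------------+-------------------------+---------------------+
| -1          | int            | True                    | NULL                |
| -1          | str            | NULL                    | NULL                |
| 0           | int            | False                   | False               |
| 0           | str            | False                   | False               |
| 1           | int            | True                    | True                |
| 1           | str            | True                    | True                |
| 100         | int            | True                    | NULL                |
| 100         | str            | NULL                    | NULL                |
+-------------+----------------+-------------------------+---------------------+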
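The table can be reproduced with a small plain-Python model. A plain cast treats any non-zero integer as True, while strings must match Spark's boolean tokens. The spq.to_bool rows are consistent with the hypothesis that every input is converted to a string before the lookup; that hypothesis, and the names spark_cast and spq_to_bool_model, are assumptions made for illustration only.

```python
from typing import Optional, Union

# Tokens accepted by Spark's string -> boolean cast (after trimming, case-insensitive).
SPARK_TRUE = {"t", "true", "y", "yes", "1"}
SPARK_FALSE = {"f", "false", "n", "no", "0"}


def spark_cast(value: Union[int, str]) -> Optional[bool]:
    """Plain cast to Boolean: ints are True when non-zero, strings must match a token."""
    if isinstance(value, int):
        return value != 0
    token = value.strip().lower()
    if token in SPARK_TRUE:
        return True
    if token in SPARK_FALSE:
        return False
    return None  # unknown strings become NULL


def spq_to_bool_model(value: Union[int, str]) -> Optional[bool]:
    """Hypothetical model of spq.to_bool: treat every input as a string first."""
    return spark_cast(str(value))


# (input value, expected cast result, expected spq.to_bool result); None = NULL
TABLE = [
    (-1, True, None), ("-1", None, None),
    (0, False, False), ("0", False, False),
    (1, True, True), ("1", True, True),
    (100, True, None), ("100", None, None),
]

for value, expected_cast, expected_spq in TABLE:
    assert spark_cast(value) == expected_cast, value
    assert spq_to_bool_model(value) == expected_spq, value
print("model matches all", len(TABLE), "rows")
```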

Examples

>>> from pyspark.sql import Row
>>> from spooq.transformer import Mapper
>>> from spooq.transformer import mapper_transformations as spq
>>>
>>> input_df = spark.createDataFrame(
...     [
...         Row(input_key="  false "),
...         Row(input_key="123"),
...         Row(input_key="1"),
...         Row(input_key="Enabled"),
...         Row(input_key="?"),
...         Row(input_key="n")
...     ], schema="input_key string"
... )
>>>
>>> input_df.select(spq.to_bool("input_key", false_values=["?"])).show(truncate=False)
+---------+
|input_key|
+---------+
|false    |
|null     |
|true     |
|true     |
|false    |
|false    |
+---------+
>>>
>>> mapping = [
...     ("original_value",    "input_key", spq.as_is),
...     ("transformed_value", "input_key", spq.to_bool(false_values=["?"]))
... ]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.show(truncate=False)
+--------------+-----------------+
|original_value|transformed_value|
+--------------+-----------------+
|  false       |false            |
|123           |null             |
|1             |true             |
|Enabled       |true             |
|?             |false            |
|n             |false            |
+--------------+-----------------+
Returns

When called with a source_column, this method returns a Column, so it can be used directly in select, withColumn, where, and similar calls. When called without one, it returns a partial, which keeps it compatible with Spooq's Mapper transformer, with or without explicit parameters.

Return type

partial or Column
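The partial-or-Column behavior follows a common dispatch pattern, sketched here without Spark. FakeColumn is a hypothetical stand-in for pyspark.sql.Column, and to_bool_sketch illustrates the pattern rather than reproducing Spooq's source: called without a source column it binds its arguments with functools.partial, and called with one it returns an aliased column.

```python
from functools import partial
from typing import Any, Optional, Union


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column so the sketch runs without Spark."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def alias(self, name: str) -> "FakeColumn":
        return FakeColumn(f"{self.expr} AS {name}")


def to_bool_sketch(
    source_column: Optional[str] = None, name: Optional[str] = None, **kwargs: Any
) -> Union[partial, FakeColumn]:
    """Illustrative dispatch, not Spooq's source code."""
    if source_column is None:
        # Mapper-style call: bind the options now, the column is supplied later.
        return partial(to_bool_sketch, name=name, **kwargs)
    # Direct call: build the expression. kwargs (true_values, cast, ...) would
    # configure the real conversion but are ignored by this stand-in.
    return FakeColumn(f"to_bool({source_column})").alias(name or source_column)


direct = to_bool_sketch("input_key")           # usable like select(...) input
deferred = to_bool_sketch(false_values=["?"])  # what a mapping entry would hold
print(direct.expr)
print(deferred("input_key").expr)
```

In Mapper terms, the partial is presumably what a mapping entry such as spq.to_bool(false_values=["?"]) holds until the transformer supplies the source column.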