spooq.transformer.mapper_transformations.to_num

to_num(source_column=None, name=None, **kwargs: Any) → Union[partial, Column]

More robust conversion to numeric data types (default: LongType). Compared to Spark's implicit conversion, this method additionally handles:

  • Leading and/or trailing whitespace

  • Underscores as thousands separators
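
The two extra rules can be illustrated on a single string value. The following is a plain-Python sketch, not spooq's implementation: the helper name `to_num_sketch` is hypothetical, and it assumes that a value which still cannot be parsed after cleaning becomes `None`, mirroring the `null` produced for "Hello" in the examples. Edge cases (for example decimal strings) may behave differently from Spark's cast.

```python
from typing import Optional


def to_num_sketch(value: Optional[str]) -> Optional[int]:
    """Hypothetical helper mimicking to_num's documented rules on one value."""
    if value is None:
        return None
    # Rule 1: ignore leading and trailing whitespace.
    cleaned = value.strip()
    # Rule 2: treat underscores as thousands separators by dropping them.
    cleaned = cleaned.replace("_", "")
    try:
        return int(cleaned)
    except ValueError:
        # In this sketch, anything int() cannot parse becomes None.
        return None


for raw in ["  123456 ", "Hello", "123_456"]:
    print(repr(raw), "->", to_num_sketch(raw))
```

Python's own `int()` already tolerates surrounding whitespace and single underscores between digits; the explicit `strip` and `replace` calls are kept only to spell out the two rules.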

Parameters
  • source_column (str or Column) – Input column. Can be a column name, a PySpark Column, or a PySpark function.

  • name (str, default -> derived from input column) – Name of the output column (applied via .alias(name)).

Keyword Arguments
  • alt_src_cols (str, default -> no coalescing, only source_column) – Additional source column(s) to coalesce with source_column.

  • cast (T.DataType(), default -> T.LongType()) – Data type applied to the output column (via .cast(cast)).
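
To make the two keyword arguments concrete, here is a single-row model in plain Python. It is a sketch under stated assumptions, not spooq code: the function `to_num_row_sketch` is hypothetical, it assumes coalescing picks the first non-null raw value (source_column first, then alt_src_cols in order) before conversion, and it models `cast` with a Python callable such as `int` or `float`, whereas the real parameter takes a Spark data type such as `T.LongType()`.

```python
from typing import Callable, Dict, Optional, Sequence, Union


def to_num_row_sketch(
    row: Dict[str, Optional[str]],
    source_column: str,
    alt_src_cols: Union[str, Sequence[str]] = (),
    cast: Callable[[str], object] = int,
) -> Optional[object]:
    """Hypothetical single-row model of alt_src_cols and cast (not spooq code)."""
    if isinstance(alt_src_cols, str):
        # A single column name is allowed, matching the documented str type.
        alt_src_cols = [alt_src_cols]
    # Assumed coalescing: the first non-null raw value wins, source_column first.
    raw = next(
        (row[c] for c in (source_column, *alt_src_cols) if row.get(c) is not None),
        None,
    )
    if raw is None:
        return None
    # Strip whitespace and drop underscore separators, then apply the "cast".
    cleaned = raw.strip().replace("_", "")
    try:
        return cast(cleaned)
    except ValueError:
        # Values that cannot be converted become None (Spark would yield null).
        return None


row = {"input_key": None, "fallback": " 1_000 "}
print(to_num_row_sketch(row, "input_key", alt_src_cols="fallback"))  # 1000
print(to_num_row_sketch({"input_key": "2.5"}, "input_key", cast=float))  # 2.5
print(to_num_row_sketch({"input_key": "Hello"}, "input_key"))  # None
```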

Examples

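>>> from pyspark.sql import Row
>>> from spooq.transformer import Mapper
>>> from spooq.transformer import mapper_transformations as spq
>>>
>>> input_df = spark.createDataFrame(
...     [
...         Row(input_key="  123456 "),
...         Row(input_key="Hello"),
...         Row(input_key="123_456")
...     ], schema="input_key string"
... )
>>>
>>> input_df.select(spq.to_num("input_key")).show(truncate=False)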
+---------+
|input_key|
+---------+
|123456   |
|null     |
|123456   |
+---------+
>>>
>>> mapping = [
...     ("original_value",    "input_key", spq.as_is),
...     ("transformed_value", "input_key", spq.to_num)
... ]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.show(truncate=False)
+--------------+-----------------+
|original_value|transformed_value|
+--------------+-----------------+
|  123456      |123456           |
|Hello         |null             |
|123_456       |123456           |
+--------------+-----------------+

Returns

This method returns a suitable type depending on how it was called. This ensures compatibility with Spooq's Mapper transformer (with or without explicit parameters) as well as with direct calls via select, withColumn, where, …

Return type

partial or Column
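
The following plain-Python sketch shows one way a "partial or Column" return type can be built with `functools.partial`. It is an assumption about the general pattern, not spooq's code: `to_num_like` and `apply_mapping` are hypothetical names, and a SQL-like string stands in for a PySpark Column so the sketch runs without Spark.

```python
from functools import partial
from typing import Any, Callable, Optional, Union


def to_num_like(source_column: Optional[str] = None, name: Optional[str] = None,
                **kwargs: Any) -> Union[partial, str]:
    """Hypothetical transformation using the partial-or-result pattern."""
    if source_column is None:
        # No source column yet (only options such as cast=... were given):
        # bind the options and return a partial for the mapper to call later.
        return partial(to_num_like, name=name, **kwargs)
    cast = kwargs.get("cast", "long")
    alias = name or source_column  # output name defaults to the input column
    # A string stands in for the resulting Column expression.
    return f"CAST({source_column} AS {cast}) AS {alias}"


def apply_mapping(transformation: Callable[..., str], source_column: str,
                  name: str) -> str:
    """Hypothetical mapper step: works for the bare function or a partial."""
    return transformation(source_column=source_column, name=name)


print(to_num_like("input_key"))
print(apply_mapping(to_num_like, "input_key", "transformed_value"))
print(apply_mapping(to_num_like(cast="double"), "input_key", "transformed_value"))
```

Keyword arguments passed when a partial is called override the ones bound earlier, so the mapper's `name` replaces the `name=None` captured by `to_num_like(cast="double")` while `cast` is preserved.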