spooq.transformer.mapper_transformations.to_num
- to_num(source_column=None, name=None, **kwargs: Any) → Union[partial, Column]
More robust conversion to numeric data types (default: LongType). Compared to Spark's implicit conversion, this method additionally handles:
Leading and/or trailing whitespace
Underscores as thousands separators
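The cleanup rules above can be illustrated with a small pure-Python sketch. It is not Spooq's implementation, which operates on Spark columns. The function name parse_long_sketch is hypothetical. The sketch assumes the default LongType target and plain integer strings only, so decimal input and Spark's exact parsing rules are not reproduced. It strips surrounding whitespace, removes underscores, and returns None (standing in for Spark's null) when the value cannot be parsed or does not fit into a signed 64-bit integer.

```python
from typing import Optional

# Bounds of a signed 64-bit integer, the range of Spark's LongType
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def parse_long_sketch(value: Optional[str]) -> Optional[int]:
    """Hypothetical pure-Python illustration of to_num's cleanup; not Spooq code."""
    if value is None:
        return None
    # Remove leading/trailing whitespace and underscores used as thousands separators
    cleaned = value.strip().replace("_", "")
    try:
        number = int(cleaned)
    except ValueError:
        # Unparsable input ("Hello", empty string, ...) becomes None, mimicking null
        return None
    # Values outside the LongType range are also treated as null in this sketch
    if not LONG_MIN <= number <= LONG_MAX:
        return None
    return number


if __name__ == "__main__":
    for raw in [" 123456 ", "Hello", "123_456"]:
        print(repr(raw), "->", parse_long_sketch(raw))
```

Run on the three inputs from the example, the sketch yields 123456, None and 123456, matching the expected output of to_num.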
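- Parameters
source_column – Input column to convert (the example below passes a column name such as "input_key").
name – Name of the output column.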
- Keyword Arguments
alt_src_cols (str, default -> no coalescing, only source_column) – Column(s) to coalesce with source_column, i.e. the first non-null value is used.
cast (T.DataType(), default -> T.LongType()) – Applies the provided data type to the output column (.cast(cast)).
Examples
>>> from pyspark.sql import Row
>>> from spooq.transformer import Mapper
>>> from spooq.transformer import mapper_transformations as spq
>>>
>>> input_df = spark.createDataFrame(
...     [
...         Row(input_key=" 123456 "),
...         Row(input_key="Hello"),
...         Row(input_key="123_456"),
...     ],
...     schema="input_key string",
... )
>>>
>>> input_df.select(spq.to_num("input_key")).show(truncate=False)
+---------+
|123456   |
|null     |
|123456   |
+---------+
>>>
>>> mapping = [
...     ("original_value", "input_key", spq.as_is),
...     ("transformed_value", "input_key", spq.to_num),
... ]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.show(truncate=False)
+--------------+-----------------+
|original_value|transformed_value|
+--------------+-----------------+
| 123456       |123456           |
|Hello         |null             |
|123_456       |123456           |
+--------------+-----------------+
- Returns
The returned type depends on how the method is called. This keeps it compatible with Spooq's mapper transformer (with or without explicit parameters) as well as with direct calls via select, withColumn, where, …
- Return type
partial or Column
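One common way to support both call styles is to return a functools.partial when no source column is given yet, and a finished expression otherwise. The sketch below illustrates that pattern under stated assumptions: to_num_sketch is a hypothetical stand-in, it returns a plain SQL-like string instead of a Spark Column so that it runs without Spark, and it is not taken from Spooq's source.

```python
from functools import partial
from typing import Any, Optional, Union


def to_num_sketch(
    source_column: Optional[str] = None,
    name: Optional[str] = None,
    **kwargs: Any,
) -> Union[partial, str]:
    """Hypothetical dual-call pattern; returns a string in place of a Spark Column."""
    if source_column is None:
        # Called with keyword arguments only (e.g. inside a mapping definition):
        # remember the options and let the caller supply the columns later.
        return partial(to_num_sketch, **kwargs)
    cast = kwargs.get("cast", "long")
    output_name = name or source_column
    # Stand-in for building a column expression with a cast and an alias
    return f"CAST({source_column} AS {cast}) AS {output_name}"


if __name__ == "__main__":
    # Direct call, comparable to df.select(...)
    print(to_num_sketch("input_key"))
    # Deferred call, comparable to a mapping entry with explicit parameters
    deferred = to_num_sketch(cast="int")
    print(deferred("input_key", "transformed_value"))
```

The deferred object keeps the keyword arguments bound, so a caller such as a mapper only needs to pass the source column and the output name when it finally builds the expression.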