spooq.transformer.mapper_transformations.apply

apply(source_column=None, name=None, **kwargs: Any) -> partial

Applies a function or partial to the source column.

Parameters
  • source_column (str or Column) – Input column. Can be a column name, a PySpark Column or a PySpark function

  • name (str, default -> derived from input column) – Name of the output column (.alias(name))

Keyword Arguments
  • func (Callable) – Function that takes the source column as single argument

  • alt_src_cols (str, default -> no coalescing, only source_column) – Columns to coalesce with source_column; the first non-null value is used.

  • cast (T.DataType(), default -> no casting) – Casts the output column to the provided data type (.cast(cast))
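
The interplay of func, alt_src_cols and cast can be pictured with a plain-Python model of a single row. This is only a sketch and not Spooq code: `apply_to_row` is a hypothetical helper, a dict stands in for a DataFrame row, and the order of operations (coalesce, then func, then cast) is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def apply_to_row(
    row: Dict[str, Any],
    source_column: str,
    func: Callable[[Any], Any],
    alt_src_cols: Sequence[str] = (),
    cast: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Hypothetical per-row model; the real function builds Spark Column expressions."""
    # alt_src_cols: take the first non-null value among source_column and the alternatives.
    value = None
    for column in (source_column, *alt_src_cols):
        if row.get(column) is not None:
            value = row[column]
            break
    if value is None:
        return None  # nulls stay null in this model
    # func: receives the (coalesced) source value as its single argument.
    result = func(value)
    # cast: optional conversion of the output (assumed here to run last).
    return cast(result) if cast is not None else result


row = {"input_key": None, "fallback_key": "F"}
print(apply_to_row(row, "input_key", str.lower, alt_src_cols=["fallback_key"]))  # f
print(apply_to_row({"input_key": "42"}, "input_key", str.strip, cast=int))  # 42
```

In the first call the null input_key is replaced by fallback_key before func runs; in the second, the result of func is converted by the cast callable.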

Examples

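>>> from pyspark.sql import functions as F, types as T
>>> from spooq.transformer import Mapper
>>> from spooq.transformer import mapper_transformations as spq
>>>
>>> input_df = spark.createDataFrame(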
...     [
...         ("F", ),
...         ("f", ),
...         ("x", ),
...         ("X", ),
...         ("m", ),
...         ("M", ),
...     ], schema="input_key string"
... )
>>>
>>> input_df.select(spq.apply("input_key", func=F.lower)).show(truncate=False)
+---------+
|f        |
|f        |
|x        |
|x        |
|m        |
|m        |
+---------+
>>>
>>> mapping = [
...     ("original_value",    "input_key", spq.as_is),
...     ("transformed_value", "input_key", spq.apply(func=F.lower))
... ]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.show(truncate=False)
+--------------+-----------------+
|original_value|transformed_value|
+--------------+-----------------+
|F             |f                |
|f             |f                |
|x             |x                |
|X             |x                |
|m             |m                |
|M             |m                |
+--------------+-----------------+
>>>
>>> input_df = spark.createDataFrame(
...     [
...         ("sarajishvilileqso@gmx.at", ),
...         ("jnnqn@astrinurdin.art", ),
...         ("321aw@hotmail.com", ),
...         ("techbrenda@hotmail.com", ),
...         ("sdsxcx@gmail.com", ),
...     ], schema="input_key string"
... )
>>>
>>> def _has_hotmail(source_column):
...     return F.when(
...         source_column.cast(T.StringType()).endswith("@hotmail.com"),
...         F.lit(True)
...     ).otherwise(F.lit(False))
...
>>> mapping = [
...     ("original_value", "input_key", spq.as_is),
...     ("is_hotmail",     "input_key", spq.apply(func=_has_hotmail))
... ]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.show(truncate=False)
+------------------------+----------+
|original_value          |is_hotmail|
+------------------------+----------+
|sarajishvilileqso@gmx.at|false     |
|jnnqn@astrinurdin.art   |false     |
|321aw@hotmail.com       |true      |
|techbrenda@hotmail.com  |true      |
|sdsxcx@gmail.com        |false     |
+------------------------+----------+

Returns

The return type depends on how the method is called. This ensures compatibility with Spooq’s Mapper transformer (with or without explicit parameters) as well as with direct calls via select, withColumn, where, …

Return type

partial or Column
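
The "partial or Column" behaviour can be illustrated with a small stand-alone sketch. It is not Spooq's implementation: `apply_sketch`, `_build` and `lower` are hypothetical names, and plain strings play the role of Spark columns. The assumption is that a call without source_column defers the work by returning a functools.partial, while a call with source_column returns the finished expression right away.

```python
from functools import partial
from typing import Callable, Optional, Union


def _build(source_column: str, name: Optional[str], func: Callable[[str], str]) -> str:
    """Apply func to the column and alias the result (strings stand in for Columns)."""
    expression = func(source_column)
    return f"{expression} AS {name or source_column}"


def apply_sketch(
    source_column: Optional[str] = None,
    name: Optional[str] = None,
    *,
    func: Callable[[str], str],
) -> Union[partial, str]:
    """Hypothetical stand-in showing the two return types."""
    if source_column is None:
        # No input column yet: return a partial that a mapper can complete later.
        return partial(_build, func=func)
    # Input column given: build the expression immediately (direct call, e.g. in select).
    return _build(source_column, name, func)


def lower(column: str) -> str:
    # Hypothetical string-based replacement for a column function such as F.lower.
    return f"lower({column})"


direct = apply_sketch("input_key", func=lower)
print(direct)  # lower(input_key) AS input_key

deferred = apply_sketch(func=lower)
print(deferred("input_key", "transformed_value"))  # lower(input_key) AS transformed_value
```

The deferred form mirrors the mapping examples, where the source column and output name are taken from the mapping tuple rather than from the call itself.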