spooq.transformer.mapper_transformations.to_timestamp
- to_timestamp(source_column=None, name=None, **kwargs: Any) → partial
More robust conversion to TimestampType (or to a formatted string). This method supports the following input types:
- Unix timestamps in seconds
- Unix timestamps in milliseconds
- Timestamps in any format supported by Spark
- Timestamps in any custom format (via input_format)
- Preceding and/or trailing whitespace
- Parameters
  - source_column (str or Column, default -> None) – Input column to convert (column name or Column object)
  - name (str, default -> None) – Name of the output column
- Keyword Arguments
  - max_timestamp_sec (int, default -> 4102358400 (=> 2099-12-31 01:00:00)) – Defines the range in which unix timestamps are still considered to be in seconds (as opposed to milliseconds)
  - input_format ([str, Bool], default -> False) – Spooq tries to convert the input string with the provided pattern (via F.unix_timestamp())
  - output_format ([str, Bool], default -> False) – The output can be formatted according to the provided pattern (via F.date_format())
  - min_timestamp_ms (int, default -> -62135514321000 (=> Year 1)) – Defines the overall allowed range to keep the timestamps within Python's datetime library limits
  - max_timestamp_ms (int, default -> 253402210800000 (=> Year 9999)) – Defines the overall allowed range to keep the timestamps within Python's datetime library limits
  - alt_src_cols (str, default -> no coalescing, only source_column) – Coalesces source_column with the columns given in this parameter
  - cast (T.DataType(), default -> T.TimestampType()) – Applies the provided datatype to the output column (.cast(cast))
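As a brief sketch of how these keyword arguments can be used (assuming spq and input_df as defined in the Examples section below; the pattern strings are illustrative):
>>> import pyspark.sql.types as T
>>> # Interpret the input with a custom pattern (passed to F.unix_timestamp()):
>>> input_df.select(spq.to_timestamp("input_key", input_format="yyyy-MM-dd'T'HH:mm:ssZ"))
>>> # Return a formatted string (via F.date_format()) instead of a TimestampType column:
>>> input_df.select(spq.to_timestamp("input_key", output_format="yyyy-MM-dd", cast=T.StringType()))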
Warning
- Timestamps in the ranges (-inf, -max_timestamp_sec) and (max_timestamp_sec, inf) are treated as milliseconds.
- There is a time interval (1970-01-01 ± ~2.5 months) within which we cannot reliably distinguish between seconds and milliseconds. For example, 3974400000 would be treated as seconds (2095-12-11T00:00:00) because the value is smaller than MAX_TIMESTAMP_S, but it could also be a valid date in milliseconds (1970-02-16T00:00:00).
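For illustration, the ambiguity described in the second point can be reproduced with Python's standard datetime module:
>>> from datetime import datetime, timezone
>>> # 3974400000 interpreted as seconds since the epoch:
>>> datetime.fromtimestamp(3974400000, tz=timezone.utc)
datetime.datetime(2095, 12, 11, 0, 0, tzinfo=datetime.timezone.utc)
>>> # The same value interpreted as milliseconds:
>>> datetime.fromtimestamp(3974400000 / 1000, tz=timezone.utc)
datetime.datetime(1970, 2, 16, 0, 0, tzinfo=datetime.timezone.utc)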
Examples
>>> input_df = spark.createDataFrame(
...     [
...         Row(input_key="2020-08-12T12:43:14+0000"),
...         Row(input_key="1597069446"),
...         Row(input_key="1597069446000"),
...         Row(input_key="2020-08-12"),
...     ], schema="input_key string"
... )
>>>
>>> input_df.select(spq.to_timestamp("input_key")).show(truncate=False)
+-------------------+
|2020-08-12 14:43:14|
|2020-08-10 16:24:06|
|2020-08-10 16:24:06|
|2020-08-12 00:00:00|
+-------------------+
>>>
>>> mapping = [
...     ("original_value", "input_key", spq.as_is),
...     ("transformed_value", "input_key", spq.to_timestamp)
... ]
>>> output_df = Mapper(mapping).transform(input_df)
>>> output_df.show(truncate=False)
+------------------------+-------------------+
|original_value          |transformed_value  |
+------------------------+-------------------+
|2020-08-12T12:43:14+0000|2020-08-12 14:43:14|
|1597069446              |2020-08-10 16:24:06|
|1597069446000           |2020-08-10 16:24:06|
|2020-08-12              |2020-08-12 00:00:00|
+------------------------+-------------------+
- Returns
This method returns a suitable type depending on how it was called. This ensures compatibility with Spooq's Mapper transformer (with or without explicit parameters) as well as with direct calls via select, withColumn, where, …
- Return type
partial or Column
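The three call styles look like this (a sketch based on the Examples above; the output column name "created_at" is hypothetical):
>>> # Bare reference in a Mapper mapping (no explicit parameters):
>>> mapping = [("created_at", "input_key", spq.to_timestamp)]
>>> # With explicit parameters in a Mapper mapping (returns a partial):
>>> mapping = [("created_at", "input_key", spq.to_timestamp(output_format="yyyy-MM-dd"))]
>>> # Direct call on a DataFrame (usable as a Column expression):
>>> output_df = input_df.withColumn("created_at", spq.to_timestamp("input_key"))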