Changelog

3.0.1 (2021-01-22)

  • [MOD] extended_string_to_timestamp: now keeps milliseconds (no more cast to LongType) for conversion to Timestamp

3.0.0b (2020-12-09)

  • [ADD] Spark 3 support (different handling in tests via only_sparkX decorators)
  • [FIX] fix null types in schema for custom transformations on missing columns
  • [MOD] (BREAKING CHANGE!) set default for ignore_missing_columns of Mapper to False (fails on missing input columns)
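
The effect of the new default can be pictured with a minimal plain-Python sketch. It is not Spooq's Mapper: `map_columns`, its `(name, source)` mapping format, and the `ValueError` are hypothetical stand-ins, and only the flag name and its default come from the entry above.

```python
def map_columns(row, mapping, ignore_missing_columns=False):
    """Hypothetical stand-in for a mapper: copy values of `row` to new names.

    `mapping` is a list of (output_name, source_key) pairs (assumed format).
    """
    missing = [source for _, source in mapping if source not in row]
    if missing and not ignore_missing_columns:
        # Default behaviour since 3.0.0b: fail loudly on missing input columns.
        raise ValueError(f"Missing input columns: {missing}")
    # With the flag set, missing sources are filled with None instead.
    return {name: row.get(source) for name, source in mapping}


row = {"id": 1}
mapping = [("user_id", "id"), ("user_name", "name")]

lenient = map_columns(row, mapping, ignore_missing_columns=True)
```

Callers that relied on the previous lenient behaviour would need to pass the flag explicitly after upgrading.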

2.3.0 (2020-11-23)

  • [MOD] extended_string_to_timestamp: it can now handle unix timestamps in seconds and in milliseconds
  • [MOD] extended_string_to_date: it can now handle unix timestamps in seconds and in milliseconds
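
One way to accept both units is to decide by magnitude, sketched below in plain Python. This is an illustration only, not the Spooq implementation: the helper `parse_unix_timestamp` and the cut-off `MS_THRESHOLD` are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: integers at or above this magnitude are read as milliseconds.
# 10**11 seconds lies thousands of years in the future, so realistic
# epoch-second values stay below the cut-off.
MS_THRESHOLD = 10**11

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_unix_timestamp(value):
    """Hypothetical helper: read an integer string as epoch seconds or milliseconds."""
    number = int(value)
    if abs(number) >= MS_THRESHOLD:
        # Split into whole seconds and the millisecond remainder.
        seconds, millis = divmod(number, 1000)
    else:
        seconds, millis = number, 0
    return EPOCH + timedelta(seconds=seconds, milliseconds=millis)


from_seconds = parse_unix_timestamp("1606089600")
from_millis = parse_unix_timestamp("1606089600123")
```

Both inputs resolve to 2020-11-23 UTC; the millisecond input additionally keeps its 123 ms fraction.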

2.2.0 (2020-10-02)

  • [MOD] add support for prepending and appending mappings on input dataframe (Mapper)

  • [MOD] add support for custom spark sql functions in mapper without injecting methods

  • [MOD] add support for “on”/“off” and “enabled”/“disabled” in extended_string_to_boolean custom mapper transformations

  • [ADD] new custom mapper transformations:

    • extended_string_to_date
    • extended_string_unix_timestamp_ms_to_date
    • has_value
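
The additional boolean spellings can be illustrated with a small plain-Python sketch. It does not reproduce Spooq's code: `parse_extended_boolean` is a hypothetical helper, the "true"/"false" entries and the case and whitespace handling are assumptions, and only the “on”/“off” and “enabled”/“disabled” pairs come from the entry above.

```python
# "on"/"off" and "enabled"/"disabled" are taken from the changelog entry;
# "true"/"false" are assumed here for completeness.
TRUE_VALUES = {"true", "on", "enabled"}
FALSE_VALUES = {"false", "off", "disabled"}


def parse_extended_boolean(value):
    """Hypothetical helper: map a loosely formatted string to True, False or None."""
    if value is None:
        return None
    normalized = str(value).strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    # Assumption: unrecognised input yields None rather than raising.
    return None


results = [parse_extended_boolean(v) for v in ["on", " Disabled ", "maybe"]]
```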

2.1.1 (2020-09-04)

  • [MOD] drop_rows_with_empty_array flag to allow keeping rows with empty array after explosion

  • [MOD] additional test-cases for extended_string mappings (non string inputs)

  • [FIX] remove STDERR logging, don’t touch root logging level anymore (needs to be done outside spooq to see some lower log levels)

  • [ADD] new custom mapper transformations:

    • extended_string_unix_timestamp_ms_to_timestamp

2.1.0 (2020-08-17)

  • [ADD] Python 3 support

  • [MOD] ignore_missing_columns flag to fail on missing input columns with Mapper transformer (https://github.com/Breaka84/Spooq/pull/6)

  • [MOD] timestamp support for threshold cleaner

  • [ADD] new custom mapper transformations:

    • meters_to_cm
    • unix_timestamp_ms_to_spark_timestamp
    • extended_string_to_int
    • extended_string_to_long
    • extended_string_to_float
    • extended_string_to_double
    • extended_string_to_boolean
    • extended_string_to_timestamp

2.0.0 (2020-05-22)

  • [UPDATE] Upgrade to Spark 2 (tested with 2.4.3); Spark 1 is no longer supported
  • Breaking changes (severe refactoring)

0.6.2 (2019-05-13)

  • [FIX] Logger now writes to stdout and stderr, and the logger instance is shared across all spooq instances
  • [FIX] PyTest version locked to 3.10.1 as 4+ broke the tests
  • [MOD] Removes id_function to create names for parameters in test methods (fallback to built-in)
  • [ADD] Change SelectNewestByGroup from string eval to pyspark objects
  • [FIX] json_string is now able to handle None values

0.6.1 (2019-03-26)

  • [FIX] PassThrough Extractor (input df now defined at instantiation time)
  • [ADD] json_string: new custom data type