Newest by Group (Most current record per ID)
class NewestByGroup(group_by=['id'], order_by=['updated_at', 'deleted_at'])
Bases: spooq2.transformer.transformer.Transformer
Groups, orders and selects first element per group.
Example
>>> transformer = NewestByGroup(
...     group_by=["first_name", "last_name"],
...     order_by=["created_at_ms", "version"]
... )
Parameters:
- group_by (str or list of str, defaults to ['id']) – List of attributes to be used within the window function as grouping arguments.
- order_by (str or list of str, defaults to ['updated_at', 'deleted_at']) – List of attributes to be used within the window function as ordering arguments. All columns are sorted in descending order.
Raises: exceptions.AttributeError – If any attribute in group_by or order_by is not contained in the input DataFrame.
Note
PySpark's Window function is used internally. The first row (row_number()) per window is selected and returned.
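The selection described in the note can be illustrated without Spark. The following is a minimal plain-Python sketch of the semantics only, not the transformer's actual implementation: it replaces the PySpark window and row_number() with a dictionary lookup, and the helper name newest_by_group and the sample rows are hypothetical. It assumes all ordering values are non-null and mutually comparable.

```python
def newest_by_group(rows, group_by=("id",), order_by=("updated_at", "deleted_at")):
    """Hypothetical helper: keep, per group, the row that sorts first
    when order_by columns are sorted in descending order."""

    def group_key(row):
        return tuple(row[col] for col in group_by)

    def order_key(row):
        return tuple(row[col] for col in order_by)

    newest = {}
    for row in rows:
        key = group_key(row)
        # The first row of a descending sort is the one with the largest
        # ordering key; on ties, the row seen first is kept.
        if key not in newest or order_key(row) > order_key(newest[key]):
            newest[key] = row
    return list(newest.values())


# Made-up sample records for illustration.
rows = [
    {"id": 1, "updated_at": 10, "deleted_at": 0, "name": "old"},
    {"id": 1, "updated_at": 20, "deleted_at": 0, "name": "new"},
    {"id": 2, "updated_at": 5, "deleted_at": 0, "name": "only"},
    {"id": 3, "updated_at": 7, "deleted_at": 1, "name": "first"},
    {"id": 3, "updated_at": 7, "deleted_at": 2, "name": "second"},
]
result = newest_by_group(rows)
```

For id 3 both rows share the same updated_at, so the second ordering column (deleted_at) decides which row is kept. In the real transformer the same idea is expressed through a window partitioned by group_by and ordered by order_by in descending order.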
transform(input_df)
Performs a transformation on a DataFrame.
Parameters: input_df (pyspark.sql.DataFrame) – Input DataFrame
Returns: Transformed DataFrame.
Return type: pyspark.sql.DataFrame
Note
This method takes only the input DataFrame as a parameter. All other required parameters are defined when the transformer object is initialized.