Loaders

Loaders take a pyspark.sql.DataFrame as an input and save it to a sink.

Each Loader class has to provide a load method which takes a DataFrame as its single parameter.

Possible Loader sinks include Hive Tables, Kudu Tables, HBase Tables, JDBC Sinks, or Parquet Files.
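
The snippet below is a minimal usage sketch for a HiveLoader. The import path loader and the exact shape of the partition_definitions entries are assumptions; the constructor arguments mirror the HiveLoader attributes listed in the class diagram below.

from pyspark.sql import SparkSession

from loader import HiveLoader  # hypothetical import path

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

input_df = spark.createDataFrame(
    [(1, "a", 20210101), (2, "b", 20210101)],
    ["id", "value", "dt"],
)

# Constructor arguments follow the attributes shown in the class diagram;
# the dictionary layout of partition_definitions is an assumption.
hive_loader = HiveLoader(
    db_name="my_database",
    table_name="my_table",
    partition_definitions=[{"column_name": "dt", "default_value": 20210101}],
    clear_partition=True,
    repartition_size=40,
    auto_create_table=True,
    overwrite_partition_value=True,
)

hive_loader.load(input_df)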

Class Diagram of Loader Subpackage

@startuml

skinparam monochrome true
skinparam defaultFontname Bitstream Vera Sans Mono
skinparam defaultFontSize 18


' left to right direction


  namespace loader {
    
    class Loader {
      .. derived ..
      name : str
      logger : logging.logger
      __
      load(input_df : DataFrame)
    }
    Loader <|-- HiveLoader
    class HiveLoader {
      db_name : str
      table_name : str
      partition_definitions : list
      clear_partition : bool = True
      repartition_size : int = 40
      auto_create_table : bool = True
      overwrite_partition_value : bool = True
      .. derived ..
      full_table_name : str
      spark : SparkSession
      __
      load(input_df : DataFrame)
    }
  }
@enduml

Create your own Loader

Please see the Create your own Loader section for further details.
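
As a minimal sketch, a custom loader only needs to subclass Loader and implement load. The Parquet sink and the target_path parameter here are illustrative choices; the logger attribute is assumed to be provided by the base class, as shown in the class diagram above.

from pyspark.sql import DataFrame

from loader import Loader  # hypothetical import path


class ParquetLoader(Loader):
    """Illustrative loader that writes the input DataFrame as Parquet files."""

    def __init__(self, target_path, mode="overwrite"):
        # Assumes the base Loader constructor takes no arguments.
        super(ParquetLoader, self).__init__()
        self.target_path = target_path
        self.mode = mode

    def load(self, input_df: DataFrame) -> None:
        """Persists the DataFrame to the configured Parquet path."""
        self.logger.info("Writing DataFrame to %s", self.target_path)
        input_df.write.mode(self.mode).parquet(self.target_path)


# Usage: ParquetLoader(target_path="/tmp/output").load(input_df)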