Extractors

Extractors fetch a source data set and convert it into a PySpark DataFrame. Typical sources are JSON files on file systems such as HDFS, DBFS or EXT4, and relational database systems accessed via JDBC.

class Extractor

Bases: object

Base class for all extractor classes.

name

Set to the __name__ of the instance's type, which is essentially the name of the class.

Type: str

logger

Shared, class level logger for all instances.

Type: logging.Logger

extract()

Extracts Data from a Source and converts it into a PySpark DataFrame.

Return type: pyspark.sql.DataFrame

Note

This method does not take ANY input parameters. All needed parameters are defined in the initialization of the Extractor Object.
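
The contract described above (a derived name, a shared class-level logger, and a parameterless extract() configured through the constructor) can be sketched roughly as follows. This is a minimal illustration and not the library's implementation: the logger name "extractor_sketch" and the InMemoryExtractor subclass are hypothetical, and a plain list of dicts stands in for the PySpark DataFrame so that the example runs without a Spark installation.

```python
import logging


class Extractor:
    """Illustrative stand-in for the base class described above."""

    # Class-level attribute: one logger object shared by all instances
    # and subclasses. The logger name is an assumption for this sketch.
    logger = logging.getLogger("extractor_sketch")

    def __init__(self):
        # Derived from the concrete type, so subclasses report their own name.
        self.name = type(self).__name__

    def extract(self):
        # Takes no arguments; subclasses are expected to override it.
        raise NotImplementedError("extract() must be implemented by a subclass")


class InMemoryExtractor(Extractor):
    """Hypothetical subclass; returns a list of dicts instead of a DataFrame."""

    def __init__(self, rows):
        super().__init__()
        # Everything extract() needs is supplied at initialization.
        self.rows = rows

    def extract(self):
        self.logger.info("%s is extracting %d rows", self.name, len(self.rows))
        return list(self.rows)


extractor = InMemoryExtractor([{"id": 1}, {"id": 2}])
records = extractor.extract()
```

Because all configuration lives in the constructor, calling code can treat every extractor uniformly and simply invoke extract() without knowing which source is behind it.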

Class Diagram of Extractor Subpackage

@startuml

skinparam monochrome true
skinparam defaultFontname Bitstream Vera Sans Mono
skinparam defaultFontSize 18

' left to right direction


' namespace spooq2 {
  namespace extractor {
      
      class Extractor {
        .. derived ..
        name : str
        logger : logging.Logger
        __
        extract()
      }
      Extractor <|-- JSONExtractor
      class JSONExtractor{
        input_path : str
        base_path : str
        partition : str
        .. derived ..
        spark : SparkSession
        __
        extract()
      }
      Extractor <|-- JDBCExtractor
      class JDBCExtractor{
        jdbc_options : dict
        cache : bool = True
        .. derived ..
        spark : SparkSession
        __
        extract()
      }
      JDBCExtractor <|-- JDBCExtractorFullLoad
      class JDBCExtractorFullLoad {
        query : str
        __
        extract()
      }
      JDBCExtractor <|-- JDBCExtractorIncremental
      class JDBCExtractorIncremental {
        partition : str 
        source_table : str 
        spooq2_values_table : str 
        spooq2_values_db : str = "spooq2_values"
        spooq2_values_partition_column : str = "updated_at"
        __
        extract()
      }

  }
' }
@enduml
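
The inheritance relationships and attribute defaults shown in the diagram can be mirrored in a bare Python skeleton. This is only a sketch under stated assumptions: the diagram does not say which parameters are optional or in what order they are passed, so all of them are keyword-only here, and the derived spark attribute and the real extract() logic are left out so that the example runs without PySpark.

```python
class Extractor:
    def __init__(self):
        self.name = type(self).__name__  # derived attribute

    def extract(self):
        raise NotImplementedError  # real subclasses return a DataFrame


class JSONExtractor(Extractor):
    # Keyword-only parameters: the diagram gives names and types, not order.
    def __init__(self, *, input_path, base_path, partition):
        super().__init__()
        self.input_path = input_path
        self.base_path = base_path
        self.partition = partition


class JDBCExtractor(Extractor):
    def __init__(self, *, jdbc_options, cache=True):
        super().__init__()
        self.jdbc_options = jdbc_options
        self.cache = cache


class JDBCExtractorFullLoad(JDBCExtractor):
    def __init__(self, *, query, jdbc_options, cache=True):
        super().__init__(jdbc_options=jdbc_options, cache=cache)
        self.query = query


class JDBCExtractorIncremental(JDBCExtractor):
    def __init__(
        self,
        *,
        partition,
        source_table,
        spooq2_values_table,
        jdbc_options,
        spooq2_values_db="spooq2_values",
        spooq2_values_partition_column="updated_at",
        cache=True,
    ):
        super().__init__(jdbc_options=jdbc_options, cache=cache)
        self.partition = partition
        self.source_table = source_table
        self.spooq2_values_table = spooq2_values_table
        self.spooq2_values_db = spooq2_values_db
        self.spooq2_values_partition_column = spooq2_values_partition_column


# Placeholder values, chosen only to exercise the defaults.
incremental = JDBCExtractorIncremental(
    partition="20200201",
    source_table="users",
    spooq2_values_table="users_values",
    jdbc_options={"url": "jdbc:postgresql://localhost/db"},
)
```

The skeleton makes the two-level hierarchy visible: both JDBC variants inherit jdbc_options and cache from JDBCExtractor and add only their own parameters.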

Create your own Extractor

Please see the "Create your own Extractor" section for further details.