Extractors

Extractors fetch data from a source and convert it into a PySpark DataFrame. Example sources include JSON files on file systems such as HDFS, DBFS or ext4, and relational database systems accessed via JDBC.
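For orientation, the reads an extractor typically wraps can be sketched with plain PySpark `DataFrameReader` calls. The function names, paths, table name and JDBC URL below are hypothetical; `spark.read.json` and `spark.read.jdbc` are standard PySpark methods. PySpark is only imported for type hints, so the sketch does not load Spark at import time.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; PySpark is not imported at runtime
    from pyspark.sql import DataFrame, SparkSession


def read_json_files(spark: SparkSession, path: str) -> DataFrame:
    # Hypothetical helper. The URI scheme selects the file system, e.g.
    # "hdfs:///raw/users/*.json", "dbfs:/mnt/raw/users.json" or "file:///tmp/users.json".
    return spark.read.json(path)


def read_jdbc_table(spark: SparkSession, url: str, table: str, user: str, password: str) -> DataFrame:
    # Hypothetical helper. url is a JDBC connection string such as
    # "jdbc:postgresql://db:5432/shop"; the matching JDBC driver must be on Spark's classpath.
    return spark.read.jdbc(url=url, table=table, properties={"user": user, "password": password})
```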

class Extractor

Base class for all Extractor classes.

name

The name of the instance's class, taken from type(self).__name__ (for example, the name of the concrete Extractor subclass).

Type

str

logger

Shared, class-level logger for all instances.

Type

logging.Logger
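The two attributes above can be sketched as follows. This is a minimal, hypothetical re-creation, assuming `name` is assigned in `__init__` and `logger` is a class attribute obtained from `logging.getLogger`; it is not the library's source, and raising `NotImplementedError` from `extract()` is an assumption about how the base class behaves.

```python
import logging


class Extractor:
    """Hypothetical sketch of the base class; not the library's actual code."""

    # Created once, when the class is defined, so every instance shares this logger.
    logger = logging.getLogger(__name__)

    def __init__(self) -> None:
        # The concrete class' name, e.g. "JSONExtractor" for an instance of that subclass.
        self.name = type(self).__name__

    def extract(self):
        # Assumption: subclasses override this and return a PySpark DataFrame.
        raise NotImplementedError("Subclasses must implement extract()")
```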

extract()

Extracts data from a source and converts it into a PySpark DataFrame.

Return type

DataFrame

Note

This method takes no input parameters. All required parameters are passed when the Extractor object is initialized.
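The pattern the note describes, configuration in the constructor and a parameterless `extract()`, might look like the following. `JDBCExtractor` and its constructor arguments are hypothetical and chosen for illustration; to keep the block self-contained it does not inherit from `Extractor`, although a real implementation would. Only `spark.read.jdbc` is a standard PySpark call.

```python
from __future__ import annotations

import logging
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; PySpark is not imported at runtime
    from pyspark.sql import DataFrame, SparkSession


class JDBCExtractor:  # hypothetical; would subclass Extractor in a real code base
    """Sketch of an extractor whose whole configuration is fixed in __init__."""

    logger = logging.getLogger(__name__)

    def __init__(self, spark: SparkSession, url: str, table: str, user: str, password: str) -> None:
        self.name = type(self).__name__
        self.spark = spark
        self.url = url  # e.g. "jdbc:postgresql://db:5432/shop"
        self.table = table
        self.properties = {"user": user, "password": password}

    def extract(self) -> DataFrame:
        # No arguments: everything this method needs was stored by __init__.
        self.logger.info("%s reading table %s", self.name, self.table)
        return self.spark.read.jdbc(url=self.url, table=self.table, properties=self.properties)
```

Because the connection details live on the object, code that only calls `extract()` does not need to know which source it is reading from.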

Create your own Extractor

Please see the Create your own Extractor guide for further details.