Extractors
Extractors fetch, extract, and convert a source data set into a PySpark DataFrame. Example sources are JSON files on file systems such as HDFS, DBFS, or EXT4, and relational database systems accessed via JDBC.
class Extractor
Bases: object
Base class for all Extractor classes.
logger
Shared, class-level logger for all instances.
Type: logging.Logger
extract()
Extracts data from a source and converts it into a PySpark DataFrame.
Return type: pyspark.sql.DataFrame
Note
This method does not take any input parameters. All required parameters are defined when the Extractor object is initialized.
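The contract described above (configuration in the constructor, a parameterless extract(), and a logger shared at class level) can be sketched as follows. This is a minimal illustration, not the library's source: the stand-in Extractor base class, the logger name, the JSONLinesExtractor subclass, and its spark and input_path parameters are all assumptions made for this example. The only PySpark call used is spark.read.json(path).

```python
import logging


class Extractor:
    """Stand-in for the base class documented above (illustrative only)."""

    # Class-level attribute: every instance and subclass shares this logger.
    # The logger name is an assumption for this sketch.
    logger = logging.getLogger("extractor")

    def extract(self):
        """Return a pyspark.sql.DataFrame; takes no input parameters."""
        raise NotImplementedError("Subclasses must implement extract()")


class JSONLinesExtractor(Extractor):
    """Hypothetical extractor that reads JSON files into a DataFrame."""

    def __init__(self, spark, input_path):
        # Everything extract() needs is fixed at initialization time.
        self.spark = spark            # an existing SparkSession
        self.input_path = input_path  # e.g. a path on HDFS, DBFS, or a local disk

    def extract(self):
        self.logger.info("Loading JSON from %s", self.input_path)
        # DataFrameReader.json returns a pyspark.sql.DataFrame.
        return self.spark.read.json(self.input_path)
```

Given an existing SparkSession named spark, JSONLinesExtractor(spark, "some/path.json").extract() would return a DataFrame. Passing the session in through the constructor is a choice made for this sketch so that the class can be exercised with a stub in tests.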
Class Diagram of Extractor Subpackage
Create your own Extractor
See the Create your own Extractor page for further details.