Extractor Base Class
==============================

.. automodule:: spooq.extractor.extractor
    :no-members:
    :noindex:

.. _custom_extractor:

Create your own Extractor
----------------------------

Let your extractor class inherit from the extractor base class.
This includes the name, string representation and logger attributes from the superclass.

| The only mandatory thing is to provide an `extract()` method which
| **takes**
| => *no input parameters*
| and **returns** a
| => *PySpark DataFrame!*

All configuration and parameterization should be done while initializing the class instance.

Here would be a simple example for a CSV Extractor:

Exemplary Sample Code
^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: create_extractor/csv_extractor.py
    :caption: spooq/extractor/csv_extractor.py:
    :language: python


References to include
^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: create_extractor/init.diff
    :caption: spooq/extractor/__init__.py:
    :language: udiff

Tests
^^^^^^^^^^^^^^^^^^^^^^^^

One of Spooq's features is to provide tested code for multiple data pipelines.
Please take your time to write sufficient unit tests!
You can reuse test data from `tests/data` or create a new schema / data set if needed.
A SparkSession is provided as a global fixture called `spark_session`.

.. literalinclude:: create_extractor/test_csv.py
    :caption: tests/unit/extractor/test_csv.py:
    :language: python

Documentation
^^^^^^^^^^^^^^^^^^^^^^^^

You need to create a `rst` for your extractor
which needs to contain at minimum the `automodule` or the `autoclass` directive.

.. literalinclude:: create_extractor/csv.rst.code
    :caption: docs/source/extractor/csv.rst:
    :language: RST

To automatically include your new extractor in the HTML documentation you need to add it to a `toctree` directive. Just refer to your newly created
`csv.rst` file within the extractor overview page.

.. literalinclude:: create_extractor/overview.diff
    :caption: docs/source/extractor/overview.rst:
    :language: udiff

That should be all!