Contribute
You can either fork Spooq and create a PR on GitHub or get in contact with the authors to get access to the repo.
Spooq was built with extensibility in mind, which results in clearly separated and independent modules and classes.
Prerequisites
Python 3.8
Java 8+ (jdk8-openjdk)
pipenv
LaTeX (for PDF documentation)
Setting up the Environment
The requirements are stored in the file Pipfile, separated into production and development packages.
Run the following command to install the packages needed for development and testing:
$ pipenv install --dev
This will create a virtual environment in ~/.local/share/virtualenvs.
If you want to have your virtual environment installed as a sub-folder (.venv), you have to set the environment variable PIPENV_VENV_IN_PROJECT to 1.
To remove a virtual environment created with pipenv, change into the folder where you created it and execute pipenv --rm.
Activate the Virtual Environment
$ pipenv shell
$ exit
# or close the shell
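To confirm from within Python that the interpreter started by pipenv shell really belongs to a virtual environment, a check like the following can help. This is a minimal sketch, assuming an environment created by venv or a recent virtualenv version; the function name in_virtualenv is hypothetical and not part of Spooq or pipenv.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```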
For more pipenv commands, call pipenv -h or go to their documentation.
Creating Your Own Components
Implementing new extractors, transformers, or loaders is fairly straightforward. Please refer to the following descriptions and examples to get an idea:
Running Tests
The tests are implemented with the pytest framework.
$ pipenv shell
$ cd tests
$ pytest
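For readers new to pytest, the sketch below shows the shape of a test that the command above would collect. The helper normalize_name and both tests are hypothetical and do not exist in Spooq; by default pytest picks up functions named test_* in files named test_*.py or *_test.py.

```python
def normalize_name(name):
    """Hypothetical helper: trim whitespace and lower-case a column name."""
    return name.strip().lower()


def test_normalize_name_strips_and_lowercases():
    # pytest reports the test as failed if the assertion is false.
    assert normalize_name("  User_ID ") == "user_id"


def test_normalize_name_keeps_clean_input():
    # Already normalized input must pass through unchanged.
    assert normalize_name("user_id") == "user_id"
```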
Test Plugins
These are the most useful plugins that are used automatically:
html
$ pytest --html=report.html
random-order
Shuffles the order of execution for the tests to avoid / discover dependencies of the tests.
Randomization is set by a seed number. To re-test the same order of execution where you found
an error, just set the seed value to the same as for the failing test.
To temporarily disable this feature run with pytest -p no:random-order -v
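Why a seed makes a random order reproducible can be illustrated in plain Python. The sketch below is not the plugin's implementation; it only uses the standard library's random module and hypothetical test names to show that the same seed always produces the same order.

```python
import random


def shuffled_order(test_names, seed):
    """Return a shuffled copy of test_names that depends only on the seed."""
    rng = random.Random(seed)  # private generator, global random state is untouched
    order = list(test_names)
    rng.shuffle(order)
    return order


# Hypothetical test names, for illustration only.
tests = ["test_extract", "test_transform", "test_load", "test_pipeline"]

first_run = shuffled_order(tests, seed=12345)
second_run = shuffled_order(tests, seed=12345)

# The same seed yields the same order, which makes a failing run repeatable.
print(first_run == second_run)  # True
```

With the plugin itself, the seed of a previous run is passed back via its --random-order-seed option.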
cov
Generates a test coverage report, either in the terminal or as HTML.
$ pytest --cov-report term --cov=spooq
$ pytest --cov-report html:cov_html --cov=spooq
ipdb
To use ipdb (IPython Debugger), add the following code at your breakpoint:
>>> import ipdb
>>> ipdb.set_trace()
You have to start pytest with -s if you want to use the interactive debugger.
$ pytest -s
Generate Documentation
This project uses Sphinx for creating its documentation. Graphs and diagrams are produced with PlantUML.
The main documentation content is defined as docstrings within the source code.
To view the current documentation, open docs/build/html/index.html or docs/build/latex/spooq.pdf in your application of choice.
Although, if you are reading this, you have probably already found the documentation…
Diagrams
To generate the graphs and diagrams, you need a working PlantUML installation on your computer. Please refer to sphinxcontrib-plantuml.
HTML
$ cd docs
$ make html
$ chromium build/html/index.html
PDF
To generate documentation in PDF format, you need a working (pdf)latex installation on your computer. Please refer to TeX Live for how to install TeX Live, a compatible LaTeX distribution. But beware, the download size is huge!
$ cd docs
$ make latexpdf
$ evince build/latex/Spooq.pdf
Configuration
Themes, plugins, settings, … are defined in docs/source/conf.py.
napoleon
Enables support for parsing docstrings in NumPy / Google Style
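As an illustration, this is what a Google-style docstring looks like, one of the two styles named above. The function scale is a hypothetical example and not part of Spooq; the section names (Args, Returns, Raises) are the ones napoleon recognizes.

```python
def scale(values, factor=2.0):
    """Multiply every value by a constant factor.

    Args:
        values (list of float): Numbers to scale.
        factor (float): Multiplier applied to each value. Defaults to 2.0.

    Returns:
        list of float: The scaled numbers, in the original order.

    Raises:
        ValueError: If ``factor`` is zero.
    """
    if factor == 0:
        raise ValueError("factor must not be zero")
    return [value * factor for value in values]


print(scale([1.0, 2.5]))  # [2.0, 5.0]
```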
intersphinx
Allows linking to other projects’ documentation, e.g., PySpark or Python 3.
To add an external project, add its documentation link to intersphinx_mapping in conf.py.
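A mapping entry could look like the fragment below. It is a sketch of the general Sphinx format, not a copy of Spooq's conf.py; the keys and URLs are assumptions chosen for illustration and the project's real entries may differ.

```python
# Fragment of a Sphinx conf.py (illustrative only).
extensions = ["sphinx.ext.intersphinx"]

# Each entry maps a short name to (base URL of the target docs, inventory location).
# None tells Sphinx to look for objects.inv at the base URL.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "pyspark": ("https://spark.apache.org/docs/latest/api/python", None),
}
```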
recommonmark
This allows you to write CommonMark (Markdown) inside of Docutils & Sphinx projects instead of rst.
plantuml
Allows for inline PlantUML code (uml directive), which is automatically rendered into an SVG image and placed in the document. It also allows sourcing puml files.
Release a New Version on PyPI
Things to consider
Version Bump
For any update on PyPI we need a new version number. You can manually edit the file spooq/_version.py to change the version number. This is reflected in setup.py and consequently in the release version number.
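How a setup script can pick up the version from a single file is sketched below. This is an assumption about the general pattern, not Spooq's actual setup.py: the helper read_version, the `__version__` variable name, and the version number 1.2.3 are all hypothetical.

```python
import re
import tempfile
from pathlib import Path


def read_version(version_file):
    """Extract the string assigned to __version__ in a Python source file."""
    text = Path(version_file).read_text(encoding="utf-8")
    match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"no __version__ found in {version_file}")
    return match.group(1)


# Demonstration with a throwaway file; content and version number are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "_version.py"
    demo_file.write_text('__version__ = "1.2.3"\n', encoding="utf-8")
    version = read_version(demo_file)

print(version)  # 1.2.3
```

Keeping the number in one file means a version bump is a one-line change that both the package and the release metadata pick up.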
Documentation
Please don’t forget to also update the documentation accordingly. This is done either directly in the source code as docstrings or, for more overview-centered topics, in the rst files under docs/source.
Changelog
Please add your changes to the CHANGELOG.rst.
Automatic Publishing via GitHub Action
The current Spooq version is automatically published on PyPI after a release is created on GitHub.
Manual Publishing from Command Line
Create the Distribution Files
$ python setup.py sdist bdist_wheel
Upload to Test PyPI
$ pipenv shell
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
Your new version is available at https://test.pypi.org/project/Spooq/. Beware that Test PyPI uses different credentials than the real PyPI. You can get the credentials from your favourite collaborator.
Upload to Real PyPI
$ pipenv shell
$ twine upload dist/*
Your new version is available at https://pypi.org/project/Spooq/. You can get the credentials from your favourite collaborator.