A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
This project provides a Cookiecutter data science project template based on an existing project template. This version adds support for luigi tasks in place of the ad-hoc Python scripts that the original template suggests for data processing.
- Python 2.7 or 3.5
- Cookiecutter Python package >= 1.4.0: This can be installed with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter
or
$ conda config --add channels conda-forge
$ conda install cookiecutter
cookiecutter https://github.com/ffmmjj/luigi_data_science_project_cookiecutter
pip install -r requirements.txt
make data
make data_clean
py.test tests
The project comes with a final luigi task called FinalTask in the module src/data_tasks/final.py.
New tasks must be placed under the directory src/data_tasks/. The luigi task that generates the final, processed dataset must be added to the list of tasks required by FinalTask, since FinalTask is the "data sink" that luigi processes when you run the Makefile's data target.
See the original project for more details on this project structure.