Merge pull request #189 from medema-group/feature/repo-updates

Feature/repo updates
medema-group · Oct 24, 2024 · 6dc5113
2 parents a902b61 + 1dd3f95
commit 6dc5113

Showing 7 changed files with 894 additions and 32 deletions.
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,36 @@
+FROM mambaorg/micromamba
+
+WORKDIR /home/$MAMBA_USER
+
+# root level stuff
+USER root
+RUN apt-get update && \
+    apt-get install -y git
+
+RUN mkdir /home/data
+RUN chown $MAMBA_USER /home/data
+
+USER $MAMBA_USER
+
+# get source code
+RUN git clone https://github.com/medema-group/BiG-SCAPE.git -b feature/repo-updates
+
+WORKDIR /home/$MAMBA_USER/BiG-SCAPE
+
+# set up environment
+RUN micromamba install \
+    # channel
+    -c conda-forge \
+    # accept all prompts
+    -y \
+    # name of env
+    -n base \
+    # install the dependencies listed in environment.yml
+    -f environment.yml && \
+    # clean up cache
+    micromamba clean --all --yes
+
+RUN chmod 777 bigscape.py
+
+ENTRYPOINT [ "/usr/local/bin/_entrypoint.sh", "./bigscape.py" ]
+CMD [ "--help" ]
diff --git a/LICENSE b/LICENSE
diff --git a/README.md b/README.md
@@ -7,12 +7,118 @@
 ![Pylint](https://medema-group.github.io/BiG-SCAPE/badges/pylint.svg)
 
 
+# BiG-SCAPE
 
+**BiG-SCAPE** (Biosynthetic Gene Similarity Clustering and Prospecting Engine) is a software package, written in Python, that constructs sequence similarity networks of Biosynthetic Gene Clusters (BGCs) and groups them into Gene Cluster Families (GCFs). BiG-SCAPE does this by rapidly calculating a distance matrix between gene clusters based on a comparison of their protein domain content, order, copy number and sequence identity.
+
+As input, BiG-SCAPE takes GenBank files from the output of [antiSMASH](https://antismash.secondarymetabolites.org) with BGC predictions, as well as reference BGCs from the [MIBiG repository](https://mibig.secondarymetabolites.org/). As output, BiG-SCAPE generates tab-delimited output files, as well as a rich HTML visualization.
+
+In principle, BiG-SCAPE can also be used on any other gene clusters, such as pathogenicity islands, secretion system-encoding gene clusters, or even whole viral genomes.
+
+If you find BiG-SCAPE useful, please [cite us]() [TODO V2].
+
+
+
+## Running BiG-SCAPE
+
+There are a few ways to run BiG-SCAPE, depending on your needs.
+
+### Prerequisites
+
+Software:
+
+- Python 3.11 or later
+- conda/mamba
+
+### Using Conda/Mamba
+
+These steps use `mamba`.
+
+Clone this repository, create the environment and install BiG-SCAPE:
+
+1. `git clone https://github.com/medema-group/BiG-SCAPE`
+2. `cd BiG-SCAPE`
+3. `mamba env create -f environment.yml`
+4. `mamba activate bigscape`
+5. `pip install .`
+
+You can now run bigscape anywhere:
+
+`bigscape --help`
+
+### Using Docker
+Run BiG-SCAPE through Docker using the `docker run` command:
+
+```sh
+# arguments after the image name are the same as when using bigscape.py normally
+docker run \
+    --volume your_root_data_dir:/home/data \
+    --detach=false \
+    --rm \
+    --user=$(id -u):$(id -g) \
+    TODO: DOCKER URL \
+    cluster \
+    -i /home/data/your_input \
+    -o /home/data/output_folder \
+    -p /home/data/pfam_folder/Pfam-A.hmm
+```
+
+`your_root_data_dir` must be a parent folder of your input folder, your Pfam file and
+the folder where you want your output to be written.
+
+For example:
+
+```
+/home/example/data
+  ├ /input
+  |    ├ experiment_1/
+  |    |  └ sample_1/
+  |    |      ├ a.region001.gbk
+  |    |      └ a.region002.gbk
+  |    └ unrelated_folder_do_not_use/
+  ├ /output
+  └ /pfam
+      └ Pfam-A.hmm
+```
+
+With this folder layout, you can use a command such as:
+
+```sh
+docker run \
+    --volume /home/example/data:/home/data \
+    --detach=false \
+    --rm \
+    --user=$(id -u):$(id -g) \
+    TODO: DOCKER URL \
+    cluster \
+    -i /home/data/input/experiment_1 \
+    -o /home/data/output/experiment_1 \
+    -p /home/data/pfam/Pfam-A.hmm
+```
 
 
 ## Dev instructions
 
-1. pip install -r dev-requirements.txt
-2. pre-commit install
-3. mkdocs build
-4. have fun!
+We appreciate any contributions!
+To set up a development environment, follow these steps:
+
+### Using mamba
+
+1. Install (micro)mamba
+2. `git clone https://github.com/medema-group/BiG-SCAPE`
+3. `cd BiG-SCAPE`
+4. `mamba env create -f environment.yml`
+5. `mamba activate bigscape`
+6. `pip install ".[dev]"`
+7. `pre-commit install`
+
+To run BiG-SCAPE:
+`python bigscape.py --help`
+
+You can also use the installed `bigscape` command directly, but the package must be
+re-installed each time you make a change to the codebase.
+
+To run tests:
+`python -m pytest`
+
+When developing, make sure to use the mamba environment you created (called bigscape).
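
The Docker section of the README above relies on one rule: every path passed to BiG-SCAPE must live under the folder mounted at `/home/data`, and must be given as the container sees it. The following is a minimal sketch of that translation, not part of BiG-SCAPE itself; the function name `to_container_path` and the constant `CONTAINER_ROOT` are hypothetical and introduced only for illustration.

```python
from pathlib import PurePosixPath

# Mount point taken from the README's --volume flag (host_root:/home/data).
CONTAINER_ROOT = PurePosixPath("/home/data")


def to_container_path(host_root: str, host_path: str) -> str:
    """Translate a host path under host_root to the path seen inside the container.

    Hypothetical helper: raises ValueError when host_path is not inside
    host_root, because the container would not be able to see it.
    """
    root = PurePosixPath(host_root)
    path = PurePosixPath(host_path)
    try:
        relative = path.relative_to(root)
    except ValueError:
        raise ValueError(
            f"{host_path} is not inside {host_root}, so the container cannot see it"
        ) from None
    return str(CONTAINER_ROOT / relative)


if __name__ == "__main__":
    root = "/home/example/data"
    # Paths from the README's example folder layout.
    print(to_container_path(root, "/home/example/data/input/experiment_1"))
    print(to_container_path(root, "/home/example/data/pfam/Pfam-A.hmm"))
```

The sketch compares paths purely as text: it does not resolve symlinks or `..` segments, so it assumes absolute, already-normalized host paths.
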
diff --git a/bigscape.py b/bigscape.py
@@ -6,15 +6,11 @@
 PI: Marnix Medema                   [email protected]
 
 Maintainers:
-Jorge Navarro                       [email protected]
 Arjan Draisma                       [email protected]
 Catarina Loureiro                   [email protected]
 Nico Louwen                         [email protected]
 
 Developers:
-Jorge Navarro                       [email protected]
-Satria Kautsar                      [email protected]
-Emmanuel (Emzo) de los Santos       [email protected]
 Arjan Draisma                       [email protected]
 Catarina Loureiro                   [email protected]
 Nico Louwen                         [email protected]

diff --git a/dev-requirements.txt b/dev-requirements.txt
diff --git a/environment.yml b/environment.yml
@@ -3,14 +3,14 @@ channels:
   - conda-forge
   - bioconda
 dependencies:
-  - python=3.9
+  - python=3.11
   - biopython=1.81
   - sortedcontainers=2.4.0
   - fasttree=2.1.11
   - networkx=3.1
   - numpy=1.26.0
   - pandas=2.1.1
-  - pyhmmer=0.8.2
+  - pyhmmer=0.10.14
   - scikit-learn=1.3.1
   - scipy=1.11.3
   - sqlalchemy=2.0.2

diff --git a/pyproject.toml b/pyproject.toml
@@ -0,0 +1,85 @@
+[project]
+name = "big-scape"
+version = "2.0.0"
+description = "Biosynthetic Gene Similarity Clustering and Prospecting Engine"
+requires-python = ">=3.11"
+license = { file = "LICENSE" }
+authors = [
+    { name = "Arjan Draisma", email = "[email protected]"},
+    { name = "Catarina Loureiro", email = "[email protected]"},
+    { name = "Nico Louwen", email = "[email protected]"},
+    { name = "Jorge Navarro", email = "[email protected]"},
+    { name = "Marnix Medema", email = "[email protected]"}
+]
+
+[project.scripts]
+bigscape = "big_scape.__main__:main"
+
+[project.optional-dependencies]
+dev = [
+    # testing
+    "pytest",
+    "coverage",
+    "coverage-badge",
+
+    # documentation
+    "mkdocs",
+    "mkdocstrings-python",
+
+    # other tools
+    "pre-commit",
+    "anybadge",
+
+    # linting
+    "pylint",
+
+    # type stubs (https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports)
+    "types-psutil",
+    "networkx-stubs",
+    "data-science-types",
+    "types-tqdm",
+    "types-setuptools"
+]
+
+[project.urls]
+"Repository" = "https://github.com/medema-group/BiG-SCAPE"
+# TODO: documentation
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel.force-include]
+# no idea why this is needed. I guess it decides the underscore is unnecessary
+"big_scape" = "bigscape"
+
+[tool.hatch.envs.bigscape]
+type = "conda"
+command = "mamba"
+environment-file = "environment.yml"
+
+[tool.hatch.envs.dev]
+type = "conda"
+command = "mamba"
+environment-file = "environment.yml"
+features = [
+    "dev"
+]
+
+[tool.hatch.envs.hatch-test]
+features = [
+    "dev"
+]
+type = "conda"
+command = "mamba"
+environment-file = "environment.yml"
+default-args = ["test"]
+
+[tool.coverage.run]
+omit = [
+    "test/*"
+]
+
+[tool.black]
+line-length = 88
+target-version = ["py311"]