Merge pull request #189 from medema-group/feature/repo-updates
Feature/repo updates
adraismawur authored Oct 24, 2024
2 parents a902b61 + 1dd3f95 commit 6dc5113
Showing 7 changed files with 894 additions and 32 deletions.
36 changes: 36 additions & 0 deletions Dockerfile
@@ -0,0 +1,36 @@
FROM mambaorg/micromamba

WORKDIR /home/$MAMBA_USER

# root level stuff
USER root
RUN apt-get update
RUN apt-get install -y git

RUN mkdir /home/data
RUN chown $MAMBA_USER /home/data

USER $MAMBA_USER

# get source code
RUN git clone https://github.com/medema-group/BiG-SCAPE.git -b feature/repo-updates

WORKDIR /home/$MAMBA_USER/BiG-SCAPE

# set up environment
RUN micromamba install \
# channel
-c conda-forge \
# accept all prompts
-y \
# name of env
-n base \
# get python
-f environment.yml && \
# clean up cache
micromamba clean --all --yes

RUN chmod 777 bigscape.py

ENTRYPOINT [ "/usr/local/bin/_entrypoint.sh", "./bigscape.py" ]
CMD [ "--help" ]
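
The `ENTRYPOINT` and `CMD` lines above use Docker's exec form, in which `CMD` supplies default arguments to `ENTRYPOINT`, and any arguments given after the image name in `docker run` replace `CMD`. The following is a minimal Python sketch of that rule, not part of the repository; the helper name `effective_argv` is hypothetical and `--entrypoint` overrides are not modelled.

```python
def effective_argv(entrypoint, cmd, run_args=None):
    """Return the argv a container starts with (exec form only).

    Arguments passed after the image name in `docker run` replace CMD;
    ENTRYPOINT is always kept in this simplified model.
    """
    tail = list(run_args) if run_args else list(cmd)
    return list(entrypoint) + tail


# Values copied from the Dockerfile above.
ENTRYPOINT = ["/usr/local/bin/_entrypoint.sh", "./bigscape.py"]
CMD = ["--help"]

# No extra arguments: the default CMD is appended.
print(effective_argv(ENTRYPOINT, CMD))

# Extra arguments: they take the place of CMD.
print(effective_argv(ENTRYPOINT, CMD, ["cluster", "-i", "/home/data/input"]))
```

Under this model, running the image with no arguments prints the BiG-SCAPE help text, while passing `cluster ...` forwards those arguments to `bigscape.py`.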
661 changes: 661 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

114 changes: 110 additions & 4 deletions README.md
@@ -7,12 +7,118 @@
![Pylint](https://medema-group.github.io/BiG-SCAPE/badges/pylint.svg)


# BiG-SCAPE

**BiG-SCAPE** (Biosynthetic Gene Similarity Clustering and Prospecting Engine) is a software package, written in Python, that constructs sequence similarity networks of Biosynthetic Gene Clusters (BGCs) and groups them into Gene Cluster Families (GCFs). BiG-SCAPE does this by rapidly calculating a distance matrix between gene clusters based on a comparison of their protein domain content, order, copy number and sequence identity.

As input, BiG-SCAPE takes GenBank files from the output of [antiSMASH](https://antismash.secondarymetabolites.org) with BGC predictions, as well as reference BGCs from the [MIBiG repository](https://mibig.secondarymetabolites.org/). As output, BiG-SCAPE generates tab-delimited output files, as well as a rich HTML visualization.

In principle, BiG-SCAPE can also be used on any other gene clusters, such as pathogenicity islands, secretion system-encoding gene clusters, or even whole viral genomes.

If you find BiG-SCAPE useful, please [cite us]() [TODO V2].



## Running BiG-SCAPE

There are a few ways to run BiG-SCAPE, depending on your needs.

### Prerequisites

Software:

- Python 3.11 or later
- conda/mamba

### Using Conda/Mamba

These steps use `mamba`.

Clone this repository:

1. `git clone https://github.com/medema-group/BiG-SCAPE`
2. `cd BiG-SCAPE`
3. `mamba env create -f environment.yml`
4. `mamba activate bigscape`
5. `pip install .`

You can now run bigscape anywhere:

`bigscape --help`

### Using Docker
Run BiG-SCAPE through Docker using the `docker run` command:

```sh
# arguments after the image name (from `cluster` on) are the same as when using bigscape.py normally
docker run \
--volume your_root_data_dir:/home/data \
--detach=false \
--rm \
--user=$(id -u):$(id -g) \
TODO: DOCKER URL \
cluster \
-i /home/data/your_input \
-o /home/data/output_folder \
-p /home/data/pfam_folder/Pfam-A.hmm
```

`your_root_data_dir` must be a parent folder of your input folder, your Pfam database
and the folder where you want to put your output.

For example:

```
/home/example/data
├ /input
| ├ experiment_1/
| | └ sample_1/
| |   ├ a.region001.gbk
| |   └ a.region002.gbk
| └ unrelated_folder_do_not_use/
├ /output
└ /pfam
  └ Pfam-A.hmm
```

With this layout, you can use a command such as:

```
docker run \
--volume /home/example/data:/home/data \
--detach=false \
--rm \
--user=$(id -u):$(id -g) \
TODO: DOCKER URL \
cluster \
-i /home/data/input/experiment_1 \
-o /home/data/output/experiment_1 \
-p /home/data/pfam/Pfam-A.hmm
```
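
Because only the root data folder is mounted, every path passed to BiG-SCAPE has to be written relative to `/home/data` inside the container. The sketch below, which is an illustration and not part of BiG-SCAPE, shows one way to translate host paths and assemble the argument list. The function names, the image name `example/bigscape-image` and the uid/gid values are hypothetical placeholders, since the real image URL is still marked TODO above.

```python
import shlex
from pathlib import PurePosixPath

# Mount point used in the README's docker examples.
CONTAINER_ROOT = PurePosixPath("/home/data")


def to_container_path(host_root, host_path):
    """Translate a host path under host_root to its path inside the container.

    Raises ValueError if host_path is not located under host_root.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_root))
    return str(CONTAINER_ROOT / relative)


def docker_cluster_argv(image, host_root, input_dir, output_dir, pfam_file, uid, gid):
    """Build the `docker run ... cluster` argument list for one run."""
    return [
        "docker", "run",
        "--volume", f"{host_root}:{CONTAINER_ROOT}",
        "--rm",
        f"--user={uid}:{gid}",
        image,  # hypothetical placeholder, the real image URL is not published here
        "cluster",
        "-i", to_container_path(host_root, input_dir),
        "-o", to_container_path(host_root, output_dir),
        "-p", to_container_path(host_root, pfam_file),
    ]


argv = docker_cluster_argv(
    image="example/bigscape-image",  # assumption for illustration only
    host_root="/home/example/data",
    input_dir="/home/example/data/input/experiment_1",
    output_dir="/home/example/data/output/experiment_1",
    pfam_file="/home/example/data/pfam/Pfam-A.hmm",
    uid=1000,
    gid=1000,
)
print(shlex.join(argv))
```

A path outside the mounted root raises `ValueError` before Docker is started, which surfaces the most common mistake (input or Pfam files that the container cannot see) early.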


## Dev instructions

- 1. pip install -r dev-requirements.txt
- 2. pre-commit install
- 3. mkdocs build
- 4. have fun!
We appreciate any contributions!
To set up a development environment, follow these steps:

### Using mamba

1. Install (micro)mamba
2. `git clone https://github.com/medema-group/BiG-SCAPE`
3. `cd BiG-SCAPE`
4. `mamba env create -f environment.yml`
5. `mamba activate bigscape`
6. `pip install ".[dev]"`
7. `pre-commit install`

To run BiG-SCAPE:
`python bigscape.py --help`

You can also use the installed `bigscape` command directly (for example `bigscape --help`),
but the package must be re-installed each time you make a change to the codebase.

To run tests:
`python -m pytest`

When developing, make sure to use the mamba environment you created (called bigscape).
4 changes: 0 additions & 4 deletions bigscape.py
@@ -6,15 +6,11 @@
PI: Marnix Medema [email protected]
Maintainers:
Jorge Navarro [email protected]
Arjan Draisma [email protected]
Catarina Loureiro [email protected]
Nico Louwen [email protected]
Developers:
Jorge Navarro [email protected]
Satria Kautsar [email protected]
Emmanuel (Emzo) de los Santos [email protected]
Arjan Draisma [email protected]
Catarina Loureiro [email protected]
Nico Louwen [email protected]
22 changes: 0 additions & 22 deletions dev-requirements.txt

This file was deleted.

4 changes: 2 additions & 2 deletions environment.yml
@@ -3,14 +3,14 @@ channels:
- conda-forge
- bioconda
dependencies:
-  - python=3.9
+  - python=3.11
- biopython=1.81
- sortedcontainers=2.4.0
- fasttree=2.1.11
- networkx=3.1
- numpy=1.26.0
- pandas=2.1.1
-  - pyhmmer=0.8.2
+  - pyhmmer=0.10.14
- scikit-learn=1.3.1
- scipy=1.11.3
- sqlalchemy=2.0.2
85 changes: 85 additions & 0 deletions pyproject.toml
@@ -0,0 +1,85 @@
[project]
name = "big-scape"
version = "2.0.0"
description = "Biosynthetic Gene Similarity Clustering and Prospecting Engine"
requires-python = ">=3.11"
license = { file = "LICENSE" }
authors = [
{ name = "Arjan Draisma", email = "[email protected]"},
{ name = "Catarina Loureiro", email = "[email protected]"},
{ name = "Nico Louwen", email = "[email protected]"},
{ name = "Jorge Navarro", email = "[email protected]"},
{ name = "Marnix Medema", email = "[email protected]"}
]

[project.scripts]
bigscape = "big_scape.__main__:main"

[project.optional-dependencies]
dev = [
# testing
"pytest",
"coverage",
"coverage-badge",

# documentation
"mkdocs",
"mkdocstrings-python",

# other tools
"pre-commit",
"anybadge",

# linting
"pylint",

# type stubs (https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports)
"types-psutil",
"networkx-stubs",
"data-science-types",
"types-tqdm",
"types-setuptools"
]

[project.urls]
"Repository" = "https://github.com/medema-group/BiG-SCAPE"
# TODO: documentation

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel.force-include]
# no idea why this is needed. I guess it decides the underscore is unnecessary
"big_scape" = "bigscape"

[tool.hatch.envs.bigscape]
type = "conda"
command = "mamba"
environment-file = "environment.yml"

[tool.hatch.envs.dev]
type = "conda"
command = "mamba"
environment-file = "environment.yml"
features = [
"dev"
]

[tool.hatch.envs.hatch-test]
features = [
"dev"
]
type = "conda"
command = "mamba"
environment-file = "environment.yml"
default-args = ["test"]

[tool.coverage.run]
omit = [
"test/*"
]

[tool.black]
line-length = 88
target-version = ["py311"]
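
The `[project.scripts]` table in this `pyproject.toml` maps the `bigscape` command to the string `big_scape.__main__:main`, which follows the `module:attribute` entry-point syntax. The following is a rough Python sketch of how such a string can be resolved to a callable; the helper `resolve_entry_point` is hypothetical, and standard-library targets stand in for `big_scape` so the example runs without the package installed.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        # Dotted attribute paths are walked one attribute at a time.
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj


# Stand-in target from the standard library; with BiG-SCAPE installed,
# the spec from pyproject.toml would be "big_scape.__main__:main".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

This also illustrates why the installed `bigscape` command depends on the package being importable under the name `big_scape`: the launcher imports the module first and only then looks up `main`.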
