Feature/repo updates #189

Merged
merged 10 commits into from
Oct 24, 2024
36 changes: 36 additions & 0 deletions Dockerfile
@@ -0,0 +1,36 @@
FROM mambaorg/micromamba

WORKDIR /home/$MAMBA_USER

# root level stuff
USER root
RUN apt-get update
RUN apt-get install -y git

RUN mkdir /home/data
RUN chown $MAMBA_USER /home/data

USER $MAMBA_USER

# get source code
RUN git clone https://github.com/medema-group/BiG-SCAPE.git -b feature/repo-updates

WORKDIR /home/$MAMBA_USER/BiG-SCAPE

# set up environment
RUN micromamba install \
# channel
-c conda-forge \
# accept all prompts
-y \
# name of env
-n base \
# install the dependencies listed in environment.yml
-f environment.yml && \
# clean up cache
micromamba clean --all --yes

RUN chmod 777 bigscape.py

ENTRYPOINT [ "/usr/local/bin/_entrypoint.sh", "./bigscape.py" ]
CMD [ "--help" ]
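The interaction between `ENTRYPOINT` and `CMD` at the end of this Dockerfile can be sketched in Python. This is only an illustrative model of Docker's exec-form behavior, not code from the repository; the function name `effective_argv` is hypothetical.

```python
from typing import Sequence

# Values copied from the Dockerfile above.
ENTRYPOINT = ["/usr/local/bin/_entrypoint.sh", "./bigscape.py"]
CMD = ["--help"]


def effective_argv(run_args: Sequence[str] = ()) -> list[str]:
    """Return the command line the container would execute.

    With an exec-form ENTRYPOINT, arguments given after the image name in
    `docker run` replace CMD; if none are given, CMD supplies the defaults.
    """
    trailing = list(run_args) if run_args else list(CMD)
    return ENTRYPOINT + trailing


if __name__ == "__main__":
    # No arguments: the container falls back to CMD and shows the help text.
    print(effective_argv())
    # Arguments after the image name are passed straight to bigscape.py.
    print(effective_argv(["cluster", "-i", "/home/data/input"]))
```

In other words, running the image with no arguments shows the help text, while anything placed after the image name is forwarded to `bigscape.py`.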
661 changes: 661 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

114 changes: 110 additions & 4 deletions README.md
@@ -7,12 +7,118 @@
![Pylint](https://medema-group.github.io/BiG-SCAPE/badges/pylint.svg)


# BiG-SCAPE

**BiG-SCAPE** (Biosynthetic Gene Similarity Clustering and Prospecting Engine) is a software package, written in Python, that constructs sequence similarity networks of Biosynthetic Gene Clusters (BGCs) and groups them into Gene Cluster Families (GCFs). BiG-SCAPE does this by rapidly calculating a distance matrix between gene clusters based on a comparison of their protein domain content, order, copy number and sequence identity.

As input, BiG-SCAPE takes GenBank files from the output of [antiSMASH](https://antismash.secondarymetabolites.org) with BGC predictions, as well as reference BGCs from the [MIBiG repository](https://mibig.secondarymetabolites.org/). As output, BiG-SCAPE generates tab-delimited output files, as well as a rich HTML visualization.

In principle, BiG-SCAPE can also be used on any other gene clusters, such as pathogenicity islands, secretion system-encoding gene clusters, or even whole viral genomes.

If you find BiG-SCAPE useful, please [cite us]() [TODO V2].



## Running BiG-SCAPE

There are a few ways to run BiG-SCAPE, depending on your needs.

### Prerequisites

Software:

- Python 3.11 or later
- conda/mamba
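
As a quick sanity check before installing, the Python 3.11 minimum listed above can be verified from the interpreter itself. This is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of BiG-SCAPE.

```python
import sys

# Minimum interpreter version, taken from the prerequisites above.
MINIMUM = (3, 11)


def meets_minimum(version=None, minimum=MINIMUM) -> bool:
    """Return True if `version` (major, minor, ...) is at least `minimum`."""
    if version is None:
        # Default to the interpreter running this script.
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```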

### Using Conda/Mamba

These steps use `mamba`.

Clone this repository and install BiG-SCAPE into a new environment:

1. `git clone https://github.com/medema-group/BiG-SCAPE`
2. `cd BiG-SCAPE`
3. `mamba env create -f environment.yml`
4. `mamba activate bigscape`
5. `pip install .`

You can now run `bigscape` from any directory while the environment is active:

`bigscape --help`

### Using Docker
Run BiG-SCAPE through Docker with the `docker run` command:

```sh
# arguments after the image name are the same as when running bigscape.py normally
docker run \
--volume your_root_data_dir:/home/data \
--detach=false \
--rm \
--user=$(id -u):$(id -g) \
TODO: DOCKER URL \
cluster \
-i /home/data/your_input \
-o /home/data/output_folder \
-p /home/data/pfam_folder/Pfam-A.hmm
```

`your_root_data_dir` must be a parent folder of your input folder, your Pfam database
and the folder where you want to put your output.

For example:

```
/home/example/data
├ /input
|   ├ experiment_1/
|   |   └ sample_1/
|   |       ├ a.region001.gbk
|   |       └ a.region002.gbk
|   └ unrelated_folder_do_not_use/
├ /output
└ /pfam
    └ Pfam-A.hmm
```

With that layout, you can use the following command:

```
docker run \
--volume /home/example/data:/home/data \
--detach=false \
--rm \
--user=$(id -u):$(id -g) \
TODO: DOCKER URL \
cluster \
-i /home/data/input/experiment_1 \
-o /home/data/output/experiment_1 \
-p /home/data/pfam/Pfam-A.hmm
```
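
The relationship between host paths and the paths passed to BiG-SCAPE inside the container can be sketched as follows. This is an illustrative helper only; `to_container_path` is a hypothetical name, and the sketch assumes the data directory is mounted at `/home/data` as in the commands above.

```python
from pathlib import PurePosixPath

# Mount target inside the container, as used in the docker run examples.
CONTAINER_ROOT = PurePosixPath("/home/data")


def to_container_path(host_root: str, host_path: str) -> str:
    """Translate a host path under `host_root` to its path in the container.

    Raises ValueError if `host_path` is not inside `host_root`, because such
    a path is not visible through the mounted volume.
    """
    root = PurePosixPath(host_root)
    path = PurePosixPath(host_path)
    try:
        relative = path.relative_to(root)
    except ValueError:
        raise ValueError(f"{host_path} is not inside {host_root}") from None
    return str(CONTAINER_ROOT / relative)


if __name__ == "__main__":
    root = "/home/example/data"
    print(to_container_path(root, "/home/example/data/input/experiment_1"))
    print(to_container_path(root, "/home/example/data/pfam/Pfam-A.hmm"))
```

A path outside the mounted directory raises an error in this sketch, which mirrors why the input, Pfam and output locations all need to share one parent folder.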


## Dev instructions

1. pip install -r dev-requirements.txt
2. pre-commit install
3. mkdocs build
4. have fun!
We appreciate any contributions!
To set up a development environment, follow these steps:

### Using mamba

1. Install (micro)mamba
2. `git clone https://github.com/medema-group/BiG-SCAPE`
3. `cd BiG-SCAPE`
4. `mamba env create -f environment.yml`
5. `mamba activate bigscape`
6. `pip install ".[dev]"`
7. `pre-commit install`

To run BiG-SCAPE:
`python bigscape.py --help`

You can also use the installed `bigscape` command directly, but it must be re-installed
(`pip install .`) each time you make a change to the codebase.

To run tests:
`python -m pytest`

When developing, make sure to use the mamba environment you created (called bigscape).
4 changes: 0 additions & 4 deletions bigscape.py
@@ -6,15 +6,11 @@
PI: Marnix Medema [email protected]

Maintainers:
Jorge Navarro [email protected]
Arjan Draisma [email protected]
Catarina Loureiro [email protected]
Nico Louwen [email protected]

Developers:
Jorge Navarro [email protected]
Satria Kautsar [email protected]
Emmanuel (Emzo) de los Santos [email protected]
Arjan Draisma [email protected]
Catarina Loureiro [email protected]
Nico Louwen [email protected]
22 changes: 0 additions & 22 deletions dev-requirements.txt

This file was deleted.

4 changes: 2 additions & 2 deletions environment.yml
@@ -3,14 +3,14 @@ channels:
 - conda-forge
 - bioconda
 dependencies:
-- python=3.9
+- python=3.11
 - biopython=1.81
 - sortedcontainers=2.4.0
 - fasttree=2.1.11
 - networkx=3.1
 - numpy=1.26.0
 - pandas=2.1.1
-- pyhmmer=0.8.2
+- pyhmmer=0.10.14
 - scikit-learn=1.3.1
 - scipy=1.11.3
 - sqlalchemy=2.0.2
85 changes: 85 additions & 0 deletions pyproject.toml
@@ -0,0 +1,85 @@
[project]
name = "big-scape"
version = "2.0.0"
description = "Biosynthetic Gene Similarity Clustering and Prospecting Engine"
requires-python = ">=3.11"
license = { file = "LICENSE" }
authors = [
{ name = "Arjan Draisma", email = "[email protected]"},
{ name = "Catarina Loureiro", email = "[email protected]"},
{ name = "Nico Louwen", email = "[email protected]"},
{ name = "Jorge Navarro", email = "[email protected]"},
{ name = "Marnix Medema", email = "[email protected]"}
]

[project.scripts]
bigscape = "big_scape.__main__:main"

[project.optional-dependencies]
dev = [
# testing
"pytest",
"coverage",
"coverage-badge",

# documentation
"mkdocs",
"mkdocstrings-python",

# other tools
"pre-commit",
"anybadge",

# linting
"pylint",

# type stubs (https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports)
"types-psutil",
"networkx-stubs",
"data-science-types",
"types-tqdm",
"types-setuptools"
]

[project.urls]
"Repository" = "https://github.com/medema-group/BiG-SCAPE"
# TODO: documentation

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel.force-include]
# no idea why this is needed. I guess it decides the underscore is unnecessary
"big_scape" = "bigscape"

[tool.hatch.envs.bigscape]
type = "conda"
command = "mamba"
environment-file = "environment.yml"

[tool.hatch.envs.dev]
type = "conda"
command = "mamba"
environment-file = "environment.yml"
features = [
"dev"
]

[tool.hatch.envs.hatch-test]
features = [
"dev"
]
type = "conda"
command = "mamba"
environment-file = "environment.yml"
default-args = ["test"]

[tool.coverage.run]
omit = [
"test/*"
]

[tool.black]
line-length = 88
target-version = ["py311"]
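
The `[project.scripts]` entry in this file maps the `bigscape` command to the string `big_scape.__main__:main`. The sketch below shows, in simplified form, how such a `module:attribute` reference is typically resolved to a callable. The helper `resolve_entry_point` is hypothetical, and the demonstration uses a standard-library target so it runs without BiG-SCAPE installed.

```python
from importlib import import_module


def resolve_entry_point(spec: str):
    """Resolve a "package.module:attribute" string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = import_module(module_name)
    if attr_path:
        # Support dotted attributes such as "Class.method".
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return obj


if __name__ == "__main__":
    # Standard-library stand-in for "big_scape.__main__:main".
    dumps = resolve_entry_point("json:dumps")
    print(dumps({"entry": "point"}))
```

With the package installed, the generated `bigscape` launcher does the equivalent of importing `main` from `big_scape.__main__` and calling it.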