Tools and infrastructure for automated compound discovery using Folding@home.
- Clone the repository and cd into the repo root:
git clone https://github.com/choderalab/fah-xchem.git
cd fah-xchem
- Create a conda environment with the required dependencies:
conda env create -f environment.yml
If the above process is slow, we recommend using mamba to speed up installation:
mamba env create -f environment.yml
- Install fah-xchem in the environment using pip:
pip install .
Download molecule and experimental data from CDD and generate an experimental data file for analysis use:
export CDD_VAULT_NUM=<vault-num>
export CDD_VAULT_TOKEN=<vault-token>
FLUORESCENCE_IC50_PROTOCOL_ID=49439
# will take some time; pulls full data export from CDD
fah-xchem -l INFO cdd --data-dir cdd-data/ retrieve-protocol-data --molecules -i $FLUORESCENCE_IC50_PROTOCOL_ID
# next step REQUIRES OpenEye license
export OE_LICENSE=/path/to/oe_license.txt
# merges and transforms data elements pulled from CDD into usable form for downstream analysis
fah-xchem -l INFO cdd --data-dir cdd-data/ generate-experimental-compound-data -i 49439 experimental_compound_data.json
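The commands above rely on three exported environment variables. As a minimal sketch (the helper name `missing_env_vars` is hypothetical and not part of fah-xchem), a pre-flight check in Python could report which of them are unset before a long CDD export is started. Note that OE_LICENSE is only needed for the second step.

```python
import os

# Variables exported in the commands above before calling fah-xchem.
REQUIRED_ENV_VARS = ("CDD_VAULT_NUM", "CDD_VAULT_TOKEN", "OE_LICENSE")


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; fah-xchem does not ship it.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# Example with made-up values: the token is empty and OE_LICENSE is absent.
example_env = {"CDD_VAULT_NUM": "1234", "CDD_VAULT_TOKEN": ""}
print(missing_env_vars(example_env))
```

Calling `missing_env_vars()` with no argument inspects the real process environment instead of the example dictionary.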
Run transformation and compound free energy analysis, producing results/analysis.json:
fah-xchem --loglevel INFO \
compound-series analyze \
--experimental-data-file experimental_compound_data.json \
--config-file config.json \
--fah-projects-dir /path/to/projects/ \
--fah-data-dir /path/to/data/SVR314342810/ \
--loglevel INFO \
--nprocs 8 \
compound-series.json \
/path/to/output-dir/analysis.json
Generate representative snapshots, plots, PDF report, and static site HTML in output directory:
fah-xchem --loglevel INFO \
artifacts generate \
--config-file config.json \
--fragalysis-config fragalysis_config.json \
--fah-projects-dir /path/to/projects/ \
--fah-data-dir /path/to/data/SVR314342810/ \
--website-base-url https://my-bucket.s3.amazonaws.com/site/prefix/ \
--cache-dir results/cache/ \
--nprocs 8 \
/path/to/output-dir/analysis.json \
/path/to/output-dir/
Energies are represented in configuration and internally in units of kT, except when otherwise indicated.
For energies in kilocalories per mole, the function or variable name should be suffixed with _kcal.
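To make the unit convention concrete, here is a minimal sketch of converting between kT and kcal/mol. The function names and the default temperature of 300 K are assumptions made for illustration; the temperature fah-xchem actually uses is not stated here. The `_kcal` suffix follows the naming rule above.

```python
# Molar gas constant in kcal/(mol*K), i.e. Boltzmann's constant per mole.
GAS_CONSTANT_KCAL_PER_MOL_K = 0.0019872041


def kt_to_kcal(energy_kt, temperature_kelvin=300.0):
    """Convert an energy in units of kT to kcal/mol.

    The 300 K default is an assumption made for this example only.
    """
    return energy_kt * GAS_CONSTANT_KCAL_PER_MOL_K * temperature_kelvin


def kcal_to_kt(energy_kcal, temperature_kelvin=300.0):
    """Convert an energy in kcal/mol to units of kT."""
    return energy_kcal / (GAS_CONSTANT_KCAL_PER_MOL_K * temperature_kelvin)


# A binding free energy stored internally in kT, reported in kcal/mol.
binding_free_energy = -10.0
binding_free_energy_kcal = kt_to_kcal(binding_free_energy)
print(f"{binding_free_energy_kcal:.2f} kcal/mol")
```

At 300 K one kT is roughly 0.6 kcal/mol, so values in kT are numerically larger in magnitude than their kcal/mol equivalents.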
The compound series is specified as JSON with schema given by the CompoundSeriesAnalysis model (see fah_xchem.schema).
Some analysis options can be configured in a separate JSON file with schema given by the AnalysisConfig model. For example, in config.json:
{
"min_num_work_values": 10,
"max_binding_free_energy": 0
}
The JSON file is passed on the command line using the --config-file option.
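As a rough illustration of how such a file maps onto typed fields, the sketch below parses the example configuration into a small dataclass. `ExampleAnalysisConfig` is a hypothetical stand-in that carries only the two fields shown in the example; it is not the real AnalysisConfig model, which may define more fields, defaults, or validation.

```python
import json
from dataclasses import dataclass


@dataclass
class ExampleAnalysisConfig:
    """Hypothetical stand-in holding only the two fields from the example."""

    min_num_work_values: int
    max_binding_free_energy: float  # presumably in kT, per the unit convention


def parse_config(text):
    """Parse JSON text into the stand-in config.

    Unknown or missing keys raise TypeError from the dataclass constructor.
    """
    return ExampleAnalysisConfig(**json.loads(text))


example_text = '{"min_num_work_values": 10, "max_binding_free_energy": 0}'
config = parse_config(example_text)
print(config)
```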
To upload sprint results to Fragalysis, a JSON config file may be supplied. For example, in fragalysis_config.json:
{
"run": true,
"ligands_filename": "reliable-transformations-final-ligands.sdf",
"fragalysis_sdf_filename": "compound-set_foldingathome-sprint-X.sdf",
"ref_url": "https://url-link",
"ref_mols": "x00000",
"ref_pdb": "references.zip",
"target_name": "protein-target",
"submitter_name": "Folding@home",
"submitter_email": "submitter-email-address",
"submitter_institution": "institution-name",
"method": "Sprint X",
"upload_key": "upload-key",
"new_upload": true
}
The JSON file is passed on the command line using the --fragalysis-config option.
Description of the JSON parameters:
- run: whether to run the Fragalysis upload. If set to false, the results will not be uploaded (even if the JSON is supplied via the --fragalysis-config option).
- ligands_filename: the name of the SDF file to upload to Fragalysis.
- fragalysis_sdf_filename: the name to use for the SDF Fragalysis upload. This will be a copy of ligands_filename but must be in the form compound-set_name.sdf.
- ref_url: the URL of the post that describes the work (e.g. for Sprint 5).
- ref_mols: a comma-separated list of the fragments that inspired the design of the new molecule (codes as they appear in Fragalysis, e.g. x0104_0,x0692_0).
- ref_pdb: either (1) the name of the zipped protein PDB file to upload, which should be named references.zip (recommended), or (2) the code of the fragment PDB from Fragalysis that should be used (e.g. x0692_0).
- target_name: the name of the target protein.
- submitter_name: the name of the submitter.
- submitter_email: the email address of the submitter.
- submitter_institution: the name of the institution that the submitter is associated with.
- method: the method by which the results were obtained (e.g. Sprint 5).
- upload_key: the unique upload key used to upload to Fragalysis.
- new_upload: whether to upload a new set (true) or to update an existing set (false).
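The parameter descriptions above imply a few checks that can be made before attempting an upload. The sketch below is a hypothetical pre-flight validator, not part of fah-xchem: it assumes every key in the example is required, that run and new_upload must be JSON booleans, and that the "compound-set_name.sdf" form means a compound-set_ prefix followed by a name and the .sdf extension.

```python
import re

# Keys taken from the example fragalysis_config.json; treating all of them
# as required is an assumption made for this sketch.
REQUIRED_KEYS = {
    "run", "ligands_filename", "fragalysis_sdf_filename", "ref_url",
    "ref_mols", "ref_pdb", "target_name", "submitter_name",
    "submitter_email", "submitter_institution", "method", "upload_key",
    "new_upload",
}

# Assumed reading of the "compound-set_name.sdf" naming rule.
SDF_NAME_PATTERN = re.compile(r"^compound-set_[^/\\]+\.sdf$")


def check_fragalysis_config(config):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(config))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    sdf_name = config.get("fragalysis_sdf_filename", "")
    if not SDF_NAME_PATTERN.match(sdf_name):
        problems.append("fragalysis_sdf_filename must look like compound-set_name.sdf")
    for key in ("run", "new_upload"):
        if key in config and not isinstance(config[key], bool):
            problems.append(key + " must be true or false")
    return problems
```

A caller would typically load the file with `json.load` and refuse to proceed if the returned list is non-empty.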
For more information on the upload format see this forum post.
A unique upload_key is needed to push to Fragalysis; this can be requested here.
For more information on the entire upload process see this forum post.
Paths to Folding@home project and data directories are passed on the command line. See usage examples above.
This project uses conda to manage the environment.
To set up a conda environment named fah-xchem with the required dependencies, create the conda environment as described above.
To install fah-xchem in development (editable) mode, run:
pip install -e .
To run the tests:
pytest
Code formatting with black is enforced via a CI check.
To install black with conda, use:
conda install black
To build the documentation:
cd docs
make html
Copyright (c) 2020, Chodera Lab
Project based on the Computational Molecular Science Python Cookiecutter version 1.3.