This repository provides an interface for solving motif scaffolding problems with an unconditional diffusion model as a prior. It defines inverse problems for a variety of tasks and solves them by sampling the posterior via sequential Monte Carlo. This permits conditional sampling without any additional training of the chosen unconditional model.
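As a rough illustration of the sequential Monte Carlo machinery referred to above, the sketch below shows one weighting-and-resampling step: normalising log-weights, computing the effective sample size, and low-variance (systematic) resampling. All function names here are hypothetical and chosen for illustration; this is not the code in `utils/resampling.py`, only a minimal standard-library sketch of the general technique.

```python
import math
import random


def normalise_log_weights(log_weights):
    """Turn unnormalised log-weights into probabilities (log-sum-exp trick)."""
    max_lw = max(log_weights)
    unnorm = [math.exp(lw - max_lw) for lw in log_weights]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2): K for uniform weights, 1 if one particle dominates."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(weights, rng=random):
    """Return K ancestor indices using a single uniform draw (low variance)."""
    k = len(weights)
    offset = rng.random() / k  # one shared offset in [0, 1/K)
    indices, cumulative, i = [], weights[0], 0
    for j in range(k):
        position = offset + j / k  # evenly spaced pointers over [0, 1)
        # Advance to the particle whose cumulative weight covers the pointer.
        while position > cumulative and i < k - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


# Toy usage: four particles, the third carries most of the weight.
weights = normalise_log_weights([-5.0, -4.0, 0.0, -6.0])
if effective_sample_size(weights) < 0.5 * len(weights):
    ancestors = systematic_resample(weights, rng=random.Random(0))
```

Resampling only when the effective sample size drops below a threshold (here an assumed K/2) is a common heuristic; whether and how each sampler in this repository applies it is set by its own config.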
The following are the supported tasks, samplers, and likelihood formalisations for conditioning on a motif.

Tasks:
- (Single-) motif scaffolding
- Multi-motif scaffolding
- Symmetric motif scaffolding
- Inpainting/scaffolding with degrees of freedom

Samplers:
- Bootstrap Particle Filter
- SMCDiff (Trippe et al., 2023)
- Filtering Posterior Sampling (Dou & Song, 2024)
- Twisted Diffusion Sampler (Wu et al., 2023)
- Monte Carlo Guided Diffusion (Cardoso et al., 2023)

Likelihoods:
- Masking
  - Condition on a partial view of the backbone with a fixed orientation.
- Distance
  - Condition on pairwise residue distances.
- Frame distance
  - Condition on pairwise residue distances and rotation matrix deviations from the frame representation.
- Frame-aligned point error
  - Condition on several partial views of the backbone, each aligned to meet the motif at a different residue index.
- (Root mean) squared deviation
  - Condition on a partial view of the backbone, oriented to have minimal deviation from the motif.
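To make the distance likelihood more concrete, here is a minimal sketch that scores a sample by how well the pairwise distances between its motif residues match those of the target motif. The Gaussian form, the noise scale `sigma`, and the function names are assumptions made for illustration, not the repository's implementation. Because only distances are compared, the score is invariant to rotating or translating the sample.

```python
import math


def pairwise_distances(coords):
    """Upper-triangular pairwise Euclidean distances between 3D points."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]


def distance_log_likelihood(sample_ca, motif_ca, motif_indices, sigma=1.0):
    """Gaussian log-likelihood (up to an additive constant) of the motif's
    pairwise distances given the sample. `sigma` is a hypothetical noise scale."""
    placed = [sample_ca[i] for i in motif_indices]  # residues that host the motif
    d_sample = pairwise_distances(placed)
    d_motif = pairwise_distances(motif_ca)
    sq_err = sum((a - b) ** 2 for a, b in zip(d_sample, d_motif))
    return -0.5 * sq_err / sigma ** 2


# Toy usage: a two-residue motif whose residues are 5 units apart.
motif = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
sample = [(1.0, 1.0, 1.0), (9.0, 9.0, 9.0), (4.0, 5.0, 1.0)]
score = distance_log_likelihood(sample, motif, motif_indices=[0, 2])
```

In a particle filter, a score of this kind would serve as the log-weight of each particle at a given diffusion step.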
Additionally, other unconditional models can be supported by creating an adapter for them in `model`, registering them and their parameters, and adding a config in `config/model`. Currently, we have Genie-SCOPe-128 and Genie-SCOPe-256 available. The conditional samplers assume the frame representation of the protein, so some extra engineering may be required for other models.
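The adapter-plus-registration pattern described above could look roughly like the following. Every name in this sketch (`MODEL_REGISTRY`, `register_model`, `DiffusionAdapter`, and its methods) is hypothetical; the actual abstract class lives in `model/diffusion.py` and the actual registry in `utils/registry.py`, and their interfaces may differ.

```python
from abc import ABC, abstractmethod

MODEL_REGISTRY = {}  # hypothetical: maps a config name to an adapter class


def register_model(name):
    """Decorator that records an adapter class under `name`."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"model '{name}' is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


class DiffusionAdapter(ABC):
    """Hypothetical minimal interface a sampler might need from a model."""

    @abstractmethod
    def sample_prior(self, n_particles, length):
        """Draw the initial noisy states x_T."""

    @abstractmethod
    def reverse_step(self, x_t, t):
        """Propose x_{t-1} from x_t with the unconditional model."""


@register_model("toy-model")
class ToyAdapter(DiffusionAdapter):
    def sample_prior(self, n_particles, length):
        return [[0.0] * length for _ in range(n_particles)]

    def reverse_step(self, x_t, t):
        return x_t  # placeholder: a real adapter would call the network here


# A config entry would then select the adapter by its registered name.
model = MODEL_REGISTRY["toy-model"]()
```

Keeping the samplers dependent only on a small abstract interface is what would let a new unconditional model be swapped in without touching the sampler code.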
To match our setup, use Python 3.9 with CUDA 11.8 or above. First, install the requirements with pip.
pip install -r requirements.txt
Then, initialise the submodules if they are not already set up.
git submodule update --init
Finally, and optionally, we use the insilico design pipeline from AQLaboratory for evaluation. To set up the self-consistency pipeline, run their bash scripts for installing TMScore, ProteinMPNN, and ESMFold.
.
├── conditional/  # Posterior samplers
│   ├── __init__.py  # Sampler registration
│   ├── components/  # Reusable components
│   │   ├── observation_generator.py  # for generating y sequence
│   │   └── particle_filter.py  # for filtering
│   ├── bpf.py  # Bootstrap particle filter
│   ├── fpssmc.py  # Filtering posterior sampling
│   ├── smcdiff.py  # SMCDiff
│   ├── tds.py  # Twisted diffusion sampler
│   └── wrapper.py  # Abstract class for samplers
├── config/  # Configs
│   ├── config.yaml  # Main config file
│   ├── experiments/  # Experiment config groups
│   │   └── ...
│   └── model/  # Model config groups
│       └── ...
├── data/  # Motif scaffolding data
│   ├── motif_problems/  # RFDiffusion benchmark
│   │   └── ...
│   ├── multi_motif_problems/  # Genie2 benchmark
│   │   └── ...
│   └── symmetric_motif_problems/  # RFDiffusion trimeric covid binder
│       └── ...
├── experiments/  # Experiments
│   ├── __init__.py  # Experiment registration
│   └── experiments.py  # Experiment definitions
├── model/  # Diffusion model
│   ├── __init__.py  # Model registration
│   ├── diffusion.py  # Abstract class for diffusion models
│   └── genie.py  # Genie adapter
├── multirun/  # Output of multirun/sweeping experiments
│   └── ...
├── outputs/  # Output of experiments
│   └── ...
├── protein/  # Protein-related functions
│   └── frames.py  # Abstract class for frames
├── scripts/  # Scripts for config generation, etc.
│   └── ...
├── submodules/  # Git submodules
│   └── ...
├── utils/  # Utility functions
│   ├── path.py  # for resolving paths
│   ├── pdb.py  # for working with PDBs
│   ├── registry.py  # for handling registrations
│   ├── resampling.py  # for low-variance resampling
│   └── symmetry.py  # for dealing with symmetry
└── main.py  # Main entry point
The project uses the Hydra framework for handling different experimental setups. Configuration files and groups are defined under the `config` folder.
To get started, the following command shows the available options for each config group (e.g. the experiment type), as well as the default parameters.
python3 main.py --help
Experiments are selected through the option `experiment={experiment_name}`. Check their arguments and defaults in `config/experiments`.
| Experiment | Description |
|---|---|
| `sample_unconditional` | Draw unconditional samples from the diffusion model. The total length must be specified. |
| `sample_given_motif` | Sample conditioned on a motif being present in the samples. Motif config files use specifications like those in RFDiffusion. |
| `sample_given_multiple_motifs` | Sample conditioned on multiple motifs being present in the samples. Motif config files use specifications like those in Genie2. |
| `sample_given_symmetry` | Sample conditioned on the samples following a point symmetry. |
| `sample_given_motif_and_symmetry` | Sample conditioned on a motif being present in the samples and on the samples following a point symmetry. The motif specification is for a single monomer. |
| `evaluate_samples` | Evaluate motif scaffolding results using the insilico design pipeline. |
Sample 16 proteins with 96 residues each using unconditional model Genie-SCOPe-128 (default if unspecified) on GPU device #1.
python3 main.py experiment=sample_unconditional \
experiment.n_samples=16 \
experiment.sample_length=96 \
model=genie-scope-128 \
model.device=cuda:1
Scaffold motif problem 3IXT using TDS with masking likelihood, twist scale=2.0, and K=8 particles.
python3 main.py experiment=sample_given_motif \
experiment/motif=3IXT \
experiment/conditional_method=tds-mask \
experiment.conditional_method.twist_scale=2.0 \
experiment.n_samples=8
Produce 16 scaffolds for motif problem 1PRW using TDS with distance likelihood and K=8 particles (16 batches × 8 particles = 128 samples).
python3 main.py experiment=sample_given_motif \
experiment/motif=1PRW \
experiment/conditional_method=tds-distance \
experiment.conditional_method.n_batches=16 \
experiment.n_samples=128
Scaffold motif problem 5TPN, allowing the motif to be placed anywhere, using TDS with masking likelihood and K=8 particles.
python3 main.py experiment=sample_given_motif \
experiment/motif=5TPN \
experiment.fixed_motif=False \
experiment/conditional_method=tds-mask \
experiment.n_samples=8
Scaffold multi-motif problem 1PRW_two using TDS with frame-based distance likelihood and K=8 particles.
python3 main.py experiment=sample_given_multiple_motifs \
experiment/multi_motif=1PRW_two \
experiment/conditional_method=tds-frame-distance \
experiment.n_samples=8 \
model=genie-scope-256
Sample a monomer with 250 residues and C-5 internal symmetry using FPS-SMC with K=16 particles.
python3 main.py experiment=sample_given_symmetry \
model=genie-scope-256 \
experiment.sample_length=250 \
experiment.symmetry=C-5 \
experiment/conditional_method=fpssmc \
experiment.n_samples=16
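For intuition about what a C-5 point symmetry means geometrically, the sketch below builds the five rotations of the cyclic group about an axis and replicates one repeating unit under them. Placing the axis along z and treating the 250 residues as five copies of a 50-residue unit are assumptions made for illustration; the repository's own handling lives in `utils/symmetry.py` and may differ.

```python
import math


def cyclic_rotations(order):
    """Rotation matrices of the cyclic group C_n about the z-axis (assumed axis)."""
    mats = []
    for k in range(order):
        theta = 2.0 * math.pi * k / order
        c, s = math.cos(theta), math.sin(theta)
        mats.append(((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0)))
    return mats


def rotate(matrix, point):
    """Apply a 3x3 rotation matrix to a 3D point."""
    return tuple(sum(m * p for m, p in zip(row, point)) for row in matrix)


def symmetrise(unit, order):
    """Replicate one unit's coordinates into all `order` symmetric copies."""
    return [rotate(m, p) for m in cyclic_rotations(order) for p in unit]


# Toy usage: a 50-point unit replicated under C-5 gives 250 points.
unit = [(10.0, 0.0, float(i)) for i in range(50)]
full = symmetrise(unit, order=5)
```

Each copy is the previous one rotated by 72 degrees, so every point keeps its distance from the symmetry axis.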
Evaluate samples from unconditional, single-motif scaffolding, or multi-motif scaffolding experiments using the insilico design pipeline with CUDA visible devices #0, #2, and #3.
python3 main.py experiment=evaluate_samples \
experiment.path_to_experiment=<path_to_hydra_output_folder> \
'experiment.gpu_devices=[0,2,3]'
Each conditional method and model has its own default hyperparameters, which can be overridden on the command line. Check out their config files for more info. Custom motifs can also be scaffolded by creating a config file in `config/experiment/motif` that follows the specification of the configs in that directory. The same applies to multiple motifs, except that their configs are stored in `config/experiment/multi_motif`.