open an sdf file #154

maartensandbox · 2023-10-23T23:42:12Z

It is not clear to me how I can generally set up a working system. As a simple example, it is not possible to set up a system starting from an sdf file:

CT1000292221


  3  2  0  0  0               999 V2000
    0.0021   -0.0041    0.0020 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0110    0.9628    0.0073 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.8669    1.3681    0.0011 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
$$$$

System("water.sdf",ff)

straight up fails.

Is there somewhere an example where I start with a practical system (say a compound from https://www.rcsb.org ) and set up an MD simulation?

The text was updated successfully, but these errors were encountered:

jgreener64 · 2023-10-24T12:07:55Z

We use Chemfiles for the file reading and residues are not currently supported there for SDF (https://chemfiles.org/chemfiles/latest/formats.html#list-of-supported-formats), so the setup fails.

If you want to read in data from the PDB you can use PDB format, see the example at https://juliamolsim.github.io/Molly.jl/stable/docs/#Simulating-a-protein, though currently atom names must exactly match the residue templates. This is a topic of future work.

maartensandbox · 2023-10-31T13:21:19Z

And with a PDB file, how can I make the atom names match exactly while also indicating that multiple hydrogen atoms are connected to the same molecule? For example, if we follow the last example from https://www.biostat.jhsph.edu/~iruczins/teaching/260.655/links/pdbformat.pdf we see that certain hydrogen atoms will be named as "1HG1". I take it that's not supported, and I should manually place the atoms and fix up the topology? Is there yet another format that is better supported? Do hydrogen atoms follow a different convention in molly.jl?

jgreener64 · 2023-10-31T14:18:15Z

The residue template in the force field xml file defines the atom names (which currently must be matched exactly) and the connectivity between them. For example for alanine in ff99SBildn.xml:

    <Residue name="ALA">
      <Atom name="N" type="N" charge="-0.4157"/>
      <Atom name="H" type="H" charge="0.2719"/>
      <Atom name="CA" type="CT" charge="0.0337"/>
      <Atom name="HA" type="H1" charge="0.0823"/>
      <Atom name="CB" type="CT" charge="-0.1825"/>
      <Atom name="HB1" type="HC" charge="0.0603"/>
      <Atom name="HB2" type="HC" charge="0.0603"/>
      <Atom name="HB3" type="HC" charge="0.0603"/>
      <Atom name="C" type="C" charge="0.5973"/>
      <Atom name="O" type="O" charge="-0.5679"/>
      <Bond atomName1="N" atomName2="H"/>
      <Bond atomName1="N" atomName2="CA"/>
      <Bond atomName1="CA" atomName2="HA"/>
      <Bond atomName1="CA" atomName2="CB"/>
      <Bond atomName1="CA" atomName2="C"/>
      <Bond atomName1="CB" atomName2="HB1"/>
      <Bond atomName1="CB" atomName2="HB2"/>
      <Bond atomName1="CB" atomName2="HB3"/>
      <Bond atomName1="C" atomName2="O"/>
      <ExternalBond atomName="N"/>
      <ExternalBond atomName="C"/>
    </Residue>

The PDB file would have to have ALA as the residue name and every atom name appearing exactly once for a residue to match this template. Future work will allow flexibility in residue names and atom names by searching for the best template.

At the minute you are probably best off removing all hydrogens and adding them back with OpenMM, which will give atom names consistent with OpenMM XML force field files. All residues will need a template available, though that is the case with most software.

maartensandbox · 2023-11-06T16:19:35Z

I think I correctly implemented alanine for use with ff99

ATOM      1  H1  ACE     1       2.000   1.000  -0.000
ATOM      2  CH3 ACE     1       2.000   2.090   0.000
ATOM      3  H2  ACE     1       1.486   2.454   0.890
ATOM      4  H3  ACE     1       1.486   2.454  -0.890
ATOM      5  C   ACE     1       3.427   2.641  -0.000
ATOM      6  O   ACE     1       4.391   1.877  -0.000
ATOM      7  N   ALA     2       3.555   3.970  -0.000
ATOM      8  H   ALA     2       2.733   4.556  -0.000
ATOM      9  CA  ALA     2       4.853   4.614  -0.000
ATOM     10  HA  ALA     2       5.408   4.316   0.890
ATOM     11  CB  ALA     2       5.661   4.221  -1.232
ATOM     12 HB1  ALA     2       5.123   4.521  -2.131
ATOM     13 HB2  ALA     2       6.630   4.719  -1.206
ATOM     14 HB3  ALA     2       5.809   3.141  -1.241
ATOM     15  C   ALA     2       4.713   6.129   0.000
ATOM     16  O   ALA     2       3.601   6.653   0.000
ATOM     17  N   NME     3       5.846   6.835   0.000
ATOM     18  H   NME     3       6.737   6.359  -0.000
ATOM     19  CH3 NME     3       5.846   8.284   0.000
ATOM     20 HH31 NME     3       4.819   8.648   0.000
ATOM     21 HH32 NME     3       6.360   8.648   0.890
ATOM     22 HH33 NME     3       6.360   8.648  -0.890
TER
END

However, when I create the system and look at the interactions:
sys = System("alanine.pdb",ff,boundary = CubicBoundary(4.0u"nm"), loggers=(coords=CoordinateLogger(50),),rename_terminal_res = false)

then I only find sys.specific_inter_lists to contain interactions between atoms in the ALA group. Shouldn't there also be a harmonicbond force between for example the C and O atoms in the ACE group?

jgreener64 · 2023-11-06T19:16:26Z

Yes there should, I think this a problem with Chemfiles only reading bonds for non-standard residues when there is a CONECT record in the PDB file. I can add a mention to the docs, but in general non-standard residues (including ACE/NME terminal caps) aren't tested yet.

See https://github.com/noeblassel/SINEQSummerSchool2023/blob/main/notebooks/dipeptide_nowater.pdb for an alanine dipeptide that reads in okay:

using Molly
ff_dir = joinpath(dirname(pathof(Molly)), "..", "data", "force_fields")
ff = MolecularForceField(joinpath.(ff_dir, ["ff99SBildn.xml", "tip3p_standard.xml"])...)
sys = System("dipeptide_nowater.pdb", ff; rename_terminal_res=false)

maartensandbox · 2023-12-05T15:23:44Z

One more issue:

using Molly
ff = MolecularForceField("ff14SB.xml");

sys = System("big_alanine.pdb",ff,boundary = CubicBoundary(1000.0u"nm"),rename_terminal_res = false)
total_energy(sys,find_neighbors(sys))

prints 1128.45113918646 kJ mol⁻¹

the python equivalent

import openmm
from openmm import app, unit
forcefield = app.ForceField('amber/ff14SB.xml')
f = app.PDBFile('big_alanine.pdb')
system = forcefield.createSystem(f.getTopology(), nonbondedMethod=app.NoCutoff)
integrator = openmm.LangevinMiddleIntegrator(310 * unit.kelvin,1.0 / unit.picosecond,1 * unit.femtosecond)
sim = app.Simulation(f.getTopology(), system, integrator)
sim.context.setPositions(f.getPositions())
state = sim.context.getState(getEnergy=True)
state.getPotentialEnergy()

prints Quantity(value=547.6336059570312, unit=kilojoule/mole)

Other quantities are also incorrect, so presumably my file is still being read in incorrectly? Or was a different convention used between openmm and molly?

big_alanine.pdb.txt

jgreener64 · 2023-12-05T17:32:57Z

It seems okay, the equivalent OpenMM call would be:

import openmm
from openmm import app, unit

forcefield = app.ForceField('amber/ff14SB.xml')
f = app.PDBFile('big_alanine.pdb')

system = forcefield.createSystem(
    f.getTopology(),
    nonbondedMethod=app.CutoffPeriodic,
    nonbondedCutoff=1*unit.nanometer,
    constraints=None,
    rigidWater=False,
    switchDistance=None,
    useDispersionCorrection=False,
)

integrator = openmm.LangevinMiddleIntegrator(310 * unit.kelvin,1.0 / unit.picosecond,1 * unit.femtosecond)
sim = app.Simulation(f.getTopology(), system, integrator)
sim.context.setPositions(f.getPositions())
state = sim.context.getState(getEnergy=True)
state.getPotentialEnergy()

Quantity(value=1128.4508056640625, unit=kilojoule/mole)

Which is much better, there is still a 3E-4 kJ/mol discrepancy.

Note you need a line like

CRYST1  100.000  100.000  100.000  90.00  90.00  90.00 P 1           1

in the PDB file for CutoffPeriodic to work.

maartensandbox · 2023-12-05T23:39:28Z

that clarifies some stuff! I'm really struggling with basic things but from testing it looks like molly is at least a factor 10 faster than openmm (for my admittedly weird use case).

jgreener64 · 2023-12-05T23:46:16Z

Cool, I'd be interested to hear roughly what you are doing if you are able to share, we might be able to add more features and docs to help.

Btw make sure you are on Molly.jl v0.18.2 or later as that version had a significant performance improvement for periodic boundary conditions.

maartensandbox · 2023-12-20T23:10:21Z

The plan is to wrap it up in a paper (though the draft is progressing slowly), so molly will definitely be cited! We really only need a way to evaluate energies and manipulate positions. Molly is nice because unlike pythonic packages that bind to optimized C code, I can directly profile and manipulate the system structure.

At the moment I'm only interested in all-to-all, but there are two other projects I want to try molly on, one which requires mixing periodic boundary conditions in one direction with open boundary conditions in another, and another where I'll be using conventional periodic boundary conditions. Thanks a lot for the hard work that went into the package! It was bit rough to get started simulating my first molecule, but after that it's really only been smooth sailing!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

open an sdf file #154

open an sdf file #154

maartensandbox commented Oct 23, 2023

jgreener64 commented Oct 24, 2023

maartensandbox commented Oct 31, 2023

jgreener64 commented Oct 31, 2023

maartensandbox commented Nov 6, 2023

jgreener64 commented Nov 6, 2023

maartensandbox commented Dec 5, 2023

jgreener64 commented Dec 5, 2023

maartensandbox commented Dec 5, 2023

jgreener64 commented Dec 5, 2023

maartensandbox commented Dec 20, 2023

open an sdf file #154

open an sdf file #154

Comments

maartensandbox commented Oct 23, 2023

jgreener64 commented Oct 24, 2023

maartensandbox commented Oct 31, 2023

jgreener64 commented Oct 31, 2023

maartensandbox commented Nov 6, 2023

jgreener64 commented Nov 6, 2023

maartensandbox commented Dec 5, 2023

jgreener64 commented Dec 5, 2023

maartensandbox commented Dec 5, 2023

jgreener64 commented Dec 5, 2023

maartensandbox commented Dec 20, 2023