Red M D Stream

Reduced Molecular Dynamics Streamlined
version beta1
Users’ Manual
Filip Leonarski
Joanna Trylska
1
The RedMDStream program is free software; you can redistribute it and/or modify it under the
terms of the GNU General Public License version 2, as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not,
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
USA.
Copyright (C) 2007 - 2014: University of Warsaw
CONTENTS
2
Contents
1 Introduction to the RedMDStream Package
3
2 Installation and execution
2.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
4
3 Units
6
4 Input files
4.1 XML format parsing . . . . . . . . . . . .
4.2 Equation parsing . . . . . . . . . . . . . .
4.3 Simulation protocol . . . . . . . . . . . . .
4.3.1 Simulation execution . . . . . . . .
4.3.2 Trajectory analysis . . . . . . . . .
4.4 Topology definition . . . . . . . . . . . . .
4.4.1 Atom mapping rules . . . . . . . .
4.4.2 Bond creation rules . . . . . . . . .
4.4.3 Potential energy function definition
4.4.4 Nonbonded potential . . . . . . . .
4.5 Optimization protocol . . . . . . . . . . .
4.5.1 Evolutionary algorithm example . .
4.5.2 Grid spacing example . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
7
7
8
8
10
12
13
13
15
15
17
17
17
18
1 INTRODUCTION TO THE REDMDSTREAM PACKAGE
3
Disclaimer — beta version status
Thank you for downloading beta version of the RedMDStream. The software is still a project in
development and full version is expected by the end of June.
This manual refers to a beta version of RedMDStream. It is made available to demonstrate
the software features and options. We have made an effort to make sure that functionality and
options described in the following text are error–free, however since testing process is not finished,
the code might contain bugs. Also for this reason we mark some options in the manual as
EXPERIMENTAL, if we feel that not adequate amounts of tests were done for the option in
question.
At the moment presented manual is designed to provide a technical support for the user. For a
full version we will place more emphasis on providing a reference information through description
of methods and algorithms included in the RedMDStream.
All comments and suggestions are welcome to the authors: [email protected] or
[email protected].
1
Introduction to the RedMDStream Package
RedMDStream is a free open source package for developing coarse–grained (CG) models for
biomolecules (nucleic acids, proteins and their complexes) by automatizing and streamlining the most time consuming steps of the procedure. Since creating a CG model is a very time
consuming task, we have created this software to help the researcher efficiently explore plethora
of possibilities available in an art of designing such models (representation, bonding patterns,
potential energy function). This software focus is on point–beads models, not bound to lattice,
with a small number of beads (1–4) representing a residue.
RedMDStream provides routines to perform the following tasks:
• Mapping a full–atomistic structure to a coarse-grained representation based on user provided
topology.
• Setting a network of bonded interactions (pseudo–bond, pseudo–angle, pseudo–dihedral)
between beads based on their position in sequence, spatial coordinates, additional structural
information (2D/3D) and other properties.
• Introducing a potential energy functional form for the bonded interactions defined above
and the nonbonded interactions between all the others.
• Running molecular dynamics using a built–in RedMD simulation engine.
• Analyzing root mean square deviation, root mean square fluctuations and distance distributions for the resulting trajectory.
• Optimizing parameters of the model using metaheuristic or local search algorithms.
RedMDStream, as name suggests, is an extension to RedMD coarse–grained molecular dynamics software developed in Dr Joanna Trylska group in years 2008-2014. The RedMD simulation
engine is fully integrated into the RedMDStream software. Please note however tht additional
2 INSTALLATION AND EXECUTION
4
tools bundled in the RedMD package (e.g., structure preparation, analysis) are not included in
the RedMDStream package to avoid redundancy. If needed please install them from the RedMD
package available at the laboratory webpage.
RedMDStream is intended to be extensible. The source code is structured, therefore adding
new methods and routines is straightforward.
The RedMDStream source code is written entirely in C and C++ languages and parallelized
with OpenMP and MPI technologies. The code was tested on various LINUX platforms.
Please direct your comments and questions about the RedMDStream package to:
[email protected] or [email protected]
Centre of New Technologies, University of Warsaw
˙
Zwirki
i Wigury 93
02-089 Warsaw, Poland
phone: +48 22 5540 843
The source code of the RedMDStream package can be found at:
http://bionano.cent.uw.edu.pl/software
The RedMDStream package is distributed under the terms of the GNU Public License. A copy of
the GPL is provided with the RedMD distribution and is also available at http://www.gpl.org.
2
Installation and execution
2.1
Requirements
RedMDStream is a C++ Linux/UNIX application, installation on Windows system should be possible using Cygwin. RedMDStream is available for download as a source code, to allow compilation
for particular user system. It therefore requires:
• GNU C or C++ compiler1 ,
• certain basic library function, which are usually supplied by Linux/UNIX system, however
lack of getline function on older Mac OS X systems might prevent program from compiling,
• libxml2 library with headers for compilation (libxml2-devel or libxml2-dev package has
to be installed on certain Linux distributions).
After downloading the package from our web page one should unpack the archive:
tar xzf RedMDStream-beta-src.tar.gz
1
It might be beneficial to try machine-specific compilers to obtain better performance on clusters like IBM
BlueGene or Power systems. This is however not tested at the moment.
2 INSTALLATION AND EXECUTION
5
Rnter into the RedMDStream sources directory:
cd RedMDStream
Execute a configuration script and compile the program:
./configure
make
make install
By default RedMDStream will be compiled with an OpenMP parallelization enabled (if available). To compile without OpenMP, one should use the following commands:
./configure -disable-openmp
make
make install
Compiled executable will be stored in the /usr/local/bin folder. To use a different location add -prefix= option with a directory name to the configure script (e.g., ./configure
-prefix=/opt/RedMD). To run RedMDStream simply write:
RedMDStream <input file name>
If necessary add a directory prefix, depending on usage of prefix option in configure script, e.g.:
/opt/RedMD/bin/RedMDStream <input file name>
3 UNITS
3
6
Units
RedMDStream uses the following system of units:
internal
˚
Length
A = 10−10 m
Mass
u = 1.6605402(10) · 10−27 kg
Charge
e = 1.602 · 10−19 C
Time
ps = 10−12 s
Temperature
K
J
Energy
10 mol
input and output files
input parameters
˚
A
u
e
ps
K
kcal
mol
For angle measure, degrees are taken to measure pure angles and radians in force constants
and other composite parameters. This behavior can be changed by using RedMDStream options.
4 INPUT FILES
4
7
Input files
RedMDStream can be executed in two different modes:
• Simulation mode In this mode a single simulation, with a defined coarse–grained model,
is prepared, executed and analyzed. To use this mode provide a simulation protocol file as
a RedMDStream input (see Section 4.3).
• Optimization mode In this mode multiple simulations are executed in a batch mode to
find an optimal model parameter set. To use this mode provide an optimization protocol
file as a RedMDStream input (see Section 4.5).
4.1
XML format parsing
RedMDStream protocol and topology files are defined in the XML standard. In a XML file data
are organized into tags. A tag can incorporate other tags or text. In such case it is written in the
following way:
<TAG>
text
<INNER TAG>
</INNER TAG>
</TAG>
with always an opening and closing tag. Missing a closing tag or trying to interlace tags (closing
a tag before first closing all enclosed tags) will result in a syntax error and stop of a program
execution. If a tag does not include any content it can be written in a shorter version:
<TAG />
In this way closing tag is unnecessary. Tags may also include properties:
<TAG p r o p e r t y 1=”v a l u e 1 ” p r o p e r t y 2=”v a l u e 2 ” />
One tag may include only one property with a certain name, but can enclose multiple tags with the
same name. Including some properties is obligatory, for example if one writes a tag to open a file,
this would be a nonsense operation without providing a filename. If a tag without an obligatory
property is found a program execution will stop. Other properties can be optional. If they are
omitted, there won’t be an error message, however a default value will be assigned to the property.
RedMDStream will produce an error, if a tag is found that should not be enclosed in other tags,
however properties that unneeded are ignored. This might lead to errors, when a typographic
error is made while writing an optional property name, as default value will be assigned.
Everything inside comment tags is being ignored:
<!−− i g n o r e d t e x t −−>
4 INPUT FILES
4.2
8
Equation parsing
RedMDStream has included an equation parser. One can use standard operations in the equation
editor (+ − ∗ / ˆ ) and mathematical functions (log log10 exp sin cos tan sinh cosh tanh). One
can also use comparison operators (since in XML standard lower/greater signs are used for tags,
such operators have to be described using acronyms) :
• eq or = Equals
• neq or != Not equals
• lt Less than
• gt Greater than
• lte Less than or equals
• gte Greater than or equals
Logical expressions and mathematical expressions can be mixed using ? and : operators:
expr1 ? expr2 : expr3
expr1 can be a logical expression or mathematical expression. If it is true (logical) or not zero
(math), expr2 is evaluated. Otherwise expr3 is evaluated.
Number all always parsed as real numbers (floating point), there is no integer mathematics
available. One can also use variables in the RedMDStream equations.
4.3
Simulation protocol
Simulation protocol tag has to include a root tag PROTOCOL. Directives inside the PROTOCOL
tag are executed in sequential order. If any of the directives stop program execution is stopped.
The directives include:
• Load another protocol file:
<PROTOCOL f i l e =” f i l e n a m e ” />
• Define variable for equation parsing:
<VAR name=”PI ” >3.1415926 </VAR>
<VAR name=”E”>exp (1) </VAR>
Expression for the variable is placed inside the tags.
• Print variable value:
<PRINT name=”var1 ” />
• Create a distance distribution:
4 INPUT FILES
9
<DISTRIBUTION name=” d i s t r 1 ” beg =”0.0” end =”100.0” b i n s =”100”>
<EXPR> exp(−4∗ r ) </EXPR>
</DISTRIBUTION>
Generates a distribution for distances from 0.0 ˚
A to 100 ˚
A in 100 bins of an f (r) = exp(−4r)
function and saves it in memory under name distr1). One can also load a distribution from
a text file (in every line there is bin position and number of counts separated by space):
<DISTRIBUTION>
<TXTFILE f i l e =” f i l e n a m e ” />
</DISTRIBUTION>
or load a distribution from a binary file generated by RedMDStream:
<DISTRIBUTION>
<BINFILE f i l e =” f i l e n a m e ” />
</DISTRIBUTION>
• Define a coarse–grained mapping and topology in the same file:
<TOPOLOGY>
...
</TOPOLOGY>
or loaded from a different file:
<TOPOLOGY f i l e =”top . xml” />
See Section 4.4 for details on how to define topology.
• Load a molecule structure file:
<MOLECULE name=”mol1 ” pdb=” f i l e 1 . pdb” />
<MOLECULE name=”mol2 ” pdb=” f i l e 2 . pdb”
rnaview=” f i l e 2 . pdb . out ” />
<MOLECULE name=”mol3 ” pdb=” f i l e 3 . pdb”
d o t b r a c k e t=” f i l e 3 . 2 d” />
Loads a PDB file structure from file, optionally include 2D RNA structure in dot-bracket
format or RNAView tertiary RNA structure and applies a coarse–grained topology afterwards.
• Load an ensemble of structures in the PDB format to generate a distance distribution:
<DIRECTORY name=” d i r ” pdb=”.pdb” s c a n S u b d i r s=”t r u e ”>
<DISTRIBUTION name=” d i s t r 2 ” beg =”0.0” end =”100.0” b i n s =”100”
bondRule=” r u l e 1 ”>
<TXTFILE s a v e=” f i l e . t x t ” />
<BINFILE s a v e=” f i l e . d i s t r ” />
</DISTRIBUTION>
</DIRECTORY>
4 INPUT FILES
10
Analyzes all PDB files (extension .pdb) in a directory named dir and all its sub directories.
Creates a beads distance distribution named distr2, based on bond named rule1 (see
Sec. 4.4). If bondRule is omitted, distribution will be calculated for distances between all
beads. Distribution is calculated for distances from 0.0 ˚
A to 100.0 ˚
A, in 100 bins and saved
in memory as distr2, in a text file called file.txt and in a binary file called file.distr.
• Save a coarse–grained molecule structure:
<OUTPUT mol=”mol1 ” pdb=”mol1 . cg . pdb” />
<OUTPUT mol=”mol1 ” sxml=”mol1 . sxml ” />
Saves mol1 coarse–grained structure to mol1.cg.pdb in the PDB format and coarse–grained
structure and topology to a file mol1.sxml in the RedMD Structure XML file (contains
information about beads and interactions).
• Run an MD simulation:
<SIMULATION mol=”mol1 ” . . . >
...
<ANALYZE>
...
</ANALYZE>
</SIMULATION>
Simulate molecule named mol1. See Section 4.3.1 for detailed information on SIMULATION
tag. Simulation trajectory analysis commands are enclosed in ANALYSIS tag and described
in Section 4.3.2.
• Load a saved DCD trajectory:
<TRAJECTORY mol=”mol1 ” dcd=”mol1 . t r a j . dcd ” cg=”t r u e ”>
<ANALYZE>
...
</ANALYZE>
</TRAJECTORY>
See Section 4.3.2 for details on analysis of a simulation trajectory enclosed in ANALYSIS
tag.
4.3.1
Simulation execution
For details on how simulation is performed (equations of motion, etc.) please refer to Ref. [1]
or RedMD manual (available on the http://bionano.cent.uw.edu.pl/Software/RedMD webpage).
Simulation execution tag depends on simulation mode. For integration (NVE simulation) one
should start the SIMULATION section in the following way:
<SIMULATION mol=”mol1 ” mode=” i n t e g r a t i o n ” dt =”10 f s ”>
with all properties obligatory. A property named dt denotes simulation timestep. One can use
time units (fs or ps) or if left unitless picoseconds will be used. One can also define how simulation
4 INPUT FILES
11
will be saved by using optional properties of SIMULATION tag. By default simulation is stored
and analyzed in memory. To save trajectory to file one has to use dcd property, for example:
<SIMULATION mol=”mol1 ” mode=” i n t e g r a t i o n ” dt =”10 f s ” dcd=”mol1 . dcd”>
Other simulation modes are NVT simulation with Berendsen thermostat:
<SIMULATION mol=”mol1 ” mode=”b e r e n d s e n ” dt =”10 f s ” tau=”1”>
where tau denotes coupling constant in ps.
There is also Langevin dynamics available:
<SIMULATION mol=”mol1 ” mode=” l a n g e v i n ” dt =”10 f s ” gamma=”1”>
with gamma being damping constant in ps−1 .
Simulation mode is followed by simulation commands:
• Run minimization:
<MINIMIZE />
Structure is minimized using a limited memory Broyden-Fletcher- Goldfarb-Shanno (LBFDS) method.
• Set temperature for initialization (if no dynamics were performed), Berendsen bath coupling
or Langevin dynamics:
<TEMPERATURE v a l =”310” />
Temperature pattern can be also defined in terms of actual step. For heating during 10000
simulation steps from 10 K to 300 K and later having constant temperature simulation one
can use the following command:
<TEMPERATURE>
( s t e p < 10000) &l t ; 10 + s t e p / 10000 ∗ (300 − 1 0) : 300
</TEMPERATURE>
If expression is used to define temperature, it has to be enclosed between opening and closing
TEMPERATURE tags.
• Perform actual molecular dynamics simulation:
<STEP s t e p s =”10000” />
This perform simulation for 10000 steps. One can also define a number of steps in time
units:
<STEP s t e p s =”10 ns ” />
which can be more convenient for a longer simulation setup (fs/ps/us2 /ms are available).
There are also options to control size of saved trajectory. This can be defined either by
setting interval in number of steps, when frame save is performed:
2
for microsecond
4 INPUT FILES
12
<STEP s t e p s =”10 ns ” saveFreq =”1000” />
frame will be saved every 1000 frames or by setting a target amount of frames to be saved
in total:
<STEP s t e p s =”10 ns ” saveFrames =”1000” />
in total 1000 frames will be saved from the simulation. The last option is not to save at all:
<STEP s t e p s =”10 ns ” noSave=”t r u e ” />
• Analyze trajectory — see Section 4.3.2.
4.3.2
Trajectory analysis
RedMDStream provides three ways to analyze trajectories (see Ref. [2] on how the measures are
defined in details):
• Structural fidelity by means of root mean square deviation (RMSD)
RedMDStream calculates RMSD using Kabsch algorithm [3] to account for the best translation and rotation (that minimizes RMSD) of a molecule prior to calculating the value.
Calculated average RMSD for the whole trajectory can be saved to a variable:
<RMSD var=”rmsd” />
or RMSD for each frame printed to a file:
<RMSD f i l e =”mol1 . rmsd” />
• Structure mobility by means of root mean square fluctuations (RMSF)
To calculate average RMSF one can use the following tag:
<RMSF var=”rmsf ” />
To save RMSF to file for each bead:
<RMSF f i l e =”mol1 . rmsf ” />
RedMDStream performs alignment of trajectory before calculating RMSF to not include
translations and rotations of the molecule during simulation in the RMSF count. This can
be turned off by adding align="false" to XML tag:
<RMSF var=”rmsf ” a l i g n =” f a l s e ” />
RedMDStream can also calculate RMSF in B-factor units:
<RMSF var=”rmsf ” b F a c t o r U n i t s=”t r u e ” />
• Distance distribution functions
Beads distance distribution functions can be calculated based on trajectory using DISTRIBUTION tag:
4 INPUT FILES
13
<DISTRIBUTION beg =”0.0” end =”100.0” b i n s =”100”
bondRule=” r u l e 1 ” name=” d i s t r 3 ” />
Creates a beads distance distribution named distr3, based on bond named rule1 (see
Sec. 4.4). If bondRule is omitted, distribution will be calculated for distances between all
beads. Distribution is calculated for distances from 0.0 ˚
A to 100.0 ˚
A, in 100 bins and saved in
memory as distr3. One can use also a previously saved distribution as a reference to allow
for comparison. In such case distance range and bins number are taken from a reference
distribution and should not be provided:
<DISTRIBUTION bondRule=” r u l e 1 ” name=” d i s t r 4 ” r e f =” d i s t r 1 ” />
If reference is provided the following analysis and comparisons can be made:
<DISTRIBUTION bondRule=” r u l e 1 ” name=” d i s t r 4 ” r e f =” d i s t r 1 ”>
<DIVIDE> r ˆ2 </DIVIDE>
<K−S var=”comp1” />
<CHISQUARE var=”comp2” />
<HODKIN var=”comp3” />
<RMSD var=”comp4” />
<MAX POSITION var=”max” />
<TXTFILE name=”s a v e . t x t ” />
</DISTRIBUTION>
First distribution might be divided by a function (in this case r2 ). Then following comparisons with distr1 distribution are made: Kolomogorov-Smirnov measure is stored in comp1
variable, χ-square in comp2, Hodkin index in comp3 and RMSD–like measure in comp4.
Position of a bin with highest count is stored in max. Distribution is saved in a text file
save.txt.
4.4
Topology definition
A coarse–graining process in RedMDStream is performed in the following way:
• Atoms from high–resolution structure are mapped into a coarse–grained representation based
on ”atom rules”.
• Bond network is created based on ”bond rules”.
• Potential energy function terms are associated with particular bonds.
All the tasks mentioned above are referred in the manual as a ”topology”. Rules how to perform
these tasks are encoded in a topology file or in a TOPOLOGY tag. If a separate file is used it has
to contain a single TOPOLOGY root tag.
4.4.1
Atom mapping rules
RedMDStream provides two ways to create a coarse–grain bead:
4 INPUT FILES
14
• Use a currently existing atom.
Coarse–grain bead is associated with a particular atom in the full–atomistic representation.
For example phosphorus in nucleic acids or α-carbon in amino acids. In such case definition
goes as following:
<ATOMTYPE name=”CGP”>
<VAR name=”mass”>300</VARIABLE>
<VAR name=”c h a r g e ”>−1.0</VARIABLE>
<ATOMRULE>
<FIND ATOM atomName=”P” />
</ATOMRULE>
</ATOMTYPE>
This will create an atom of type CGP with mass of 300 u and charge -1.0 e, placed in position
of atom named ”P” in the PDB file. One can extend ATOMRULE section to incorporate
more conditions, for example:
<ATOMTYPE name=”CGP”>
<VAR name=”mass”>300</VARIABLE>
<VAR name=”c h a r g e ”>−1.0</VARIABLE>
<ATOMRULE>
<FIND ATOM atomName=”CA” resName=”LYS” />
<EXPR>atom . i s H e l i x </EXPR>
</ATOMRULE>
</ATOMTYPE>
Find all Cα atoms in lysine residue, which are part of a helix (based on data in the PDB
file).
• Create a bead based on average position of multiple atoms.
In this case a bead is created in a new position, for example in center of mass of a residue.
This rules are calculated for each residue, so at the moment it is not possible to build a
bead corresponding to multiple residues. While creating a bead, variables that correspond
to that bead can be aggregated from member beads. For example:
<ATOMTYPE name=”CGP”>
<NEWATOM name=”LYS” centreOfMass=”t r u e ”>
<RESIDUE name=”LYS” />
<VAR name=”mass ” ag gr=”sum” />
<VAR name=”tempFactor ” ag gr=”avg ” />
</NEW ATOM>
</ATOMTYPE>
This creates a bead for lysine amino acid, in the residue center of mass (centreOfMass="false"
would create a bead in a geometrical center of residue). This bead will have two variables:
mass being sum of atom masses in the residue and tempFactor being average of atoms
temperature factors.
4 INPUT FILES
4.4.2
15
Bond creation rules
Second point is to define rules, where bonds are placed in the structure. These rules will be used
when defining potential energy terms. Each rule has to have a name. An example rule:
<BONDRULE name=”Pneigh”>
<NEIGHBOR />
<ATOM type=”P” />
</BONDRULE>
This rule includes P (phosporus) beads that are neighboring (1–2 for pseudo–bond, 1–3 for pseudo–
angle, 1–4 for pseudo–dihedral).
<BONDRULE name=”P c l o s e ”>
<DIST min =”0.0” max=”10.0” />
<ATOM type1=”CA” type2=”P” />
</BONDRULE>
This rule includes P (phosporus) beads interacting with CA (α-carbon) beads that are in vicinity
(distance 0-10 ˚
A).
<BONDRULE name=”Pcompl1”>
<INTERACTION name=”WCHBond” />
<ATOM type=”P” />
</BONDRULE>
This rule includes P (phosporus) beads from bases connected using the Watson-Crick complementary pairs interaction read from 2D/3D RNA structure file.
<BONDRULE name=”Pcompl2”>
<SHIFT s h i f t 1 =”1” s h i f t 2 =”1”>
<INTERACTION name=”WCHBond” />
</SHIFT>
<ATOM type=”P” />
</BONDRULE>
This rule is similar to the previous one, however connects beads from bases that are next in
sequence to beads interacting by the Watson-Crick hydrogen bonding pattern (so called i+1:j+1
interaction).
4.4.3
Potential energy function definition
When bond rules are defined, the next step is to assign these rules to a potential energy function
term. It is done as in the following example:
<BOND HARM name=”compl2”>
<BONDRULE name=”compl2 ” />
<VAR name=”r 0”>bond . d i s t </VARIABLE>
<VAR name=”k”>5.0</VARIABLE>
</BOND HARM>
4 INPUT FILES
16
In this example a harmonic potential pseudo–bond is created between beads described by compl2
bond rule (potential and bond rule names can overlap). Potential parameters are defined with an
equilibrium distance equal to distance in a reference structure and a force constant equal to 5 kcal˚2 .
molA
In the parameter definition one can use values connected to the bond (bond.dist, bond.angle,
bond.dihedral), atoms forming the bond (atom1.charge, atom2.mass) or variables defined in
the protocol/optimization file (should be prefixed by global, e.g. global.PI). Potential terms are
assigned to bonds in order they appear in the file. If two potentials would describe interaction
between the same beads only the first in order will be used and second ignored.
RedMDStream supports the following potentials:
• Harmonic pseudo–bond potential
1
V (r) = k(r − r0 )2
2
(1)
tag: BOND HARM, parameters to declare: r0= r0 and k= k
• Harmonic pseudo–angle potential
1
V (θ) = kθ (θ − θ0 )2
2
(2)
tag: ANG HARM, parameters to declare: theta0= θ0 and k= kθ
• Harmonic pseudo–dihedral potential
2
V (α) = kα (α − α0 )2
2
(3)
tag: DIH HARM, parameters to declare: alpha0= α0 and k= kα
• Morse pseudo–bond potential
V (r) = E0 ∗ (1 − exp(−α ∗ (r − r0 )))2
(4)
tag: BOND MORSE, parameters to declare: E0= E0 , r0= r0 and alpha= α
• Lennard-Jones pseudo–bond potential
V (r) = 4 ∗ ε ∗ ((σ/r)12 − (σ/r)6 )
(5)
tag: BOND LJ, parameters to declare: eps= ε and sigma= σ
• Custom pseudo–bond potential This is a user–defined potential. It is defined as:
<BOND CUST name=”custom1 ” f o r m u l a =”2∗ g l o b a l . PI / l o g ( r )∗ exp(−bond . d i s t ∗ r ∗k)”>
<BONDRULE name=”compl2 ” />
<VAR name=”k”>5.0</VARIABLE>
</BOND HARM>
4 INPUT FILES
4.4.4
17
Nonbonded potential
Nobonded potential are defined inside a NONBONDED section tags. They can be defined for
a particular pair of beads (by their type) or for all beads (”default” potential). An example
NONBONDED section looks like this:
<NONBONDED>
<BOND LJ name=”nonbLJ”>
<VAR name=”eps ”>3.0</VAR>
<VAR name=”sigma ”>15.0</VAR>
</BOND LJ>
<BOND MORSE name=”morseNB” type1=”P” type2=”CA”>
<VAR name=”E0”>3.0</VAR>
<VAR name=”r 0 ”>15.0</VAR>
<VAR name=”alpha ”>0.1</VAR>
</BOND MORSE>
</NONBONDED>
4.5
Optimization protocol
RedMDStream can be launched in optimization mode. In this mode a coarse–grained model can
be optimized for a particular needs. See Ref. [2] for details. Optimization and simulation protocol
communicate through variables. Used defines certain variables as parameters, theirs values is
assigned by an optimization routine and other variables, produced by simulation, are considered
a score (for example RMSD, RMSF, distance distribution comparison value). This corresponds
to a definition of a function:
M
X
f (x1 , ..., xn ) =
w i si
(6)
i
where xi are parameters assigned by the optimization routine, wi are weights for each score and si
are values of certain scoring functions. In the RedMDStream optimization procedure is a general
one, it does use any knowledge of underlying physics of the system. In this way it can avoid traps
of analytic methods, like for example parameter correlation in case of Boltzmann Inversion. The
following methods are available:
• Evolutionary algorithm
• Particle swarm optimization
• Grid calculation
For description of available methods at the moment please refer to Refs. [2] and [4]. Example files
are available with RedMDStream.
4.5.1
Evolutionary algorithm example
Example optimization file for evolutionary algorithm execution:
4 INPUT FILES
18
<OPTIMIZATION method=”evo”>
<PROTOCOL r e r u n =”10” name=”run . xml” />
<DATA>
<PARAMETER name=”p1” min =”0.0” max=”1.0” />
<PARAMETER name=”p2” min =”5.0” max=”7.0” />
<SCORE name=”s1 ” v a l u e=”rmsd” max=”50” weight =”0.7” />
<SCORE name=”s2 ” v a l u e=”rmsf ” max=”100” weight =”0.3” />
</DATA>
<OPTIONS>
<ITERATION s i z e =”16” number=”128”/>
<ELITE v a l =”16” />
<MATING type=”tournament ” />
<MUTATION type=”g a u s s ” t o l e r a n c e =”0.1” />
<CROSSOVER b l e n d i n g =”1.0” />
<FILE s a v e=”o p t i m i z e . r e s u l t ”
t x t=”o p t i m i z e . r e s u l t . c s v ” />
</OPTIONS>
</OPTIMIZATION>
In this file scoring function will take two arguments (p1 from 0–1 and p2 from 5–7). Value of
the scoring function will be calculated be running run.xml protocol file (average of 10 executions), taking rmsd variable with 70% weight and rmsf variable with 30% weight. Evolutionary
algorithm will run for 16 generations, each of 128 members, 16 best members will copied unchanged to a next generation (elitism). Mating will be done using tournament selection, mutation
are applied with normal probability from the parameter value, crossing–over will be done using
blending method. Results will be saved to optimize.result (binary format for restarts) and to
optimize.result.csv (Excel readable comma separated values file).
4.5.2
Grid spacing example
<OPTIMIZATION method=” g r i d ”>
<PROTOCOL r e r u n =”10” name=”run . xml” />
<DATA>
<PARAMETER name=”p1” min =”0.0” max=”1.0” />
<PARAMETER name=”p2” min =”5.0” max=”7.0” />
<SCORE name=”s1 ” v a l u e=”rmsd” max=”50” weight =”0.7” />
<SCORE name=”s2 ” v a l u e=”rmsf ” max=”100” weight =”0.3” />
</DATA>
<OPTIONS>
<STEPS v a l =”8” />
<FILE t x t=”o p t i m i z e . r e s u l t . c s v ” />
</OPTIONS>
</OPTIMIZATION>
In this file scoring function will take two arguments (p1 from 0–1 and p2 from 5–7). Value of the
scoring function will be calculated be running run.xml protocol file (average of 10 executions),
4 INPUT FILES
19
taking rmsd variable with 70% weight and rmsf variable with 30% weight. Calculation will be
performed for values of p1 for 0,1,2,3,...,10 and values of p2 also for 0,1,2,3,...,10, resulting in 100
calculations of the scoring function, providing some insight into scoring function behavior for the
parameters of interest.
REFERENCES
20
References
[1] Adam G´orecki, Marcin Szypowski, Maciej Dlugosz, and Joanna Trylska. Redmd – reduced
molecular dynamics package. J. Comput. Chem., 30:2364–2373, 2009.
[2] F. Leonarski, F. Trovato, V. Tozzini, A Le´s, and J. Trylska. Evolutionary algorithm in the
optimization of a coarse–grained force field. J. Chem. Theory Comput., 9:4874–4889, 2013.
[3] Wolfgang Kabsch. A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A, 32(5):922–923, Sep 1976.
[4] Randy L. Haupt and Sue E. Haupt. Practical Genetic Algorithms. Wiley-Interscience, 2
edition, May 2004.