---------------------------------------------------------------------------

fix_topology, version 1.6 - a program to fix GROMACS topologies

Authors: Antonio M. Baptista, João M. Damas -- ITQB NOVA

---------------------------------------------------------------------------

Purpose:
--------

This program was primarily developed to fix the bonds, angles and
proper dihedrals that are left "empty" (ie, without parameters) in a
GROMACS topology file upon the creation of special bonds (specbonds).
When grompp reads a topology with empty bonded entries, it looks for
matching rules in the directives [ bondtypes ], [ angletypes ] and
[ dihedraltypes ].  Such directives can be found in the ff*bon.itp
files, and possibly in other .itp files #included in the topology
(after its generation by pdb2gmx).  In fact, as far as I understand,
those directives are precisely intended to fix the empty entries
resulting from specbonds.  Thus, the specification of new specbonds
should in principle be accompanied by corresponding rules in those
directives, which would presumably assign the correct parameters to
all empty entries.  However, this approach has two problems.  The
first is that pdb2gmx does not create empty entries for improper
dihedrals (there is no simple way of deciding in which cases an
improper dihedral should be used), so these cannot be fixed by rules
in the [ *types ] directives.  The second is that, since those rules
are specified in terms of atomtypes, it is easy to make unintended
assignments.  For example, such unintended assignments actually occur
for the specbond between HIS and HEM defined in the default
specbond.dat: grompp wrongly assigns parameters to several angles and
dihedrals involving the specbond, which should instead be given null
parameters or simply be removed from the topology.  The present
program is intended to solve these problems, using a set of
user-defined rules to fix a topology previously generated with
specbond information.  This means that, in addition to a specbond.dat
file defining the special bonds required for your system (usually
placed in your working directory to supersede the default one), you
also need one or more rules files (see below) defining all the special
terms associated with those bonds.

Because of the special treatment required for improper dihedrals (see
"Rules file" and "Algorithm"), the program currently supports only
the forcefields ffG43a1, ffG53a6 and ffG54a7; it may also happen to
work with other forcefields, in which case please let me know.  As far
as I am aware, the program does not depend on the GROMACS version used
to create the topology; if you find any such dependence, please let me
know.

It should in principle be possible to define everything in the rules
files and dispense with specbonds altogether, but that would require a
lot of code to identify and generate all the angles and proper
dihedrals involving the new bonds; since pdb2gmx already does all that
work using the specbond information, it is simpler to fix the
resulting topology.  It would also be interesting to make the program
forcefield-independent, but that would require considerable work.


Distribution and installation:
------------------------------

This distribution consists of: (1) this README file; (2) the program,
a single executable AWK script named fix_topology; (3) a few example
rules files, in the EXAMPLES directory.

To run the program, just use its full pathname or, if you prefer, make
sure that its location is in your PATH environment variable.


Usage:
------

The program reads a topology file in GROMACS format (typically created
by pdb2gmx using specbond definitions), and one or more rules files
containing the rules to assign the intended bonded terms.  The command
line is:

  fix_topology topology_file rules_file(s)

The format of the rules files is described below.  The fixed topology
is written to the standard output.


Rules file:
-----------

The format of a rules file is as follows:

1. Any line not starting with "define", "bond", "angle" or "dihedral"
   is ignored, thus working as a comment.  The best practice is to
   leave such non-processed lines blank or start them with a typical
   comment character, like "#" or ";".

2. The columns in the processed lines must be separated by one or more
   spaces and/or tabs (as with the default field separator in AWK).

3. A line starting with "define" contains the definition of a macro,
   whose name must be a sequence of alphanumeric characters (and "_")
   enclosed between "(:" and ":)".  The macro "(:aa:)" is predefined
   and refers to any of the 20 common amino acids.  A definition may
   use previously defined macros, such as:

       define   (:ee:)   ((:aa:)|DAP)
       define   (:dd:)   ((:ee:)|AMB|LYR)

   The macros can then be used in the non-"define" processed lines
   (those starting with "bond", "angle" or "dihedral").

4. The columns in the non-"define" processed lines are:

     - 1st column         : Type of term ("bond", "angle" or "dihedral")

     - 2nd column         : Function type used by GROMACS.

     - middle columns     : Atoms (2 for bonds, 3 for angles, 4 for
                            dihedrals), in the form
                            residuename_atomname, which may include
                            AWK-type regular expressions.

     - remaining columns  : Parameters, usually a GROMACS #defined
                            macro (eg, gb_15), or the word "delete".
                            See next point for improper dihedrals.

5. The rules for improper dihedrals are slightly different, and
   restrict the program to support only some forcefields (currently
   ffG43a1, ffG53a6 and ffG54a7).  First, a dihedral is identified as
   improper when its function type (2nd column) is "2", as in the
   supported forcefields.  Second, these lines must end neither with
   explicit parameters nor with the word "delete", but with one of the
   following macros (identical to, or derived from, macros in the
   supported forcefields):

     - gi_1c : "Central" variant of the gi_1 macro for planar groups,
               for the case in which the first atom is bonded to the
               other three (bonds 1-2, 1-3, and 1-4), as in a carboxyl
               group.

     - gi_1s : "Sequential" variant of the gi_1 macro for planar
               groups, for the case in which the atoms are
               sequentially bonded (bonds 1-2, 2-3, and 3-4), as in an
               aromatic ring.

     - gi_2  : The gi_2 macro for tetrahedral groups, in which case
               the atoms are always centrally arranged (bonds 1-2,
               1-3, and 1-4).

     - gi_3  : The gi_3 macro for the planar heme group (with a force
               constant larger than that of gi_1), in which case the
               atoms are always centrally arranged (bonds 1-2, 1-3,
               and 1-4).

   This type of specification is needed to properly create the
   improper dihedral entries in the topology (see "Algorithm").

6. Equivalent definitions for bonds, angles and torsions are handled
   automatically: eg, if the bond A-B is defined, the bond B-A does
   not need to be defined.
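
To make points 3, 4 and 6 concrete, here is a minimal Python sketch of
how a rules file could be parsed and matched; it is not the AWK code of
fix_topology.  It assumes that (:aa:) expands to the 20 standard
three-letter residue names, that each pattern must match a whole
residuename_atomname item, and that Python's re module treats these
simple patterns (grouping, alternation) as AWK's extended syntax does.
The names parse_rules, matches and the parameter macro gb_XX are
hypothetical.  Improper dihedrals (point 5) are not matched this way.

```python
import re

# ASSUMPTION: (:aa:) is taken as the 20 standard three-letter residue
# names; the list built into fix_topology may differ.
AA = ("ALA|ARG|ASN|ASP|CYS|GLN|GLU|GLY|HIS|ILE|"
      "LEU|LYS|MET|PHE|PRO|SER|THR|TRP|TYR|VAL")

NATOMS = {"bond": 2, "angle": 3, "dihedral": 4}   # atoms per term type
MACRO = re.compile(r"\(:[A-Za-z0-9_]+:\)")        # (:name:) macro syntax


def expand(text, macros):
    """Replace each (:name:) by its (already expanded) definition;
    unknown macros are left untouched."""
    return MACRO.sub(lambda m: macros.get(m.group(0), m.group(0)), text)


def parse_rules(lines):
    """Return the bond/angle/dihedral rules, with macros expanded."""
    macros = {"(:aa:)": "(" + AA + ")"}
    rules = []
    for line in lines:
        cols = line.split()          # spaces and/or tabs, like AWK's default FS
        if not cols or (cols[0] != "define" and cols[0] not in NATOMS):
            continue                 # any other line works as a comment
        if cols[0] == "define":
            macros[cols[1]] = expand(cols[2], macros)
            continue
        n = NATOMS[cols[0]]
        rules.append({"term": cols[0],
                      "func": cols[1],
                      "atoms": [expand(a, macros) for a in cols[2:2 + n]],
                      "params": cols[2 + n:]})
    return rules


def matches(rule, atoms):
    """True if the residuename_atomname items match the rule read in
    either direction (A-B-C is the same angle as C-B-A).
    ASSUMPTION: each pattern must match the whole item."""
    pats = rule["atoms"]
    atoms = list(atoms)
    if len(pats) != len(atoms):
        return False
    return any(all(re.fullmatch(p, a) for p, a in zip(pats, order))
               for order in (atoms, atoms[::-1]))


if __name__ == "__main__":
    rules = parse_rules(["; HIS-HEM link (hypothetical example)",
                         "define (:ee:) ((:aa:)|DAP)",
                         "bond 2 (:ee:)_NE2 HEM_FE gb_XX"])
    print(matches(rules[0], ["HEM_FE", "HIS_NE2"]))   # matched in reverse
```

Storing each definition already expanded means a later macro such as
(:dd:) = ((:ee:)|AMB|LYR) needs only one substitution pass.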


Algorithm:
----------

When an empty entry is found for a bond, an angle, or a torsion (ie, a
proper dihedral), the program does the following: (1) if no matching
rule is found, it leaves the entry empty and writes a warning; (2) if
a matching "delete" rule is found, it deletes the line; (3) if a
matching non-"delete" rule is found, it fills the entry with the
parameters or macro specified by the rule; (4) if more than one
matching rule is found for an empty entry, it writes a warning
reporting the multiple matches, but still writes one filled entry for
each matching rule.  This is because, although assigning more than
one bonded term of the same type to the same atoms is uncommon, in
ffG54a7 the phi and psi dihedral angles each have two dihedral terms
defined.  Ambiguous rules are therefore permitted in this version,
although a warning is always issued to help catch misused macros and
other errors.
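
The four cases above can be summarized in a short Python sketch.  This
is an illustration under stated assumptions, not the program's own
code: the function names, the warning texts, the representation of a
rule as a (patterns, params) pair and the parameter macros (gb_XX,
gd_A, gd_B) are all hypothetical, and patterns are assumed to match
whole residuename_atomname items in either direction.  How the real
program handles a "delete" rule mixed with other matching rules is not
specified here; the sketch simply skips it.

```python
import re
import sys


def rule_matches(patterns, atoms):
    """Anchored match of the residuename_atomname items, in either direction."""
    if len(patterns) != len(atoms):
        return False
    return any(all(re.fullmatch(p, a) for p, a in zip(patterns, order))
               for order in (atoms, atoms[::-1]))


def fix_empty_entry(entry, atoms, rules, warn=None):
    """Return the line(s) that replace one empty bond/angle/torsion entry.

    entry : original topology line (atom indices and function type only)
    atoms : residuename_atomname items of the atoms in the entry
    rules : (patterns, params) pairs for this type of term, where params
            is a list of parameter fields or ["delete"]
    warn  : callable receiving warning messages (default: stderr)
    """
    if warn is None:
        def warn(msg):
            print("WARNING:", msg, file=sys.stderr)
    hits = [params for patterns, params in rules
            if rule_matches(patterns, atoms)]
    label = " ".join(atoms)
    if not hits:                                   # (1) keep the empty entry
        warn("no rule for " + label)
        return [entry]
    if len(hits) > 1:                              # (4) warn, keep every match
        warn(f"{len(hits)} rules match " + label)
    out = []
    for params in hits:
        if params == ["delete"]:                   # (2) drop the line
            continue
        out.append(entry + "  " + " ".join(params))    # (3) fill the entry
    return out
```

For example, two rules matching the same phi dihedral (as in ffG54a7)
would produce two filled lines plus one warning.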

A different approach has to be used for improper dihedrals, because
pdb2gmx does not generate empty entries for them, as noted in
"Purpose".  First, the program stores all the bonds between atoms.
Then it goes over all the rules for improper dihedrals and, for each
rule, looks for atoms that are bonded as implied by the rule,
assigning the corresponding improper dihedral whenever a match is
found; note that the order of the atoms is necessarily implied by the
parameter macro specified in the rule, in a forcefield-dependent way
(see point 5 in "Rules file").  The new improper dihedral entries are
placed at a suitable point in the topology file (currently after the
end of the "[ angles ]" block), under their own header.
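
The bond search for improper dihedrals can be sketched in Python as
follows; this is a minimal illustration, not the program's own code.
It assumes the bonds are available as pairs of atom indices and that
each atom index maps to a residuename_atomname item; the function names
and the SHAPE table are hypothetical, although the bonding pattern of
each macro follows point 5 of "Rules file".  Candidates are generated
from the bond graph rather than from all quadruples of atoms, which
keeps the search cheap for large topologies.

```python
import re
from itertools import permutations

# Bond pattern implied by each improper-dihedral macro:
# "s" = sequential (1-2, 2-3, 3-4), "c" = central (1-2, 1-3, 1-4).
SHAPE = {"gi_1c": "c", "gi_1s": "s", "gi_2": "c", "gi_3": "c"}


def candidate_quads(bonded, shape):
    """Yield ordered atom quadruples bonded as the given shape requires."""
    for a in sorted(bonded):
        if shape == "s":                       # paths a-b-c-d
            for b in sorted(bonded[a]):
                for c in sorted(bonded[b] - {a}):
                    for d in sorted(bonded[c] - {a, b}):
                        yield (a, b, c, d)
        else:                                  # a bonded to b, c and d
            for b, c, d in permutations(sorted(bonded[a]), 3):
                yield (a, b, c, d)


def find_impropers(bonds, names, rules):
    """bonds: (i, j) index pairs; names: index -> residuename_atomname;
    rules: (patterns, macro) pairs.  Returns (i, j, k, l, macro) tuples."""
    bonded = {}
    for i, j in bonds:                         # store all bonds, both ways
        bonded.setdefault(i, set()).add(j)
        bonded.setdefault(j, set()).add(i)
    found = []
    for patterns, macro in rules:
        for quad in candidate_quads(bonded, SHAPE[macro]):
            if all(re.fullmatch(p, names[q]) for p, q in zip(patterns, quad)):
                found.append(quad + (macro,))
    return found
```

Because the atom order in a rule is fixed by its macro, the patterns
are matched only in the given order here, unlike the bond, angle and
torsion rules.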


Tips on using the program:
--------------------------

It is advisable to have a separate rules file to fix the specbonds
associated with each particular group (one for each type of heme
group, one for each type of FeS center, etc).  The most direct way to
build a new rules file is to run the program with an empty rules file,
and then trim the warning messages and add the intended parameter (or
macro) specification to each one.  The rules obtained in this way do
not use regular expressions: you get one line for each arrangement of
residuename_atomname items.  In some cases far fewer rules are needed
if you use regular expressions for the residuename_atomname strings,
so you can save a lot of tedious work by using regular expressions
and/or macros (see "Rules file").  However, regular expressions can be
tricky and lead to errors, so do *not* use them unless you really know
what you are doing!  Note also that the regular expressions are
interpreted using the extended syntax of AWK and EGREP, not the basic
syntax of GREP.  Be careful *not* to define rules that are chemically
equivalent to each other, especially when they involve residues of the
same type (eg, as in disulfide bonds); if you do, a warning will be
issued.
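
Before running fix_topology, a rules file built from literal
residuename_atomname items can be checked for such equivalent rules
with a small script.  The Python sketch below is a hypothetical helper,
not part of this distribution: it compares each bond, angle or torsion
rule with its reverse (point 6 of "Rules file"), keeps improper
dihedrals (function type 2) in their given order, and cannot detect two
different regular expressions that happen to match the same atoms.

```python
NATOMS = {"bond": 2, "angle": 3, "dihedral": 4}   # atoms per term type


def rule_key(cols):
    """Direction-independent key for a rule line split into columns.
    Improper dihedrals (function type 2) keep their order, since it is
    fixed by the gi_* macro; other terms equal their reverse."""
    term, func = cols[0], cols[1]
    atoms = tuple(cols[2:2 + NATOMS[term]])
    if term == "dihedral" and func == "2":
        return (term, func, atoms)
    return (term, func, min(atoms, atoms[::-1]))


def find_equivalent_rules(lines):
    """Return (first_line, repeat_line) pairs of equivalent rules,
    with 1-based line numbers.  Only literal items are compared."""
    seen, pairs = {}, []
    for n, line in enumerate(lines, 1):
        cols = line.split()
        if len(cols) < 2 or cols[0] not in NATOMS:
            continue                  # comments, "define" lines, blanks
        key = rule_key(cols)
        if key in seen:
            pairs.append((seen[key], n))
        else:
            seen[key] = n
    return pairs
```

In a disulfide bridge, for instance, the angles CYS_CB-CYS_SG-CYS_SG
and CYS_SG-CYS_SG-CYS_CB are the same term, so only one of them should
appear in the rules file.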

If you think you found a bug in fix_topology, please let us know.


References:
-----------

The program was first used to automatically fix the heme groups in a redox
protein topology periodically created on-the-fly during constant-pH MD
simulations [1]. The corresponding rules file (Heme_c3.rules) is given in
the EXAMPLES directory.

The program was later used to fix the topology of branched peptide
dendrimers [2-5], a task that uses more advanced features (eg, macros).
So, if you want to understand the full capabilities of fix_topology,
you may want to have a look at those references and at the rules file
(Dendrimer_53a6.rules) given in the EXAMPLES directory.

[1] M Machuqueiro, AM Baptista (2009) J Am Chem Soc, 131:12586.
doi.org/10.1021/ja808463e

[2] LCS Filipe, M Machuqueiro, AM Baptista (2011) J Am Chem Soc, 133:5042.
dx.doi.org/10.1021/ja111001v

[3] LCS Filipe, M Machuqueiro, T Darbre, AM Baptista (2013) Macromolecules,
46:9427.  dx.doi.org/10.1021/ma401574b

[4] LCS Filipe, SRR Campos, M Machuqueiro, T Darbre, AM Baptista (2016) J
Phys Chem B, 120:10138.  dx.doi.org/10.1021/acs.jpcb.6b05905

[5] LCS Filipe, M Machuqueiro, T Darbre, AM Baptista (2016) J Phys Chem B,
120:11323.  dx.doi.org/10.1021/acs.jpcb.6b09156

---------------------------------------------------------------------------
