
                   *** FixBox, version 1.2 ***

This program implements the FixBox algorithm described in ref [1], which
fixes a molecular system that might have been "broken" during a simulation
in a box with periodic boundary conditions (PBCs).

---------------------------------------------------------------------------
USAGE:

The program reads a file in GRO format and, for each frame, assembles and
centers the system in the periodic box, using information provided in an
input file with molecular definitions (designated here as moldef file), and
writes to stdout the fixed frames of the system in GRO format.

Example of a command line to run the program:

    fixbox system-config.gro definitions.def > fixed-system-config.gro

The GRO file given as input ('system-config.gro' in the example) can
contain multiple configurations of the system (it can be a trajectory),
whose fixed counterparts are written to stdout in the same order (note the
redirection).

The moldef file ('definitions.def' in the example) is read only after the
first frame and contains the definition of all the necessary groups of
molecules (which together must define *all* molecules), as well as the
indication of the groups to be used for assembling, centering and placement
into the box (see 'MOLDEF FORMAT' below).  The part containing the group
definitions can be regarded as a simple substitute for a molecular
topology; not depending on a molecular topology can be useful in some
cases, as pointed out in ref [1].

---------------------------------------------------------------------------
ALGORITHM:

The FixBox algorithm is explained in detail in ref [1]. Briefly, for each
frame, the following sequential algorithm is applied:

1. Determine distances between molecules, using their closest atoms.

2. Assemble the groups of interest by molecular proximity, gradually adding
   the closest molecules, one at a time.  More than one stage can be used,
   for greater flexibility.

3. Center the groups of interest along each direction, using the total
   extent.

4. Place specific groups into the box along each direction, which generates
   the final fixed configuration.

Boxes are always treated as triclinic (with box vectors a, b and c) and
satisfying PBCs.  Except for step 1, which is done in physical space, all
other steps are performed in the corresponding dimensionless unit cubic box
in scaled space.  See ref [1] for details.
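
The conversion between physical and scaled space can be sketched as
follows.  This is only an illustration, not the FixBox source: it assumes
the GROMACS convention for triclinic boxes, in which a = (ax,0,0),
b = (bx,by,0) and c = (cx,cy,cz); the type 'Box' and the function names are
hypothetical.

```c
#include <assert.h>

/* Triclinic box, assuming the lower-triangular GROMACS convention:
   a = (ax, 0, 0), b = (bx, by, 0), c = (cx, cy, cz). */
typedef struct {
    double ax, bx, by, cx, cy, cz;
} Box;

/* Physical coordinates r -> scaled coordinates s (r = s0*a + s1*b + s2*c),
   solved by back-substitution starting from the z component. */
static void to_scaled(const Box *box, const double r[3], double s[3])
{
    assert(box->ax > 0.0 && box->by > 0.0 && box->cz > 0.0);
    s[2] = r[2] / box->cz;
    s[1] = (r[1] - s[2] * box->cy) / box->by;
    s[0] = (r[0] - s[1] * box->bx - s[2] * box->cx) / box->ax;
}

/* Scaled coordinates s -> physical coordinates r. */
static void to_physical(const Box *box, const double s[3], double r[3])
{
    r[0] = s[0] * box->ax + s[1] * box->bx + s[2] * box->cx;
    r[1] = s[1] * box->by + s[2] * box->cy;
    r[2] = s[2] * box->cz;
}
```

In scaled space the box is the unit cube, so a periodic translation along
any box vector is simply the addition of an integer to the corresponding
scaled coordinate.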

---------------------------------------------------------------------------
MOLDEF FORMAT:

As discussed in ref [1], FixBox consists of a sequence of transformations
performed on sets of molecules that must be defined by the user for the
system of interest; in the program context, those sets are designated as
'groups'.  Since FixBox does not use a molecular topology, molecules are
defined when they are first assigned to a group (through the lines starting
with 'a' or 'n'; see below).  Note that a molecule can be assigned to
multiple groups.

The processed lines of the moldef file must start with one of the following
characters (which usually occur in the file in this order):

- G : A group name definition, consisting of a string (without spaces).
  Line examples: "G Protein", "G Lipids".  The group 'None' is predefined
  and reserved, corresponding to a group with no molecules that can be used
  in other definitions (see below).  A group definition is followed by a
  sequence of lines indicating the molecules it contains, each of which may
  directly define specific molecules (lines starting with 'a' or 'n') or
  refer to molecules already included in other groups (lines starting with
  'g').  In terms of set theory, the group is the union of the molecules
  specified through those lower-case-starting lines, described below.

- a : A molecule defined by the *ordinal* range of its atoms.  Line
  example: "a 1000 1500", which means that a molecule starts at the 1000th
  atom and ends at the 1500th one, regardless of their actual numbers in
  the GRO file.  The molecule is assigned to the currently defined group
  (as defined by the last G-line).

- n : One or more molecules defined by the residue name.  Line example: "n
  DMPC" designates *all* molecules with residue name DMPC.  The molecules
  are assigned to the currently defined group.

- g : One or more molecules defined through a previously defined group (not
  the current one).  Line examples: "g Protein", "g Lipids".  This is
  useful to define larger groups (e.g., a group System containing the
  groups Protein and Lipids).  The molecules are assigned to the currently
  defined group.

- A : Specify a group to be used (sequentially) in an assembling stage,
  giving one line per stage (1st A-line for stage 1, 2nd A-line for stage
  2, etc).  Line examples: "A Protein", "A Lipids".

- C : Specify the three groups to be centered along each box vector
  (a,b,c), and the corresponding types of message, 'E' or 'W' (error or
  warning), to generate if the group exceeds the box along that direction.
  Line example: "C Protein Protein System E E W", which centers Protein
  along directions a and b, and System along direction c, giving an error
  if Protein exceeds the extent of a and/or b but allowing System to exceed
  the c extent (writing just a warning).  Together with the P-line, this
  E/W selection provides detailed control over centering and bringing into
  the box, avoiding the accidental breaking of groups intended to remain
  whole.

- P : Specify the three groups which should be brought into the box along
  each box vector (a,b,c), using PBCs.  Line example: "P System System
  None", which brings System into the box along directions a and b, but
  does nothing along direction c.

- Lines not starting with one of the above characters are ignored.
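
The dispatch on the first character of each line can be sketched as
follows.  This is a hypothetical illustration and not the parser used by
FixBox: the names 'LineKind', 'classify_line' and 'parse_atom_range' are
invented for the example, and it assumes that the character must be in the
first column of the line.

```c
#include <assert.h>
#include <stdio.h>

/* Kinds of moldef lines, one per leading character described above. */
typedef enum {
    LINE_IGNORED, LINE_GROUP, LINE_ATOMS, LINE_RESNAME, LINE_SUBGROUP,
    LINE_ASSEMBLE, LINE_CENTER, LINE_PLACE
} LineKind;

/* Classify a moldef line by its first character (assumed in column 1). */
static LineKind classify_line(const char *line)
{
    assert(line != NULL);
    switch (line[0]) {
    case 'G': return LINE_GROUP;
    case 'a': return LINE_ATOMS;
    case 'n': return LINE_RESNAME;
    case 'g': return LINE_SUBGROUP;
    case 'A': return LINE_ASSEMBLE;
    case 'C': return LINE_CENTER;
    case 'P': return LINE_PLACE;
    default:  return LINE_IGNORED;   /* comments, blank lines, etc. */
    }
}

/* Parse an a-line such as "a 1000 1500" into an ordinal atom range.
   Returns 1 on success, 0 if the line is not a valid a-line. */
static int parse_atom_range(const char *line, long *first, long *last)
{
    assert(line != NULL && first != NULL && last != NULL);
    if (line[0] != 'a') return 0;
    if (sscanf(line + 1, "%ld %ld", first, last) != 2) return 0;
    return *first >= 1 && *last >= *first;
}
```

With this sketch, "a 1000 1500" yields the ordinal range 1000 to 1500,
while a line such as "# comment" is classified as ignored.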

For clarity in the moldef file, it is suggested that: upper-case entries
are given in the above order (G,A,C,P); upper-case entries are separated
using blank lines, except A-lines (which should be consecutive); lower-case
entries follow immediately after their corresponding G definition, using
the above order (a,n,g); lines with comments are started using a character
commonly used for that purpose (e.g., '#').  See the examples in ref [1].
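
A moldef file following these suggestions could look like the one below.
It is only an illustration built from the line examples given above: the
system (one protein, a DMPC bilayer and solvent), the atom range, the
residue name SOL and the choice of assembling stages are hypothetical and
must be adapted to the system of interest.

```
# Hypothetical system: one protein, a DMPC bilayer and solvent
G Protein
a 1 1500

G Lipids
n DMPC

G Solvent
n SOL

G System
g Protein
g Lipids
g Solvent

A Protein
A Lipids

C Protein Protein System E E W

P System System None
```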

---------------------------------------------------------------------------
IMPORTANT NECESSARY CONDITIONS:

- The input coordinates *must* correspond to whole molecules (e.g.,
  generated with '-pbc whole' or '-pbc mol' by GROMACS).  This is *not*
  (and, without a topology, cannot be) checked!

- A molecule *must* be a set of consecutive atoms.  This is *not* checked!

- As noted above, a-lines in the moldef file should contain *ordinal* atom
  numbers.  Actual atom numbers in the GRO file are treated as mere string
  labels and reused only to output the fixed GRO stream.

- The moldef definitions must assign all atoms to a molecule and all
  molecules to at least one group.  Otherwise, the program reports an error
  and exits.
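
The distinction between ordinal atom numbers and the atom numbers written
in the GRO file can be illustrated with the sketch below, which is not the
FixBox reader.  It assumes the standard fixed-column GRO atom line (residue
number, residue name, atom name and atom number in fields of 5 characters,
followed by coordinates in fields of 8 characters); the type 'GroAtom' and
the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char resname[6];    /* residue name, trailing blanks removed */
    char label[6];      /* atom number field, kept verbatim as a label */
    double x, y, z;     /* coordinates, in nm */
} GroAtom;

/* Copy a fixed-width field of the line into a NUL-terminated buffer. */
static void copy_field(char *dst, const char *src, size_t width)
{
    memcpy(dst, src, width);
    dst[width] = '\0';
}

/* Parse one atom line; returns 1 on success, 0 otherwise. */
static int parse_gro_atom(const char *line, GroAtom *atom)
{
    char buf[3][9];
    size_t len;
    int i;

    assert(line != NULL && atom != NULL);
    if (strlen(line) < 44) return 0;
    copy_field(atom->resname, line + 5, 5);
    for (len = strlen(atom->resname);
         len > 0 && atom->resname[len - 1] == ' '; len--)
        atom->resname[len - 1] = '\0';
    copy_field(atom->label, line + 15, 5);   /* never converted to int */
    for (i = 0; i < 3; i++)
        copy_field(buf[i], line + 20 + 8 * i, 8);
    return sscanf(buf[0], "%lf", &atom->x) == 1 &&
           sscanf(buf[1], "%lf", &atom->y) == 1 &&
           sscanf(buf[2], "%lf", &atom->z) == 1;
}
```

In such a reader, the ordinal number of an atom is just the position of its
line within the frame (1 for the first atom line), independently of the
label, which may repeat in large systems because the field has only five
characters.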

---------------------------------------------------------------------------
COMPILATION:

The program is written in ANSI C and its compilation is straightforward.
Examples:

  gcc fixbox.c -o fixbox -O3 -lm -Wall -W -pedantic -Wno-unused-result
  gcc fixbox.c -o fixbox -O3 -lm -Wall -W -pedantic -ansi

---------------------------------------------------------------------------
AUTHORS:

- Antonio M. Baptista, ITQB NOVA, Portugal

---------------------------------------------------------------------------
CITATION:

If you use FixBox, please cite ref [1].

---------------------------------------------------------------------------
REFERENCES:

[1] Baptista, A. M., da Rocha, L., Campos, S. R. R. (2022) FixBox: A
General Algorithm to Fix Molecular Systems in Periodic Boxes.
J. Chem. Inf. Model. 62:4435. https://doi.org/10.1021/acs.jcim.2c00823

