=========================================================================
           MCRP : Monte Carlo for Reduction and Protonation

                     www.itqb.unl.pt/simulation
=========================================================================

Authors: Antonio M. Baptista
--------


Citation:
---------

If you use MCRP in your work please cite the following paper:

  Baptista, A.M., Martel, P.J., Soares, C.M. (1999) "Simulation of
  electron-proton coupling with a Monte Carlo method: application to
  cytochrome c3 using continuum electrostatics", Biophys. J., 76,
  2978-2998.

Please report bugs to baptista@itqb.unl.pt.


Installation:
-------------

This distribution of MCRP (e.g., a tar file) should include the following
files:

  - README        : This file, which contains the documentation for MCRP.
  - mcrp.c        : C source code of the program.
  - Makefile      : Makefile for compilation, etc.
  - sample.pkint  : Input sample file with pKint values and other info.
  - sample.g      : Input sample file with interaction energies.
  - sample.out    : Output sample file.
  - EXAMPLE       : Info for sample data.
  - LICENSE       : The distribution license.

The source code of MCRP consists of a single file, mcrp.c (which contains
lots of global variables and other ugly stuff!). Only standard C functions,
libraries, etc, are used (I think...), and therefore the compilation should
be straightforward. For compilation, just run 'make'.


General description:
--------------------

MCRP is a program to simulate the binding equilibrium of a set of
protonatable and redox sites, the sampling of binding configurations being
done using a Monte Carlo (MC) method; the theoretical aspects are discussed
elsewhere [1]. Although the program assumes that each binding site has only
two possible states (see below), it may be used to address a system of
multi-state sites by doing the sampling in a thermodynamically equivalent
system of two-state sites [2].

MCRP requires as input a set of intrinsic (individual) site affinities and
a matrix of site-site interactions, which have to be computed using some
other method. As shown in ref. [1], the formalism for the binding of
protons and "oxidons" is essentially the same as for protons alone, so that
a program which computes the energetics for the protonatable case can be
easily made to do the calculations involving the two types of site: one
just needs to treat the redox sites as being oxidizable instead of
reducible, i.e., as binding "oxidons" instead of electrons. Thus, the format
of the input files for MCRP (described below) is close to that produced by
the `multiflex' program of Donald Bashford's package MEAD [3,4], a freely
available package for computing intrinsic affinities and pairwise
interactions for a set of protonatable sites. Nevertheless, the input data
can be obtained from any source, as long as the free energies of binding
are assumed to be decomposable into individual site and pairwise site-site
terms [1].

Using the intrinsic affinities and pairwise interactions given as input,
MCRP does an MC calculation for each of the points of a grid of
pH-versus-potential specified in the command line (see below). Each MC run
computes several binding statistics (some optional): mean occupations,
fluctuations, correlations, occupational entropies, errors,
"time"-correlations, and the discrimination of binding populations within a
given set of sites (see Options below). It is also possible to fix the
state of individual sites or to restrict the type of flip (single or
double) that changes their state (see Input below).


Input:
------

The file with the intrinsic affinities, called pkint_file (from the
protonatable case), has essentially the same format as the .pkint file
created by MEAD. The only differences are a new one-letter code for redox
sites and an extra column which specifies whether a site is titratable or
has a fixed state. The file should have one line per site, each of the
format:

<pkint>  <charge>  <label>  [<type>  [<tit>]]

The last two fields are optional, but <tit> requires <type> to be
present. The <pkint> value is assumed to be in pH units, even for
oxidizable sites; this is the way MEAD will compute it. The <charge> field
indicates if the charged form of the site is anionic (A) or cationic
(C). The <type> field indicates if the site is protonatable (P) or redox
(R); default is P. The field <tit> indicates if the titration of the site
proceeds through single and double flips (*), through double flips only
(d), through single flips only (s), or if the site is non-titratable and
should be kept empty (0) or occupied (1); default is *. (Empty is
deprotonated or reduced, while occupied is protonated or oxidized.) The
fields are separated by one or more spaces (_not_ tabs!).
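
As an illustration of the line format above, here is a minimal Python
sketch of a reader for one pkint_file line. It is not part of MCRP and
does not claim to mirror what mcrp.c does; the names Site and
parse_pkint_line, and the labels used in the usage note, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    pkint: float           # intrinsic pK, in pH units (also for redox sites)
    charge: str            # 'A' (anionic) or 'C' (cationic) charged form
    label: str
    site_type: str = "P"   # 'P' (protonatable) or 'R' (redox); default P
    tit: str = "*"         # '*', 'd', 's', '0' or '1'; default *


def parse_pkint_line(line):
    """Parse '<pkint> <charge> <label> [<type> [<tit>]]' (hypothetical helper)."""
    if "\t" in line:
        raise ValueError("fields must be separated by spaces, not tabs")
    fields = line.split()
    if not 3 <= len(fields) <= 5:
        raise ValueError("expected 3 to 5 fields: %r" % line)
    # Optional fields fall back to the documented defaults (P and *).
    site = Site(float(fields[0]), fields[1], fields[2], *fields[3:])
    if site.charge not in ("A", "C"):
        raise ValueError("charge must be A or C: %r" % line)
    if site.site_type not in ("P", "R"):
        raise ValueError("type must be P or R: %r" % line)
    if site.tit not in ("*", "d", "s", "0", "1"):
        raise ValueError("tit must be one of * d s 0 1: %r" % line)
    return site
```

For example, parse_pkint_line("4.0 A ASP-12") would give a protonatable
site with tit='*', while parse_pkint_line("-3.5 C HEM-1 R d") would give
a redox site restricted to double flips.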


The file with the pairwise interactions has the same format as the .g file
created by MEAD:

<site#> <site#>  <interaction>

The first two columns are actually ignored by MCRP, and are read only for
compatibility with MEAD; MCRP generates its own site indexing (which starts
at 0, _not_ at 1). The <interaction> value is assumed to be in units of
squared proton charge per Angstrom (e^2/A), for compatibility with the
default output of MEAD. The fields are separated by one or more spaces
(_not_ tabs!).
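
A minimal Python sketch of reading such a file is given below (it is not
part of MCRP). It ignores the first two columns, as described above, and
ASSUMES that the file lists all nsites*nsites ordered pairs row by row;
check this against your own .g file. The helper to_pk_units is also an
assumption: it converts e^2/A into pK units with standard constants
(332.0637 kcal/mol per e^2/A, R = 1.987204e-3 kcal/(mol K)), which may
differ slightly from the constants used inside mcrp.c.

```python
import math

COULOMB_KCAL = 332.0637      # e^2/Angstrom expressed in kcal/mol
GAS_CONSTANT = 1.987204e-3   # kcal/(mol K)


def read_interactions(lines, nsites):
    """Return an nsites x nsites matrix (0-based) from .g-style lines."""
    values = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if len(fields) != 3:
            raise ValueError("expected 3 fields: %r" % line)
        values.append(float(fields[2]))   # first two columns are ignored
    if len(values) != nsites * nsites:    # assumption: full row-major matrix
        raise ValueError("expected %d values" % (nsites * nsites))
    return [values[i * nsites:(i + 1) * nsites] for i in range(nsites)]


def to_pk_units(interaction, temperature):
    """Convert an interaction in e^2/A to pK units at the given T (Kelvin)."""
    kcal = interaction * COULOMB_KCAL
    return kcal / (math.log(10) * GAS_CONSTANT * temperature)
```

The conversion may be handy when comparing interactions with thresholds
given in pH units.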


Output:
-------

MCRP uses a single output file where all information is written. In my
opinion this simplifies the analysis of the data, because many different
combinations of data types may need to be analyzed. In order to simplify
the processing through grep and awk lines/scripts, each line starts with a
character indicating its type of information. The information in this
output file is written in a somewhat redundant way, but the file is highly
compressible (80-90% using gzip).

Lines starting with '#' contain general information, such as the input
parameters and headers for blocks of data. They do not follow any fixed
format.

Lines starting with '.' contain information on individual sites for a given
pair of electrostatic potential (E) and pH values. Their format is:

.  <E> <pH>  <site#>  <mean-occup>  [<err-mean> (<tcorr>)]  [<dG> <dH> <TdS>]

The error of the mean occupation and the correlation time (tcorr) of the
occupations are computed optionally when option -t is used (see below). The
thermodynamic quantities refer to the binding reaction for the individual
site and correspond respectively to the free energy, enthalpy +
configurational entropy, and occupational entropy (see ref. [1] for
details). If <site#> is equal to 'totP' or 'totR' the line refers to the
total occupation of the protonatable or oxidizable sites, respectively; in
this case the format is slightly different:

. <E> <pH>  <site#>  <mean-occup> <stdev-occup>

where the standard deviation refers also to the total occupation.  (Note
that the standard deviations of the occupations of individual sites are
directly obtained from the mean: stdev=sqrt(mean-mean*mean).)
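
The following Python sketch (not part of MCRP) reads the leading fields
of a '.' line. For an individual site the occupation is 0 or 1, so its
variance is mean-mean*mean and the standard deviation is the square root
of that; for 'totP' and 'totR' the value is read from the line. The
function names are hypothetical, the optional trailing fields are not
parsed, and the numbers in the usage note are made up.

```python
import math


def site_stdev(mean_occup):
    """Standard deviation of a two-state (0/1) occupation with this mean."""
    variance = mean_occup - mean_occup * mean_occup
    return math.sqrt(max(variance, 0.0))   # guard against rounding below 0


def parse_site_line(line):
    """Return (E, pH, site, mean, stdev) from a '.' output line."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != ".":
        raise ValueError("not a '.' line: %r" % line)
    e, ph = float(fields[1]), float(fields[2])
    site, mean = fields[3], float(fields[4])
    if site in ("totP", "totR"):
        stdev = float(fields[5])           # written explicitly for totals
    else:
        stdev = site_stdev(mean)           # derived from the mean
    return e, ph, site, mean, stdev
```

For instance, parse_site_line(".  0.0 7.0  3  0.5") yields a standard
deviation of 0.5 for site 3.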

Lines starting with 'P' or 'R' contain the bins for histograms of the total
occupation of protonatable or oxidizable sites, respectively, for an (E,pH)
pair. The formats are:

P <E> <pH>  <bin[0]> <bin[1]> ... <bin[total-of-P-sites]>
R <E> <pH>  <bin[0]> <bin[1]> ... <bin[total-of-R-sites]>

where <bin[i]> is the number of occurrences with a total of i occupied
sites of type P or R.
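
As a small example of using these lines, the Python sketch below (not
part of MCRP) recovers the mean and standard deviation of the total
occupation from the bins of one histogram line. The function name is
hypothetical and the line in the usage note is made up.

```python
import math


def histogram_stats(line):
    """Return (kind, E, pH, mean, stdev) for a 'P' or 'R' histogram line."""
    fields = line.split()
    if len(fields) < 4 or fields[0] not in ("P", "R"):
        raise ValueError("not a histogram line: %r" % line)
    kind, e, ph = fields[0], float(fields[1]), float(fields[2])
    bins = [float(x) for x in fields[3:]]   # bins[i]: counts with i occupied
    total = sum(bins)
    if total <= 0:
        raise ValueError("empty histogram: %r" % line)
    mean = sum(i * b for i, b in enumerate(bins)) / total
    mean_sq = sum(i * i * b for i, b in enumerate(bins)) / total
    stdev = math.sqrt(max(mean_sq - mean * mean, 0.0))
    return kind, e, ph, mean, stdev
```

With the line "P 0.0 7.0  10 20 10" the mean total occupation is 1.0 and
the variance is 0.5.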

Lines starting with 'e' contain the mean and standard deviation of the
total energy, for an (E,pH) pair. The format is:

e <E> <pH>  <mean-energy> <stdev-energy>

These lines are written only when the option -e is used (see below).

Lines starting with ':' contain the correlation coefficients between the
occupations of pairs of sites for an (E,pH) pair, and are produced by the
option -p (see below). The format is:

: <E> <pH>  <site1#> <site2#>  <corr-coef>  <mean-occup1*occup2>

The last column contains the mean of the product of the occupancies of
both sites (useful for some calculations).  The first site number is
always lower than the second, i.e., pairs are not repeated. A line with
the correlation of the total protonation and oxidation is always
written (with site numbers 'totP' and 'totR'), even if option -p is
not used.
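
To illustrate how the mean of the product relates to the correlation
coefficient, here is a Python sketch (not part of MCRP) using the usual
Pearson definition for two-state occupations. That MCRP computes
<corr-coef> in exactly this way is an assumption; the function name is
hypothetical.

```python
import math


def occupation_correlation(mean1, mean2, mean_product):
    """Pearson correlation of two 0/1 occupations from their means."""
    covariance = mean_product - mean1 * mean2
    var1 = mean1 - mean1 * mean1    # variance of a 0/1 variable
    var2 = mean2 - mean2 * mean2
    if var1 <= 0.0 or var2 <= 0.0:
        return math.nan             # undefined if a site never changes state
    return covariance / math.sqrt(var1 * var2)
```

Two sites that are each occupied half of the time and always together
(mean product 0.5) give a coefficient of 1, while independent ones (mean
product 0.25) give 0.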

Lines starting with 'm' contain microstate statistics for a set of selected
sites, for an (E,pH) pair, and are produced by the -S option (see
below). The format is:

m <E> <pH>  <binding-configuration>  <fraction>

The <binding-configuration> has one digit per site (in the order given to
the option -S), which is 0 or 1, depending on whether the site is empty or
occupied. The <fraction> is relative to the total of possible
configurations for the set of sites. Thus, for a set of N sites there will
be 2^N lines per (E,pH) point, and their <fraction>s sum to unity.
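
A short Python sketch (not part of MCRP) of how the 'm' lines of a single
(E,pH) point might be used: it collects the fractions by configuration
and sums them to recover the mean occupation of one of the selected
sites. Function names are hypothetical, and the caller is assumed to pass
only lines belonging to the same (E,pH) point.

```python
def read_microstates(lines):
    """Map each <binding-configuration> string to its <fraction>."""
    fractions = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or fields[0] != "m":
            continue                       # skip lines of other types
        fractions[fields[3]] = float(fields[4])
    return fractions


def site_occupation(fractions, position):
    """Mean occupation of the site at this position in the -S list (0-based)."""
    return sum(f for config, f in fractions.items() if config[position] == "1")
```

For two selected sites with fractions 0.1, 0.2, 0.3 and 0.4 for the
configurations 00, 01, 10 and 11, the first site is occupied with
probability 0.3 + 0.4.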

The lines starting with '>' are only produced after all pH values have been
run for a given value of electrostatic potential (E). The format is:

>  <label>  <site#>  <E>  <pKint> <pKhalf(s)>

The <label> and <pKint> are those read from the pkint_file (see above). The
<pKhalf(s)> may be one or several values, in case the site has a
non-monotonic behavior in the mid-point region. These lines are written
especially for the case of protonatable sites only. (Note that Ehalf values
are _not_ computed by MCRP).

A line starting with 'f' is written at the very end of the run, containing
the final states of all sites. The format is:

f <state[0]> <state[1]> ... <state[nsites-1]>

This information may be useful to interface MCRP with other programs.


Options:
--------

  -P pHmin,pHmax,dpH  : pH range and increment.

  -E Emin,Emax,dE     : Electrostatic potential range and increment (mV).

  -T temperature      : Temperature (Kelvin).

  -c couple_min       : Couple threshold for double flips (pH units).
                        This means that pairs of sites whose interaction
                        is >= couple_min will be subjected to double
                        flips during the MC scheme, as done by Beroza et
                        al. [5].

  -r cutoff           : Cutoff for use in reduced titration [6]. This
                        means that if the minimum of the average
			occupation is > 1 - cutoff the site is
			considered occupied, while if the maximum of
			the average occupation is < cutoff the site is
			considered empty; see eq. (19) and following
			paragraph in ref. [6]. If cutoff=0 reduced
			titration is not used.

  -q eqsteps          : Number of MC equilibration steps before the
                        production run starts. 

  -s seed             : Seed for the random number generator.

  -p min_corr         : Cutoff for printing pair correlation
                        coefficients. Only values >= min_corr in
			absolute value will be printed. Note that the
			time spent when -p is used is not affected by
			its argument.

  -e                  : Compute site energetics. This gives the
                        occupational entropy and other terms for the
			binding free energy of each site [1].

  -S site1,site2,...  : Set of sites for microstate statistics. This
                        means that the relative populations of all
			binding configurations for this set of sites
			will be printed.

  -t taumax           : Maximum correlation time. This switches on the
                        calculation of time-correlation functions for
                        the occupancies of individual sites, and from
                        them the errors of the mean occupancies.

  -d                  : Shows defaults (if alone) and other info, such
                        as the program version and comments on option
                        effects.
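
The rule described for option -r can be sketched in Python as follows
(not part of MCRP). The minimum and maximum of the average occupation are
taken here as given inputs, since how they are obtained is described in
ref. [6] and not in this file; the function name and the returned strings
are hypothetical.

```python
def classify_site(min_occup, max_occup, cutoff):
    """Apply the reduced-titration rule of option -r to one site."""
    if cutoff == 0:
        return "titrating"      # cutoff=0: reduced titration is not used
    if min_occup > 1 - cutoff:
        return "occupied"       # always (nearly) full: keep it occupied
    if max_occup < cutoff:
        return "empty"          # always (nearly) empty: keep it empty
    return "titrating"          # otherwise the site is sampled normally
```

With cutoff=0.05, a site whose average occupation never drops below 0.97
would be treated as occupied, and one that never exceeds 0.02 as empty.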


References:
-----------

[1] Baptista, A.M., Martel, P.J., Soares, C.M. (1999) "Simulation of
    electron-proton coupling with a Monte Carlo method: application to
    cytochrome c3 using continuum electrostatics." Biophys. J., 76,
    2978-2998.
[2] Baptista, A.M., Soares, C.M. (2001) "Some theoretical and computational
    aspects of the inclusion of proton isomerism in the protonation
    equilibrium of proteins." J. Phys. Chem. B, 105, 293-309.
[3] Bashford, D., Karplus, M. (1990) Biochemistry, 29, 10219-10225.
[4] Bashford, D., Gerwert, K. (1992) J. Mol. Biol., 224, 473-486.
[5] Beroza, P., Fredkin, D.R., Okamura, M.Y., Feher, G. (1991)
    Proc. Natl. Acad. Sci. USA, 88, 5804-5808.
[6] Bashford, D., Karplus, M. (1991) J. Phys. Chem., 95, 9556-9561.

