
                 *** LandscapeTools, version 2.3 ***

This package reads a set of points in space and computes the probability
density and corresponding free energy in that space, identifying the
resulting energy basins and the transitions between them.  It was
originally intended for analysis of PCA data in conformational analysis,
but it can be useful for other purposes.

It consists of two programs: GETDENSITY computes the density map and
GETBASINS determines the basins.

GETDENSITY: This program generates probability density maps that can be
  read and interpreted with GETBASINS. It can also be used standalone,
  working as a kernel density estimator.

GETBASINS: This program is meant to be used with GETDENSITY output
  files. It assigns points in a landscape to basins, calculates the energy
  minima and the free energy of the basins, analyses the inter-basin
  transitions, and produces some visualization scripts for Gnuplot and
  PyMOL.


---------------------------------------------------------------------------
ALGORITHM

1. Compute a kernel estimate of the probability density of the input
   points.  This density is computed both at the bins of a grid and at the
   locations of the points read from the input file.  The density is
   normalized so that its integral over the whole space is 1 (within the
   intended precision).  The kernel function may be Gaussian, triangular
   or naive.  For details, see ref. [1].  This is done by GETDENSITY.

2. Compute the `energy' in this space directly from the density,
   assuming a Boltzmann distribution, using as the zero of energy the
   point located at the highest density.  That is, the energy at a
   location r is given (in RT units) by

               E(r) = - log[ P(r) / Pmax ]

   Again, the energy is computed both at the grid bins and at the points.
   Strictly speaking, this `energy' is a potential of mean force that
   implicitly accounts for all hidden degrees of freedom of the system,
   and its exact meaning depends on the physical nature of the points read
   from the input file.  This can be done by both GETDENSITY and
   GETBASINS.

3. Map each location r in space to an energy basin centered `around' an
   energy minimum (one basin per minimum), using a steepest descent
   procedure.  Since the steepest descent path from a location r leads to a
   single energy minimum, this procedure splits the whole space into a
   collection of disjoint sets, the basins (for a discussion of this and
   other basin definitions see refs. [2,3]).  Although this is a
   well-defined analytical procedure, it is done here in a grid-based
   manner, following the steepest descent path along contiguous bins, which
   defines a collection of grid-based basins.  After this, each point is
   assigned to the basin of the nearest bin. This is done by GETBASINS.
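
Steps 1 and 2 can be illustrated with a minimal one-dimensional sketch in
Python.  This is not the GETDENSITY source: it assumes a fixed-bandwidth
Gaussian kernel and the common Gaussian-reference form of Silverman's
rule, h = 1.06 * sigma * n^(-1/5).  The exact rule, the sensitivity
parameter and the multidimensional treatment used by GETDENSITY are not
reproduced, and all function names are hypothetical.

```python
import math

def silverman_bandwidth(xs):
    """Rule-of-thumb bandwidth h = 1.06 * sigma * n**(-1/5) (assumed form)."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * sigma * n ** (-0.2)

def gaussian_kde(xs, r, h):
    """Fixed-bandwidth Gaussian kernel estimate of the density at r."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((r - x) / h) ** 2) for x in xs)

def energies(densities):
    """E = -log(P / Pmax) in RT units; zero-density bins get +inf."""
    pmax = max(densities)
    return [-math.log(p / pmax) if p > 0.0 else math.inf for p in densities]

# Toy data: two clusters of points on a line.
points = [-1.2, -1.0, -0.9, -1.1, -0.8, 1.0, 1.1, 0.9]
h = silverman_bandwidth(points)

# Density at the centers of grid bins of width `mesh`, covering the data
# plus a margin wide enough for the kernel tails.
mesh = 0.05
lo = min(points) - 6 * h
nbins = int((max(points) + 6 * h - lo) / mesh) + 1
centers = [lo + (i + 0.5) * mesh for i in range(nbins)]
grid_density = [gaussian_kde(points, c, h) for c in centers]
grid_energy = energies(grid_density)

# The same density evaluated at the input points themselves.
point_density = [gaussian_kde(points, x, h) for x in points]

# Normalization check: the density summed over the grid should be close to 1.
integral = sum(grid_density) * mesh
```

With this definition the bin of highest density has energy 0 and every
other bin has a positive energy.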

Note that, in all these operations, the coordinates used for the grid
correspond to the center of the grid bins and not to the grid nodes
(i.e., the vertices of the bins). 
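
The grid-based steepest descent of step 3 can be sketched as follows for
a 2D grid of bin energies.  This is an illustration only, not the
GETBASINS source.  It assumes that the `contiguous' bins are the four
face-sharing neighbors and that ties are broken by visiting order; no
energy cutoff is applied.  The function names, the grid origin convention
and the toy energies are hypothetical.

```python
import math

def assign_basins(energy):
    """Map each bin (i, j) of a 2D energy grid to the bin of its minimum."""
    nrows, ncols = len(energy), len(energy[0])

    def lowest_neighbor(i, j):
        # Return the lowest-energy bin among (i, j) and its 4 neighbors.
        best = (i, j)
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            a, b = i + di, j + dj
            if 0 <= a < nrows and 0 <= b < ncols:
                if energy[a][b] < energy[best[0]][best[1]]:
                    best = (a, b)
        return best

    minimum_of = {}
    for i in range(nrows):
        for j in range(ncols):
            path = []
            cur = (i, j)
            # Walk downhill until a known bin or a local minimum is reached.
            while cur not in minimum_of:
                path.append(cur)
                nxt = lowest_neighbor(*cur)
                if nxt == cur:
                    minimum_of[cur] = cur
                    break
                cur = nxt
            for b in path:
                minimum_of[b] = minimum_of[cur]
    return minimum_of

def bin_of(x, origin, mesh):
    """Index of the bin containing x, i.e. the bin with the nearest center."""
    return int(math.floor((x - origin) / mesh))

# Toy landscape with two minima, at bins (1, 1) and (1, 3).
grid = [
    [3.0, 2.0, 3.0, 2.5, 3.0],
    [2.0, 0.0, 2.5, 1.0, 2.0],
    [3.0, 2.0, 3.0, 2.5, 3.0],
]
basins = assign_basins(grid)
```

Because the energy strictly decreases along each path, every walk ends at
a local minimum, and bins that share a minimum form one basin.  A point is
then given the basin of the bin returned by bin_of for each coordinate.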


---------------------------------------------------------------------------
USAGE

GETDENSITY:

Usage: getdensity [options] datafile ndim meshsize runname 
Options:
      -h bandwidth : Default = Silverman's rule of thumb.
      -s sensitivity : Default = 0.
      -k kernel : gauss, triang or naive. Default = triang.
      -F : Writes runname.dat file with point data; getbasins requires
           this file.
      -E : Writes runnameE.fld file (with energies); getbasins does *not*
           require this file.

datafile : A file with point coordinates is the only input required by
  GETDENSITY. Each line corresponds to a point and contains one or more
  space-separated columns (one per coordinate). The file can contain more
  columns than the number of dimensions to be analysed (ndim).

ndim : Number of dimensions to be analysed. If smaller than the number of
  columns provided, only the first ndim columns are used.

meshsize : The grid meshsize.

runname : The name (without suffix) to be given to the output files.

For an explanation of the bandwidth, the sensitivity, and the Gaussian,
triangular and naive kernels, see ref. [1].
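
As an illustration of the datafile format, the following Python sketch
writes a file with one point per line and space-separated coordinates,
then reads back only the first ndim columns.  The file name points.dat,
the random data and both helper functions are hypothetical; the reader
only mimics the column selection described above and is not the parser
used by GETDENSITY.

```python
import os
import random
import tempfile

def write_datafile(path, points):
    """Write one point per line, coordinates separated by single spaces."""
    with open(path, "w") as fh:
        for p in points:
            fh.write(" ".join("%.6f" % c for c in p) + "\n")

def read_first_columns(path, ndim):
    """Read back only the first ndim columns of each non-empty line."""
    with open(path) as fh:
        return [[float(v) for v in line.split()[:ndim]]
                for line in fh if line.strip()]

# 1000 synthetic 3-column points; only the first 2 columns are analysed.
random.seed(0)
points = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(1000)]
path = os.path.join(tempfile.gettempdir(), "points.dat")
write_datafile(path, points)
used = read_first_columns(path, 2)
```

A file written this way could then be analysed in two dimensions with,
for example, "getdensity -F points.dat 2 0.2 run", where the meshsize 0.2
and the run name are arbitrary choices.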


GETBASINS:

This program is meant to be used after GETDENSITY, using as input some of
its output files (runnameP.fld and runname.dat).  It also requires the input
file originally used by GETDENSITY (so do not move or remove it from its
original location).

Usage: getbasins runname [cutoff]

runname : The name (without suffix) of the input and output files. It must
  be the same used with GETDENSITY.

cutoff : Optional energy cutoff in RT units. Points with energy above this
  cutoff will be excluded from the basins. In any case, points above
  MAX_ENERGY/2 are never assigned to a basin, where MAX_ENERGY is a #define
  macro (currently 1e6).
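
The two thresholds can be summarized by the following Python sketch.  The
function name is hypothetical, and the treatment of a point whose energy
is exactly equal to a threshold is an assumption, since it is not stated
above.

```python
MAX_ENERGY = 1e6  # value quoted above for the #define macro

def is_assigned(energy, cutoff=None):
    """Return True if a point with this energy (RT units) may join a basin."""
    if energy > MAX_ENERGY / 2:
        return False  # never assigned, whatever the cutoff
    if cutoff is not None and energy > cutoff:
        return False  # excluded by the optional user cutoff
    return True
```

For example, with a cutoff of 4.0 a point at 3.9 RT is kept and a point at
4.1 RT is excluded, while without a cutoff only the MAX_ENERGY/2 limit
applies.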


---------------------------------------------------------------------------
OUTPUT


GETDENSITY:

- An AVS fld file with the probability density is written (runnameP.fld).
  Note that the fld variant used here is a mix of ASCII (for the header)
  and binary (for the field values).

- With option -F, runname.dat is written, which contains point data (index,
  bin, energy). This file is essential to continue analysis with GETBASINS.

- When ndim is 2, file runnameP.gp is written in a format suitable for
  drawing a 3D plot of the probability density landscape with gnuplot.

- With option -E, file runnameE.fld (and runnameE.gp, when ndim=2) is also
  written, with the free energy instead of the probability density.


GETBASINS:

For simplicity of analysis, basins are numbered in increasing order of
their free energies (those in runname.thermo), so that basin 0 is the most
populated one, basin 1 the second most populated, and so on.  These labels
are used in all output files.  The information for the basins is written
in several different ways:

- A runname.ndx file is written for use with GROMACS, indicating the point
  numbers that belong to each basin.

- A runname-min.pdb PDB file is written, where each point corresponds to
  the minimum of one basin (i.e., one point per basin). The point number is
  written in the atom-number field, the basin number in the residue-number
  field, the values of its first coordinates in the x,y,z fields, and its
  energy in the B-factor field. See also the Note below.

- Several runname-bas*.pdb PDB files are written, each containing the
  points included in that particular basin.  Only points below the given
  cutoff are written (if no such point exists in a basin, the corresponding
  runname-bas*.pdb is not even written). Fields in these files are as
  indicated above for runname-min.pdb. See also the Note below.

- An ASCII file runname.thermo is written, containing a table with the free
  energy, mean energy, entropy and minimum energy of each basin, and the
  percentage of points in that basin.  The free energy is computed
  directly from the number of points in each basin.

- An ASCII file runname.traj is written containing simply the basin
  occupied along the trajectory (one line per point). Points with very high
  energy are not assigned to a basin, being marked as "HIGH-ENERGY" in the
  .traj file.

- A PyMOL script runname.pml file is written, containing the set of PyMOL
  commands necessary to load and visualize all basins and their minima in
  the first 3 dimensions, as well as some energy contours (in RT units).
  In addition to the different colors assigned to each basin, you can color
  points according to their energy, since these were written in the
  B-factor column of all PDB files (runname-min.pdb and
  runname-bas*.pdb). For better visualization, nonbonded_size and
  sphere_scale can be adjusted (see top of runname.pml). See also the Note
  below.

- Two files are written for visualizing a simplified scheme of the relation
  between basins: runnameS.pdb and runnameS.pml.  When runnameS.pml (which
  reads runnameS.pdb) is loaded into PyMOL, it displays a simplified scheme
  where each basin is represented in the first 3 dimensions as a sphere
  whose volume is proportional to the corresponding probability.  When
  transitions have occurred between a pair of basins (i.e., a change in
  runname.traj), a stick is drawn between the corresponding spheres, using
  a stick radius proportional to the number of transitions (no distinction
  is currently being made between back and forth transitions).  This
  results in a simple scheme where we can easily identify the most
  populated basins and the most frequent transitions between them.  For
  better visualization, sphere_scale and stick_scale can be adjusted (see
  top of runnameS.pml).  Note that you can also color the spheres (and the
  attached half-sticks) according to the basin free energy (stored in the
  spheres' B-factor field); this conveys the same type of information as
  the sphere size, but it may help visualization. See also the Note below.
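
The population-based quantities described above (basin labels, free
energies and transition counts) can be sketched in Python from a sequence
of basin assignments such as the one in runname.traj.  This is an
illustration under stated assumptions, not the GETBASINS source: the free
energy is taken as F = -ln(N_i / N) in RT units with N the number of
assigned points, HIGH-ENERGY points are simply skipped when counting
transitions, and the function name and toy trajectory are hypothetical.

```python
import math
from collections import Counter

def basin_stats(traj):
    """traj: basin ids along the trajectory, "HIGH-ENERGY" if unassigned."""
    visited = [b for b in traj if b != "HIGH-ENERGY"]
    counts = Counter(visited)
    total = len(visited)
    # Free energy (RT units) from the population fraction (assumed zero:
    # a basin holding every assigned point would have F = 0).
    free_energy = {b: -math.log(n / total) for b, n in counts.items()}
    # Relabel so that 0 is the most populated (lowest free energy) basin.
    order = sorted(counts, key=lambda b: (-counts[b], b))
    label = {b: k for k, b in enumerate(order)}
    # Undirected transition counts: A->B and B->A are not distinguished.
    transitions = Counter()
    for a, b in zip(visited, visited[1:]):
        if a != b:
            transitions[frozenset((label[a], label[b]))] += 1
    return counts, free_energy, label, transitions

# Toy trajectory using arbitrary raw basin ids 7, 3 and 5.
traj = [7, 7, 7, 3, 3, 7, "HIGH-ENERGY", 7, 3, 5, 5, 3]
counts, free_energy, label, transitions = basin_stats(traj)
```

In this toy trajectory the raw basin 7 holds 5 of the 11 assigned points
and becomes basin 0; there are 3 transitions between basins 0 and 1 and 2
between basins 1 and 2.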

Note: When there are more than 99999 input points, the PDB files
  runname-bas*.pdb and runname-min.pdb become ill-formatted, because the
  point number no longer fits in the 5-character field; a warning is
  written to stderr. A possible work-around is to transform those files
  into PQR files (whose fields are space-separated), e.g., using

  awk '{i=length(substr($0,7)+0)-5;if(i<0)i=0;print substr($0,1,38+i) " " substr($0,39+i,8) " " substr($0,47+i)}' runname-min.pdb > runname-min.pqr

  The PyMOL visualization scripts would then have to be changed
  accordingly.  In particular, since the last column is no longer
  interpreted as the atomic B-factor, but rather as the atomic radius, one
  must use the elec_radius property to access the energy values given in
  that column; e.g., to color by energy use "spectrum elec_radius,
  blue_red".


---------------------------------------------------------------------------
COMPILATION

Example:
gcc getdensity.c -o getdensity -O3 -lm -Wall -W -pedantic -ansi
gcc getbasins.c -o getbasins -O3 -lm -Wall -W -pedantic -ansi


---------------------------------------------------------------------------
EXAMPLE

3dproj.pdb is an output file of the g_anaeig tool of GROMACS (option -3d).

awk '/ATOM/{print $6,$7,$8}' 3dproj.pdb > 3D-in
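
The awk command above relies on whitespace-separated fields.  As a hedged
alternative, the sketch below reads the coordinates from the fixed
columns of the standard PDB ATOM record (x in columns 31-38, y in 39-46,
z in 47-54).  The function name, the residue name PRJ and the sample
records are hypothetical; real g_anaeig output may differ in its other
fields.

```python
def pdb_coordinates(lines):
    """Yield (x, y, z) from ATOM records, read from PDB columns 31-54."""
    for line in lines:
        if line.startswith("ATOM"):
            yield float(line[30:38]), float(line[38:46]), float(line[46:54])

# Hypothetical records laid out in the standard fixed-column ATOM format.
fmt = "ATOM  %5d  C   PRJ  %4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
sample = [
    fmt % (1, 1, 1.234, -0.567, 0.089, 1.0, 0.0),
    fmt % (2, 2, -2.5, 3.25, -1.125, 1.0, 0.0),
    "TER",
]
coords = list(pdb_coordinates(sample))

# One point per line, space-separated, as expected by getdensity.
text = "\n".join("%g %g %g" % c for c in coords)
```

Writing text to a file such as 3D-in gives the same kind of three-column
input as the awk command.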


To analyse a 2D landscape:

getdensity -k gauss -F -E 3D-in 2 0.2 2D &> 2D.out
getbasins 2D 4.0 &> 2D-b.out

To plot the 2D landscape inside gnuplot:
gnuplot> set pm3d
gnuplot> set view map
gnuplot> set cbrange [0:10]
gnuplot> splot "2DE.gp" w pm3d


To analyse a 3D landscape:

getdensity -k gauss -F -E 3D-in 3 0.2 3D &> 3D.out
getbasins 3D 4.0 &> 3D-b.out

To plot the 3D landscape inside PyMOL:
pymol 3D.pml

For schematic representation of basins and transitions inside PyMOL:
pymol 3DS.pml


Tip: For 2D or 3D analyses choose a meshsize that is around 1/100 of the
largest coordinate dispersion (e.g., the 1st principal component).
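
The tip can be turned into a small helper, sketched here in Python.  It
assumes that `dispersion' means the range (maximum minus minimum) of a
coordinate; the standard deviation would be another possible reading.
The function name and the sample points are hypothetical.

```python
def suggested_meshsize(points, fraction=0.01):
    """Mesh size as a fraction of the largest per-coordinate range."""
    ndim = len(points[0])
    ranges = [max(p[k] for p in points) - min(p[k] for p in points)
              for k in range(ndim)]
    return fraction * max(ranges)

# Three 2D points: the first coordinate spans 10 units, the second 2.
mesh = suggested_meshsize([(0.0, 0.0), (10.0, 2.0), (5.0, 1.0)])
```

Here the largest range is 10, so the suggested meshsize is about 0.1.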


---------------------------------------------------------------------------
CITATION

If you use LandscapeTools, please cite ref. [2].


---------------------------------------------------------------------------
REFERENCES

[1] Silverman (1986) "Density estimation for statistics and data analysis",
Chapman & Hall/CRC.

[2] Campos et al. (2009) J. Phys. Chem. B, 113:15989.

[3] Becker and Karplus (1997) J. Chem. Phys., 106:1495.


---------------------------------------------------------------------------
