# INSTRUCTIONS TO ANALYZE TRAJECTORY DATA
#
# Although this file is written as a shell script, you should execute each
# command individually on the terminal and check the output at each step,
# reading the corresponding comments and trying to understand what you are
# doing.

# This file illustrates the manipulation of CpHMD trajectories with a very
# simple example, namely the extraction of a protein-only PDB trajectory.
# You can adapt it to use other GROMACS analysis tools. Its main loop
# processes only the blocks that already exist but are still unprocessed,
# so you can run it while the CpHMD simulation is still running.


############################################################################
# 1. Define dirs (you probably need to change these for your system):
gmxbin=/gromacs/gromacs-4.0.7/bin
CpHDIR=/data/simulation/programs/CpHMD/ST-CpHMD-v4.1_GMX4.07
# Note that the GROMACS version doesn't have to be the modified one
# (gromacs-4.0.7_pH_I) that was used to run the CpHMD simulations, since we
# need only to process trajectories.
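#
# Optional sanity check (a sketch, not part of the original protocol): warn
# early if a directory defined above is missing, instead of letting the
# loop in point 2 fail quietly into log.err. The function name check_dirs
# is hypothetical.
check_dirs() {
    local d status=0
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "WARNING: directory not found: $d" >&2
            status=1
        fi
    done
    return $status
}
check_dirs "$gmxbin" "$CpHDIR" || \
    echo "Edit gmxbin and CpHDIR before continuing." >&2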


############################################################################
# 2. Process trajectories:

# When using GROMACS tools with CpHMD output you should _always_ provide
# the aminoacids.dat file or an appropriate index file, in order to make
# the new residue building blocks recognizable to GROMACS.
ln -sf $CpHDIR/top/aminoacids.dat .

# Create a temporary dir (if it doesn't already exist) to collect
# trajectories:
mkdir -p tmp

# The following loop processes all blocks that already exist but are still
# unprocessed. You can therefore run it once after all CpHMD blocks have
# finished or, alternatively, run it several times as more blocks are
# completed (in which case it may be more practical to run this file as a
# script, entering '. INSTRUCTIONS'). It can process up to 100 blocks,
# although only 5 blocks (corresponding to 5 ns) will be processed in the
# present tutorial.
for (( i=1 ; i<=100 ; i++ )); do

    j=$(printf "%03d\n" $i)    # formatted block index

    # If this block's final .gro already exists but the corresponding
    # extracted trajectory doesn't exist yet, process the block:
    if [ -f ../../CpHMD/lyso_$j.gro ] && [ ! -f tmp/traj_$j.pdb ]; then
	# Extract a PDB trajectory after fitting the protein CA atoms to
	# the structure obtained after the energy minimizations (in file
	# ../../initial/i50.tpr), writing only the protein and keeping
	# every 10th frame (the 1st, 11th, 21st, etc.). The piped "3" and
	# "1" select the group used for fitting and the group written out:
	printf "3\n1\n" | \
	    $gmxbin/trjconv \
		-f ../../CpHMD/lyso_$j.xtc \
		-s ../../initial/i50.tpr \
		-o tmp/traj_$j.pdb \
		-fit rot+trans -skip 10
    fi

done &> log.err


############################################################################
# 3. Remove "empty" hydrogens from trajectories:

# Remember that during a CpHMD simulation all tautomeric hydrogens are
# kept, charged or uncharged ("empty"), while charges and other force
# field parameters are changed according to the selected protonation
# state. To get cleaner PDB files for visualization, you may want to
# remove, from each trajectory frame, the tautomer-specific
# hydrogens that were "empty" when that frame was generated during the
# CpHMD simulation; this can be done with the statepdb tool (see the
# header of the statepdb script). In order to do that, you need to
# define a one-to-one correspondence between structures and
# protonation states, as discussed in statepdb's header.
#
# In the present case, the CpHMD simulations were run using the following
# parameters:
#
# - in .mdp file:
#   - dt = 0.002                      : 1 step = 0.002 ps
#   - nstxtcout = 500                 : writes to .xtc every 500 steps = 1 ps
#
# - in .pHmdp file:
#   - EffectiveSteps = 1000           : 1000 steps = 2 ps per cycle
#   - EndCycle - InitCycle + 1 = 500  : 500 cycles = 1 ns per block
#
# The protonations are assigned at the beginning of each cycle and written
# once per cycle (every 2 ps) whereas the generated structures are written
# twice per cycle (every 1 ps).  Thus, the originally written .occ
# protonation states and .xtc structural frames can be paired as follows:
# the 1st protonation state to the 1st and 2nd frames, the 2nd protonation
# state to the 3rd and 4th frames, etc. Since we will be using the
# protein-only PDB trajectories, where only every 10th frame was kept, we
# should keep only every 5th protonation state. The generated sequence of
# (frame,protonation) pairs will then be: (1st,1st), (11th,6th),
# (21st,11th), etc.
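#
# The pairing rule can be verified with a quick calculation (a sketch; the
# function name frame_to_state is hypothetical). With 2 frames written per
# cycle, the n-th .xtc frame (counting from 1) pairs with protonation state
# (n-1)/2 + 1, using integer division:
frame_to_state() {
    echo $(( ($1 - 1) / 2 + 1 ))
}
for frame in 1 11 21 31; do
    echo "frame $frame <-> protonation state $(frame_to_state $frame)"
done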

# Temporarily collect PDB frames:
cat tmp/traj_*.pdb > tmp-all.pdb
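# Temporarily collect protonation states, keeping every 5th (1st, 6th,
# 11th, etc.). Note that NR counts lines across all files, so this relies
# on each .occ file holding a multiple of 5 states (500 per block here):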
awk 'NR%5==1' ../../CpHMD/lyso_*.occ > tmp-all.occ
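# Optional check before running statepdb (a sketch under two assumptions
# not stated in this file: that trjconv starts each PDB frame with a MODEL
# record, and that each .occ line holds one protonation state). The
# function name check_pairing is hypothetical. It compares the number of
# collected frames with the number of collected protonation states:
check_pairing() {
    local n_frames n_states
    n_frames=$(grep -c '^MODEL' "$1")
    n_states=$(awk 'END { print NR }' "$2")
    echo "frames: $n_frames, protonation states: $n_states"
    [ "$n_frames" -eq "$n_states" ]
}
if [ -f tmp-all.pdb ] && [ -f tmp-all.occ ]; then
    check_pairing tmp-all.pdb tmp-all.occ || \
        echo "WARNING: frame and protonation state counts differ" >&2
fi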
# Run statepdb to remove "empty" hydrogens:
$CpHDIR/tools/statepdb \
    r tmp-all.pdb tmp-all.occ ../../CpHMD/lyso.sites ../../initial \
    > all-clean.pdb
rm tmp-all.{pdb,occ}

# If you now visualize the resulting all-clean.pdb trajectory in a
# molecular visualization program (PyMOL, VMD, etc.), you will notice that
# the number and/or chemical connectivity of the hydrogens actually present
# varies along the trajectory for some titratable sites.

############################################################################
# 4. Other analyses:

# You can adapt the loop used above (see point 2) in order to perform any
# other analyses directly from the .xtc or other GROMACS-format files.
# Note also that, although the above pairing of structure frames and
# protonation states was used here only for cleaning "empty" hydrogens (see
# point 3), it can also be used to analyze in more detail the relation
# between structure and protonation.
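#
# As a starting point for such adaptations, the block-selection logic of
# the loop in point 2 can be isolated in a small helper (a sketch; the
# function name pending_blocks, its arguments and the output file name used
# below are hypothetical). It prints the formatted index of every block
# whose final .gro exists in the given directory but whose output file,
# named by a printf-style pattern, does not exist yet:
pending_blocks() {
    local indir=$1 prefix=$2 outpattern=$3 i j
    for (( i=1 ; i<=100 ; i++ )); do
        j=$(printf "%03d" "$i")
        if [ -f "$indir/${prefix}_$j.gro" ] && \
           [ ! -f "$(printf "$outpattern" "$j")" ]; then
            echo "$j"
        fi
    done
}
# Example: list the finished blocks that still lack tmp/analysis_NNN.xvg.
# Each printed index can then be used to build the input and output file
# names for the analysis tool of your choice.
pending_blocks ../../CpHMD lyso tmp/analysis_%s.xvg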

