# Example: Tautomeric pKa calculations on lysozyme structure 4LZT

# This tutorial contains only some brief comments. For each program being
# used, you should check its usage, file formats, etc (see its README file,
# or header, etc). In particular, check MEAD's README file for the format
# of the files .pqr, .st, .sites, .ogm, .mgm. Check also the README file of
# meadTools.

# Although this file is written as a shell script, the idea is that you run
# each command individually on the shell terminal and check the output in
# each step.

##########################################################################
# DEFINITIONS:

# Program paths (almost certainly different on your system):

# Gromacs, version 2018.3
gmx=/gromacs/gromacs-2018.3
# MeadTools, version 2.2
meadTdir=/programs/meadTools-2.2
# MEAD, version 2.2.9
meadbin=/programs/programs64/mead-2.2.9/bin/
# petit, version 1.6
petitdir=/programs/petit-1.6/
# ASC, version 2.14
ascdir=/programs/ASC_2.14/

# Other settings:

# inner dielectric constant (you may try different values)
eps=10 
# ionic strength (mol/L)
ionstr=0.1 
# temperature (K)
temp=300 
# number of cpus for running meadT
ncpus=4
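
# Optional check (a minimal sketch, not part of the original protocol):
# before going further it may help to confirm that the programs called in
# this tutorial exist at the paths defined above. The helper name
# check_progs is an assumption made for this illustration.
check_progs() {
    for p in "$@"; do
        # report every path that is missing or lacks execute permission
        [ -x "$p" ] || echo "not found or not executable: $p"
    done
}
# Demonstration on one path that exists and one that does not:
demo_missing=$(check_progs /bin/sh /nonexistent/prog)
echo "$demo_missing"
# With the definitions above, the same helper could be called as:
# check_progs $gmx/bin/gmx $meadTdir/makepqr $meadTdir/meadT $petitdir/petit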

##########################################################################
# BUILD .pqr FROM .pdb:

# Get the structure from the Protein Data Bank and uncompress it
wget http://www.pdb.org/pdb/files/4LZT.pdb.gz
gunzip 4LZT.pdb.gz

# Clean the PDB file:
#  a) Remove ANISOU records
#  b) Remove alternate (B) locations (ATOM records with B in column 17)
#  c) Remove NO3 ions
#
grep -E -v "(ANISOU|^ATOM............B|NO3)" 4LZT.pdb > lyso.pdb
#
# NOTE: the crystallographic waters are retained for the calculation
#
# --- lyso.pdb is the processed pdb file
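
# Optional check (a sketch, not part of the original protocol): the filter
# above can be tried on a few made-up, truncated PDB records to see which
# ones survive. The variable names pdb_filter, demo_clean and demo_left are
# assumptions made for this illustration.
pdb_filter='(ANISOU|^ATOM............B|NO3)'
demo_clean=$(grep -E -v "$pdb_filter" <<'EOF'
ATOM      1  N   LYS A   1
ANISOU    1  N   LYS A   1
ATOM      2  CA ALYS A   1
ATOM      3  CA BLYS A   1
HETATM 1100  N   NO3 A 201
HETATM 1200  O   HOH A1001
EOF
)
# Only the plain ATOM, the alternate-location A atom and the water remain
# (note that the A label itself is not stripped by this filter):
printf '%s\n' "$demo_clean"
# Count the records still matching the filter (grep -c exits non-zero on 0):
demo_left=$(printf '%s\n' "$demo_clean" | grep -E -c "$pdb_filter" || true)
echo "$demo_left"
# On the tutorial file, the same count should also be zero:
# grep -E -c "$pdb_filter" lyso.pdb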

# The .pqr file will be generated using makepqr, but first a Gromacs
# topology needs to be created:
#
$gmx/bin/gmx pdb2gmx -f lyso.pdb -o lyso.gro -p lyso.top  <<EOF
14
1
EOF
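# NB: the first answer (14) selected the GROMOS96 54a7 force field and the
# second (1) the water model. The menu numbering may differ in other
# Gromacs installations, so check the lists printed by pdb2gmx.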

# In many cases we should be able to produce the required pqr file with the
# following command:
$meadTdir/makepqr W 2RT $gmx/share/gromacs/top/gromos54a7.ff/ffnonbonded.itp \
		  lyso.top lyso.gro
# but here it doesn't work, returning several warnings/errors ("Warning:
# Number of atoms is 1324 in .top file and 1741 in .gro file"; "Warning:
# Unknown atom OW in residue HOH 1001"; etc.). This happens because the
# crystallographic waters were kept in the pdb file. In this case we need
# to generate a processed topology for our structure with the Gromacs tool
# grompp:
touch dummy.mdp
# (an .mdp file is always required, even if empty)
$gmx/bin/gmx grompp -f dummy.mdp -p lyso.top -pp processed.top -c lyso.gro 
# Replacing lyso.top with processed.top in the previous makepqr command
# will work:
$meadTdir/makepqr W 2RT $gmx/share/gromacs/top/gromos54a7.ff/ffnonbonded.itp \
		  processed.top lyso.gro > lyso.pqr
# The .itp file can also be replaced by the processed topology as argument
# to makepqr:
$meadTdir/makepqr W 2RT processed.top processed.top lyso.gro > lyso.pqr
# Note that the .pqr file has titratable residues in their charged forms,
# and waters without protons.

##########################################################################
# PREPARE ADDITIONAL FILES FOR meadT (AND multiflex):

# Now the "outer" crystallographic water molecules are removed with
# selectWacc. It requires a working ASC installation located in
# $ascdir.
$meadTdir/selectWacc lyso.pqr lyso_wat.pqr $ascdir
# The 139 water molecules in lyso.pqr are down to 85 "inner" waters in
# lyso_wat.pqr. Note that selectWacc has optional arguments (see its
# usage).
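
# Optional check (a sketch under assumptions): the water counts quoted
# above can be verified by counting distinct water residues. This assumes
# whitespace-separated .pqr fields with the residue name in field 4 and the
# residue number in field 5, and waters named HOH; check MEAD's README if
# your files differ. The helper name count_waters is hypothetical.
count_waters() {
    awk '($1 == "ATOM" || $1 == "HETATM") && $4 == "HOH" { seen[$5] = 1 }
         END { n = 0; for (r in seen) n++; print n }' "$@"
}
# Demonstration on a made-up fragment (placeholder coordinates, charges
# and radii, not taken from the real structure):
demo_nwat=$(count_waters <<'EOF'
ATOM      1  N   LYS     1       1.000   2.000   3.000  0.000  1.000
ATOM   1325  OW  HOH  1001       4.000   5.000   6.000  0.000  1.000
ATOM   1326  OW  HOH  1002       7.000   8.000   9.000  0.000  1.000
EOF
)
echo "$demo_nwat"
# On the tutorial files:
# count_waters lyso.pqr; count_waters lyso_wat.pqr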

# At this point, lyso_wat.pqr should be loaded in a visualization program
# (e.g., PyMOL) and the protonation state of each site should be checked
# and corrected if needed (see section G.1 in $meadTdir/README).

# The sites file is now created with "makesites", using the "t" option for
# tautomer generation:
$meadTdir/makesites t lyso_wat.pqr > lyso_wat.sites
# There should be a total of 641 tautomeric sites for the 134 titratable
# groups in lyso_wat.sites.

# Now we get all the required .st files (as listed in the .sites file):
$meadTdir/getst lyso_wat.sites ./st-G54a7_Fit_and_others
# The directory ./st-G54a7_Fit_and_others contains .st files from
# $meadTdir/st-G54a7_Fit and (for ARG, SER, THR and HOH) from
# $meadTdir/st-G53a6. See the description of st directories in section E of
# $meadTdir/README.

# Add tautomeric protons to lyso_wat.pqr:
$meadTdir/addHtaut lyso_wat.pqr lyso_wat.sites > lyso_taut.pqr
# addHtaut assumed specific protonation states (see section G.1 in
# $meadTdir/README).

# Change all titratable sites to their charged reference state:
$meadTdir/statepqr r=c lyso_taut.pqr lyso_wat.sites > lyso_charged.pqr

# Apply an offset (e.g., 10000) to the numbering of titrating sites:
$meadTdir/stmodels 10000 lyso_charged.pqr lyso_wat.sites
# This ensures that only the fragments indicated in the .st files will be
# used as model compounds and there won't be overlap of fragments. See the
# header of $meadTdir/stmodels for further details.


##########################################################################
# RUN meadT:

# Copy the input files to the run name used by meadT (lyso_meadT):
cp lyso_wat_stmod.sites lyso_meadT.sites
cp lyso_charged_stmod.pqr lyso_meadT.pqr

# We now need grid geometry files for both protein (.ogm) and model
# compound (.mgm). Both .ogm and .mgm use two successive grid levels to
# compute the electrostatic potential. See MEAD's README for details.

cat <<EOF > lyso_meadT.ogm
ON_GEOM_CENT 81 1.0
ON_CENT_OF_INTR 81 0.25
EOF

cat <<EOF > lyso_meadT.mgm
ON_GEOM_CENT 61 1.0
ON_CENT_OF_INTR 61 0.25
EOF
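
# Optional check (a sketch, not part of the original protocol): a cubic
# grid with N points per side and spacing h spans (N - 1) * h per side, so
# the box sizes implied by the grid files can be computed directly. This
# assumes each line reads "centering  points-per-side  spacing" with the
# spacing in Angstrom (see MEAD's README). The helper name grid_extent is
# hypothetical.
grid_extent() {
    awk 'NF >= 3 { printf "%s %.2f\n", $1, ($2 - 1) * $3 }' "$@"
}
# Demonstration with the two .ogm lines used above:
demo_grid=$(printf 'ON_GEOM_CENT 81 1.0\nON_CENT_OF_INTR 81 0.25\n' | grid_extent)
printf '%s\n' "$demo_grid"
# The coarse grid should enclose the whole protein with some margin, and
# the fine grid the region around each site. On the tutorial files:
# grid_extent lyso_meadT.ogm lyso_meadT.mgm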

# Your .pqr and .st files must have the same atom names and order. This is
# a common source of error.

# Now we are ready to run meadT.
# meadT will produce two files at the end of the run:
# - .g : file with interaction energies
# - .pkcrg : pKa values for each site when all other sites are charged
#   (charged reference state)
$meadTdir/meadT -n $ncpus -b 250 -s $meadTdir -m $meadbin \
   -epsin $eps -ionicstr $ionstr -T $temp lyso_meadT \
   1> meadT.out 2> meadT.err
# meadT produces and deletes intermediate files using the runname
# lyso_meadT, so it is better to avoid using lyso_meadT to name other
# files.  This calculation may take around 5-10 minutes to finish.

# Remove intermediate files
rm -f *.potat

# Revert offset (done above with stmodels):
gawk -v off=10000 '
  {match($0,/(^.+-)([0-9]+)$/,a);print a[1] a[2]-off*(1+($3~/^NT/)+2*($3~/^CT/))}
' lyso_meadT.pkcrg > lyso.pkcrg
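
# To see what the command above does (a sketch, not part of the original
# protocol): the trailing number of each line is reduced by 1x the offset
# for ordinary sites, 2x for sites whose label starts with NT and 3x for
# labels starting with CT. The function below repeats that arithmetic with
# POSIX awk functions only; the name revert_offset and the three input
# lines (placeholder pK values and site labels) are made up for this
# illustration.
revert_offset() {
    awk -v off="${1:-10000}" '{
        # leave lines without a trailing number untouched
        if (!match($0, /[0-9]+$/)) { print; next }
        mult = 1 + ($3 ~ /^NT/) + 2 * ($3 ~ /^CT/)
        print substr($0, 1, RSTART - 1) (substr($0, RSTART) - off * mult)
    }'
}
demo_pk=$(revert_offset 10000 <<'EOF'
0.000e+00 C LYS-10001
0.000e+00 C NTLYS-20001
0.000e+00 A CTLEU-30129
EOF
)
printf '%s\n' "$demo_pk"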

mv lyso_meadT.g lyso.g


##########################################################################
# RUN PETIT (MONTE CARLO SIMULATIONS):

# From the .g and .pkcrg files create the PETIT input file using the
# program cconvert (which must have been compiled first):
$meadTdir/cconvert/cconvert lyso.pkcrg lyso.g $temp lyso.dat
# There is also an AWK version of this program, called convert, but it is
# much slower. See section E of $meadTdir/README for further info.

# Compute titration curve with PETIT (pH interval 1-14 in steps of 0.2).
# See $petitdir/README for details.
$petitdir/petit -H 1,14,0.2 -T $temp -c 2.0 -q 1000  100000 < lyso.dat \
                 1> lyso_petit.out 2> lyso_petit.err
# This calculation may take around 5 minutes to finish.


##########################################################################
# ANALYSIS:

# The lyso_petit.out file contains (among other info) partial occupancies
# of each site at each pH value, total protein charge and also pKhalf
# values for all sites titrating within the set pH interval. See
# $petitdir/README for details.

# Extract pKhalf values to a text file that can be used to make a table:
awk '/^>/{printf "%-12s %10s\n",$2,$5}' lyso_petit.out > lyso.pKhalfs

# Create directory for titration curves:
mkdir -p TIT

# Create files with the titration curves of individual sites
# (1st column = pH, 2nd column = average protonation):
awk 'BEGIN{while(getline<ARGV[1])if($0~/^>/)name[$3]=$2};
     /^\./ && name[$4]!=""{print $3,$5 > "TIT/" name[$4] ".tit"}' lyso_petit.out
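
# Optional cross-check (a sketch, not part of the original protocol): the
# half point of a two-column titration curve (pH, average protonation) can
# be estimated by linear interpolation between the two points where the
# protonation crosses 0.5. This is only a rough check; PETIT's own pKhalf
# is described in its README and may differ in detail. The helper name
# half_point is hypothetical.
half_point() {
    awk 'NR > 1 && (py - 0.5) * ($2 - 0.5) <= 0 && py != $2 {
             # interpolate between the previous point and the current one
             printf "%.2f\n", px + (py - 0.5) * ($1 - px) / (py - $2); exit }
         { px = $1; py = $2 }' "$@"
}
# Demonstration on an ideal single-site curve with pKa 6.5, sampled from
# pH 1 to 14 in steps of 0.2 (synthetic data, not a lysozyme result):
demo_half=$(awk 'BEGIN { for (i = 0; i <= 65; i++) { pH = 1 + 0.2 * i
                printf "%.1f %.6f\n", pH, 1 / (1 + 10 ^ (pH - 6.5)) } }' | half_point)
echo "$demo_half"
# On one of the curves written above (replace SITE by an existing name):
# half_point TIT/SITE.tit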

# Create file with the total titration curve
# (1st column = pH, 2nd column = average protonation; the 3rd column is
# the next field of the PETIT output line, see $petitdir/README):
awk '$4=="totP"&&$1=="."{print $3, $5, $6}' lyso_petit.out > TIT/total_prot

# You can directly plot the titration curves using your favorite plotting
# program. For example, with gnuplot you can do:
gnuplot <<EOF
set term pdfcairo lw 1
set output "plots.pdf"

sites=system("awk '/^>/{printf \"%s \",\$2}' lyso_petit.out")

set xlabel "pH"
set ylabel "Average protonation"

# One plot with the total titration curve:
plot [1:14] "TIT/total_prot" title "total" with lines

# One plot for each individual titration curve:
do for [s in sites] {
    plot [1:14] [0:1] "TIT/".s.".tit" title s with lines
}

# One plot with all individual titration curves:
set key horizontal tmargin font "Sans,7" samplen 3
plot [1:14] for [s in sites] "TIT/".s.".tit" title s with lp ps 0.5
EOF
# This will create a file plots.pdf containing several plots with titration
# curves.

# End of tutorial.
