The command CHECK will bring you to the CHECK menu. This menu holds
options that each check one or more aspects of protein structures. Most
checks detect exceptional situations, for example a contact that is
seldom seen in the database, but hard errors, for example a wrong
SCALE matrix in a PDB file, can also be detected.
Several of the commands in this menu can also be executed from another
menu. For example, CHICHK evaluates and checks torsion angles. This
option can also be called as EVACHI from the CHIANG (torsion angle) menu.
A few options in the check menu are so-called 'terminal' options. That
means that they can destroy the status of the soup, and will definitely
leave WHAT IF in an undefined state after the option has finished.
There is a complete description of the output of all checks on the World
Wide Web at:
http://www.sander.embl-heidelberg.de/rob/checkhelp/
Please have a look at those documents if anything is unclear.
These checks were written for X-ray structures, but most of them
work perfectly fine for NMR structures. Interpretation of the results might
be different, though, and our experience with NMR structures is limited.
Most checks will automatically generate a table of values for each individual
model if a multiple model NMR structure is checked.
The command FULCHK will cause WHAT IF to write a complete report about
a protein structure. You will get the output in LaTeX format in a file
"pdbout.tex", and in plain text format in "pdbout.txt". Obviously
pictures can only be given in the LaTeX output. If you want to use the
LaTeX output, you will need the latex program and some others. For
your convenience suitable versions of these programs are archived on
our anonymous ftp site "swift.embl-heidelberg.de" in the directory
"/whatif/support".
To use the LaTeX output, you can type:
latex pdbout (to reformat the file)
xdvi pdbout (to preview the output)
dvips pdbout (to make postscript output)
lpr pdbout.ps (or a similar command, to print the postscript file)
A maximum of 100 lines will be given in any table. If more than 100
problems should be listed, the table is truncated at 50 lines, and the
total number of lines is written at the bottom. Since most tables are
sorted such that the worst numbers are at the top, this should not be
a problem. If you want to see the whole list anyway, you can get it
sometimes by running the individual check while creating a logfile
(see DOLOG), or in any case by setting WHAT IF parameter 593 (the
limit of the number of lines in a table) to a higher value (see
SETWIF).
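The truncation rule described above can be sketched in a few lines of
Python. This is only an illustration, not WHAT IF code: the function name
truncate_table and the wording of the footer line are hypothetical, and the
default of 100 stands in for WHAT IF parameter 593.

```python
def truncate_table(rows, max_lines=100, truncated_lines=50):
    """Return the rows that would be printed for a table.

    Tables of up to max_lines rows are printed in full. Longer tables
    are cut to truncated_lines rows, followed by a line with the total.
    """
    rows = list(rows)
    if len(rows) <= max_lines:
        return rows
    shown = rows[:truncated_lines]
    # Hypothetical footer text; the real wording is not documented here.
    shown.append(f"(total number of lines: {len(rows)})")
    return shown


print(len(truncate_table(range(100))))  # table printed in full
print(len(truncate_table(range(101))))  # truncated table plus footer
```

Raising max_lines corresponds to setting the table limit to a higher value.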
FULCHK is a terminal option. That means that you cannot run FULCHK
in the middle of a WHAT IF session. You run FULCHK on one
molecule, preferably in a "fresh" WHAT IF. After FULCHK has finished, you
are immediately asked to terminate the session with FULLSTOP.
The command FSTCHK does the same as the FULCHK option. However, rather
than running all checks, only a subset of all checks is executed. You
can control which options are skipped and which are executed with the
TODO.CHK file (of which there is an example in your dbdata directory
of the WHAT IF account). In this file the first three characters of
each line are the Check-Id, and columns 4-6 are either 'YES' or
'NO'. The rest of each line is free; in the example file you can find
out what the check does and how long it normally takes.
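If you want to inspect a TODO.CHK file outside WHAT IF, the layout described
above can be read with a few lines of Python. This is a minimal sketch under
stated assumptions: the function name parse_todo_chk and the sample Check-Ids
and comments are hypothetical, and 'NO' is assumed to be padded with a blank
so that it fills columns 4-6.

```python
def parse_todo_chk(lines):
    """Map each Check-Id to True (execute) or False (skip).

    Assumed layout: columns 1-3 hold the Check-Id, columns 4-6 hold
    'YES' or 'NO', and the rest of the line is free text.
    """
    todo = {}
    for line in lines:
        if len(line.rstrip()) < 5:
            continue  # too short to hold a Check-Id and a flag
        check_id = line[0:3]
        flag = line[3:6].strip().upper()
        if flag not in ("YES", "NO"):
            raise ValueError(f"unexpected flag in line: {line!r}")
        todo[check_id] = (flag == "YES")
    return todo


# Hypothetical sample lines; real Check-Ids are in the example file.
sample = [
    "AAAYES first example check, fast",
    "BBBNO  second example check, slow",
]
print(parse_todo_chk(sample))
```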
Most checking options write a summary to a file that can be inspected
by, for example, a simple Perl script like the one used in our WWW version of
the CHECK procedures. The file is called 'check.db'. WHAT IF keeps
adding its results to the end of this file. The command NEWCHK closes
the old copy of this file if it exists. It also closes any TEX files
that were made already. If you want to keep those files you should
rename them BEFORE you run any other check option, because the check
options will not even hesitate for a millisecond, and overwrite the
old files.
This check verifies the chain names in the PDB file. All residues with
a certain chain name should be consecutive in the file, otherwise an
error message will be given. Another error message will be given if any
residue has a lower number than the previous residue in the same chain.
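The two rules of this check can be illustrated with a short Python sketch.
It is not the WHAT IF implementation: the function name check_chains, the
input format (a list of chain name and residue number pairs in file order)
and the message texts are all assumptions made for the example.

```python
def check_chains(residues):
    """residues: list of (chain_name, residue_number) in file order.

    Returns a list of problem descriptions (empty if all is well).
    """
    problems = []
    finished = set()   # chains that have already been left behind
    prev_chain = None
    prev_number = None
    for chain, number in residues:
        if chain != prev_chain:
            if chain in finished:
                problems.append(f"chain {chain} is not consecutive")
            if prev_chain is not None:
                finished.add(prev_chain)
        elif number < prev_number:
            problems.append(
                f"residue {number} follows {prev_number} in chain {chain}")
        prev_chain, prev_number = chain, number
    return problems


print(check_chains([("A", 1), ("A", 2), ("B", 1), ("A", 3)]))
print(check_chains([("A", 5), ("A", 3)]))
```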
The coordinate rounding check looks for "0"s at the end of the
coordinates in the PDB file. If there are many atoms with round
coordinates (up to 0.1 Angstrom) this probably means the structure (or a
subset) was not refined at all.
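The idea can be sketched in Python as follows. This is an illustration
only: the function names are hypothetical, an atom is counted as "round"
here when all three coordinates end in two zeros (that is, they carry no
information beyond 0.1 Angstrom), and the number of such atoms that counts
as "many" is not specified above, so no cutoff is applied. The coordinate
columns 31-54 are those of the standard PDB ATOM record.

```python
def rounded_fraction(pdb_lines, zeros=2):
    """Fraction of atoms whose x, y and z all end in `zeros` zeros."""
    tail = "0" * zeros
    total = rounded = 0
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # x, y and z occupy columns 31-38, 39-46 and 47-54.
        fields = (line[30:38], line[38:46], line[46:54])
        total += 1
        if all(f.strip().endswith(tail) for f in fields):
            rounded += 1
    return rounded / total if total else 0.0


def atom_line(x, y, z):
    # Minimal ATOM record with only the coordinate fields filled in.
    return f"{'ATOM':<30s}{x:8.3f}{y:8.3f}{z:8.3f}"


lines = [atom_line(1.2, 3.4, 5.6), atom_line(1.234, 3.456, 5.678)]
print(rounded_fraction(lines))
```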
The command NAMCHK allows you to check the names of atoms. All atoms
with non-IUPAC names will be listed. This involves simple torsion angle
calculations (like for the PHE side chain) as well as checks for the
exchange of atoms (like CG and OG in the THR side chain).
WGTCHK checks whether all atomic occupancies are between 0 and 1.
XBFCHK verifies the B-factors in the structure. If many buried atoms
have a B factor below 5.0, a warning is given. This either means that
the structure has been determined at low temperature, or that there
are problems in the refinement. If the average B factor for buried
atoms is very high or very low, another warning is given. Finally, the
distribution of B factors (basically the differences between B factors
of bonded atoms) is analyzed. If the result is very strange, a warning
is printed. If this warning appears, the B-factors should probably be
constrained during the refinement. Because these strange observed
differences can not be caused by thermal motion, adding constraints
could improve the behaviour of the refinement.
The command AXACHK will verify for each atom in the structure whether
it has a distance bigger than 0.7 Angstrom to all proper symmetry
axes. Any atom coming closer than this distance must form a "bump" to
a symmetry related copy of itself. The only exception is a water
molecule that is exactly on an axis; therefore WHAT IF will not
complain in such a case.
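The geometric test behind this check is the distance from a point to a line.
The Python sketch below shows that calculation under stated assumptions: the
axis is given in Cartesian coordinates by a point on it and a direction
vector, the function names are hypothetical, and the tolerance used to
decide that a water is "exactly" on an axis is an invented value.

```python
import math


def distance_to_axis(point, axis_point, axis_dir):
    """Distance from point to the line through axis_point along axis_dir."""
    v = [p - a for p, a in zip(point, axis_point)]
    # |v x d| / |d| is the perpendicular distance to the line.
    cx = v[1] * axis_dir[2] - v[2] * axis_dir[1]
    cy = v[2] * axis_dir[0] - v[0] * axis_dir[2]
    cz = v[0] * axis_dir[1] - v[1] * axis_dir[0]
    cross = math.sqrt(cx * cx + cy * cy + cz * cz)
    return cross / math.sqrt(sum(d * d for d in axis_dir))


def axis_problem(point, axis_point, axis_dir, is_water=False,
                 limit=0.7, on_axis_tol=1e-3):
    """True if the atom is closer to the axis than the 0.7 Angstrom limit.

    on_axis_tol is a hypothetical tolerance for a water on the axis.
    """
    d = distance_to_axis(point, axis_point, axis_dir)
    if is_water and d <= on_axis_tol:
        return False  # a water exactly on the axis is accepted
    return d < limit


# An axis along z through the origin, and an atom 0.5 Angstrom away.
print(distance_to_axis((0.5, 0.0, 3.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)))
```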
The option H2OCHK will perform three checks on all water molecules
in the soup.
For all clusters of water molecules H2OCHK will verify whether they
are free-floating in the unit-cell, or touch the protein somewhere.
If a cluster is free-floating this is reported as a problem: it is
very unlikely that such clusters can be seen in the X-ray density, so
the listed water molecules are probably refinement artefacts.
For all water molecules the closest protein molecule is located. If
this is a molecule that is symmetry related to the ones given in the
input file, a warning is given. For optimum usability of the file the
listed waters should be moved such that they are closest to the
untransformed protein molecule. See the MOVWAT option for this.
For all water molecules all possible Hydrogen bonding partners are located.
All water molecules that lack the possibility to form any hydrogen bond
are listed.
For a normal protein structure the volume of the unit cell per Dalton
of protein is about 1.8-4.5 cubic Angstrom. This check will calculate this
so-called Matthews coefficient for the current structure, and
complain if it is outside these boundaries. Most of the time, this
check triggers because "Z" (multiplicity of the unit cell) is given
incorrectly on the CRYST1 card of the PDB file.
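As an illustration, the calculation can be sketched in Python. The cell
volume formula is the standard one for a general (triclinic) cell; the
function names, the example cell and the molecular weight are made up for
the example, and mol_weight is assumed to be the weight in Dalton of the
unit that Z counts.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic Angstrom; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def matthews_coefficient(volume, z, mol_weight):
    """Cell volume per Dalton of protein (cubic Angstrom per Dalton)."""
    return volume / (z * mol_weight)


def matthews_ok(vm, low=1.8, high=4.5):
    """True if vm lies within the boundaries quoted in the text."""
    return low <= vm <= high


# Hypothetical orthorhombic cell holding four molecules of 20000 Dalton.
volume = cell_volume(50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
vm = matthews_coefficient(volume, z=4, mol_weight=20000.0)
print(round(vm, 3), matthews_ok(vm))

# The same cell with Z wrongly given as 1 falls outside the boundaries.
vm_wrong = matthews_coefficient(volume, z=1, mol_weight=20000.0)
print(round(vm_wrong, 3), matthews_ok(vm_wrong))
```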
Similar protein molecules should be folded in similar ways. Within one
protein structure that means that if there are two or more identical
molecules in the asymmetric unit, these should be similar. Normally
this is ensured by using Non-crystallographic symmetry
constraints. The NCS check in WHAT IF will generate 2 plots (if they
are potentially interesting) for each pair of "identical" molecules,
such that you can judge whether you think they are indeed identical in
structure. There is no automatic interpretation (yet).
The command SYMCHK is a killer command. That means that it starts by
wiping out the soup. It will then prompt you for the name of the PDB
file for which the symmetry information should be checked. This
file will be read and checked.
This option checks the internal consistency of the SCALE and CRYST1
cards in the PDB file, and it checks if the crystal can be
reconstructed from the atomic coordinates and the provided symmetry
information. It also checks whether the cell complies with rules set
by the IUCr, and whether there is extra symmetry between so-called
independent molecules.
The bond angle check will compare each bond angle in protein residues
with the Engh and Huber angle parameters [See Engh and Huber, Acta
Cryst. A47, 392-400 (1991)] and print a table of all bond angles that
differ by more than 4 standard deviations from the expected values.
It will also calculate an RMS Z-score for all angles, telling you
how well the bond angles in general have been restrained.
The command BNDCHK does not require any additional input. It will
perform a number of checks on the chemical bonds in the structure.
First it will check whether all atoms in all protein and nucleic acid
residues are present.
After that it will compare each bond in protein residues with the Engh
and Huber distance parameters [See Engh and Huber, Acta Cryst. A47,
392-400 (1991)] and print a table of all bonds that differ by more than
4 standard deviations from the expected values.
As a third check, the RMS deviation from the mean Engh and Huber
parameters is determined (expressed in standard deviations). This RMS
value is expected to be around 1.0. If it is bigger than 1.5 or smaller
than 0.666 WHAT IF will complain.
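The outlier table and the RMS value can be sketched as follows. This is a
minimal Python illustration, not the WHAT IF code: the function names are
hypothetical, and the bond lengths and standard deviations in the example
are invented numbers, not Engh and Huber values.

```python
import math


def z_scores(observed, ideal, sigma):
    """Deviation of each value from its ideal, in standard deviations."""
    return [(o - i) / s for o, i, s in zip(observed, ideal, sigma)]


def outliers(z, cutoff=4.0):
    """Indices of values more than `cutoff` standard deviations off."""
    return [k for k, value in enumerate(z) if abs(value) > cutoff]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_complaint(rms_z, low=0.666, high=1.5):
    """True if the RMS value is outside the range quoted in the text."""
    return rms_z > high or rms_z < low


# Invented example: three bonds with the same ideal length and sigma.
observed = [1.52, 1.48, 1.60]
ideal = [1.50, 1.50, 1.50]
sigma = [0.02, 0.02, 0.02]

z = z_scores(observed, ideal, sigma)
print(outliers(z), rms(z), rms_complaint(rms(z)))
```

The same functions apply unchanged to bond angles, given angle values and
their standard deviations.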
Lastly, BNDCHK will determine whether the deviation from the Engh and
Huber bondlengths is significantly correlated with the direction of
the bond in the crystallographic unit cell. If such a correlation is
found, a new unit cell is calculated where the correlation is gone.
If this message appears, the cell used during refinement probably
is not accurate enough. We do not have any experience on what to
do about it, though.
The command CHICHK is equivalent to the EVACHI command in the CHIANG
menu.
All torsion angles in the molecule will be compared with the
distribution of the same torsion angle in 150 of the 300 best refined
proteins from the PDB. You will get a score for 'normality' and not
for 'correctness' or energetics. In this score 0.0 means that this
torsion angle value is as normal as it can be, and negative values
represent less common conformations. Residue values below -2.0 warrant
investigation, below -3.0 something strange must be happening.
For this analysis all torsion angles in the residue except omega are
used.
Another part of the CHICHK verifies the phi/psi combination versus a
Ramachandran plot. Residues that are in forbidden areas of the
Ramachandran plot will be listed. Also, a separate check on omega
values will be performed (for PRO and non-PRO residues), and residues with
unusual values are listed.
The Omega angle is often fairly strictly restrained to 180 degrees. That
is not good: there should be a little flexibility allowed. This check
verifies that the variation in omega angles observed in the protein
is "normal".
The planarity of side chains of protein residues is verified against a
database distribution. If any side chain deviates more than 4.0
standard deviations from planarity, this fact is reported. For this check
any hydrogen atoms are ignored (but see PL3CHK).
For each atom connected to an aromatic ring system the distance of the
atom to the least squares plane of the ring is calculated, and
compared with a database distribution. If any value deviates more than
3.0 standard deviations from the plane, this fact is reported.
We do not have "normal planarity" data for DNA/RNA bases and protein
side chains with hydrogen atoms. In the PL3CHK, the planarity of these
systems is calculated, and a complaint is expressed if the RMS deviation
is larger than 0.1 Angstrom.
Proline rings are not flat. They have a fairly precise puckering
pattern. This check verifies whether all proline residues in the
structure look normal. Since there is no hard data, the results are
a bit subjective.
In a phi/psi plot, residues are normally found in a few areas on the left
side. If you look more carefully, some residues occupy tighter
areas than others. WHAT IF has 60 different "normal" areas, one for each
combination of the 20 residue types and 3 secondary structure types.
Instead of showing all 60 plots, it will calculate a Z-score that tells
you whether the Ramachandran plot "looks OK". That Z-score is
calculated by the RAMCHK option.
The command HNDCHK can be used to check for wrong handedness of chiral
atoms in the twenty naturally occurring residues. All atoms with
deviating chirality will be listed. Not only the "opposite" chirality
will be detected, but also "too flat" or "too puckered" atoms.
The backbone conformation check will check whether there are fragments
of 5 C-alpha coordinates that are not represented by other structures in
the WHAT IF database. It will calculate an overall normality Z-score
that expresses this.
The Chi-1/Chi-2 correlation check verifies for each residue whether
the chi-1 and chi-2 angle combination is normal for this residue type
in this secondary structure element. A final Z-score will be calculated
that expresses how normal the chi-1 and chi-2 angles in this structure are.
The command FLPCHK causes WHAT IF to compare all local backbone
conformations (5 residue stretches) with similar (RMSD on alpha
carbons less than 0.5 Angstrom) conformations in the database. The
RMSD between the backbone oxygen in the structure and the database
positions is given. If this value for a residue is above 1.5, manual
inspection of the peptide plane seems advisable. In brackets the
number of hits in the database is listed. This number should normally
be 80, as that is the maximal number of hits WHAT IF looks for. If
this number is considerably less than 80, the RMSD value for the
oxygen position becomes a less sensitive measure of quality.
The command ROTCHK will compare for all residues their chi-1 rotamer
with the distribution of observed rotamers for the same residue type
in a similar local backbone conformation in the database. A normality
index will be listed. If this index is lower than 0.5 a warning will
be given. A few values are expected to appear for every structure, but
normality values lower than 0.2 should occur only extremely sparingly!
The command BMPCHK activates a bump check that is rather different from
the bump functions used by e.g. the DEBUMP option.
From a study of WHAT IF's database of high quality structures it was
determined that no pair of non-hydrogen-bonded atoms should have an
inter-atomic distance more than 0.4 Angstrom shorter than the sum of
the two Van der Waals radii. For hydrogen bonded atoms this limit was
found to be 0.55 Angstrom.
In the BMPCHK, all interatomic distances between non-bonded atoms are
calculated, and verified against these rules. If two atoms do come
closer, the amount by which the contact is too short is printed in a
table. In the table it will be indicated whether the bump is between
symmetry relatives (inter) or within the given asymmetric unit
(intra).
A bump will never be reported between two atoms for which the sum
of their atomic occupancies is less than 1.0.
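The rules above can be summarized for a single pair of atoms in a short
Python sketch. It is an illustration under stated assumptions: the function
name and the dictionary layout are hypothetical, the Van der Waals radii in
the example are invented values rather than the radii WHAT IF uses, and
selecting the non-bonded pairs and symmetry relatives is left out.

```python
import math


def bump_overlap(atom_a, atom_b, hydrogen_bonded=False):
    """Return by how much a contact is too short, or None if it is fine.

    Each atom is a dict with 'xyz' (coordinates), 'radius' (Van der
    Waals radius) and 'occ' (occupancy).
    """
    if atom_a["occ"] + atom_b["occ"] < 1.0:
        return None  # never reported for low summed occupancy
    allowed = 0.55 if hydrogen_bonded else 0.4
    shortest = atom_a["radius"] + atom_b["radius"] - allowed
    distance = math.dist(atom_a["xyz"], atom_b["xyz"])
    too_short = shortest - distance
    return too_short if too_short > 0.0 else None


# Invented example: two atoms 3.0 Angstrom apart, radius 1.8 each.
a = {"xyz": (0.0, 0.0, 0.0), "radius": 1.8, "occ": 1.0}
b = {"xyz": (3.0, 0.0, 0.0), "radius": 1.8, "occ": 1.0}
print(bump_overlap(a, b))
print(bump_overlap(a, b, hydrogen_bonded=True))
```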
The command QUACHK is similar to the OLDQUA option in the QUALITY
menu. It activates the packing quality control. See the chapter on
QUALITY control for an explanation. In short:
Every residue with a quality value below -5.0 is suspicious. A sequence
of residues with low quality scores is "interesting".
Every molecule with a global quality below -2.7 is guaranteed wrong. A
molecule with a quality below -2.0 might be misfolded or poorly
refined. Every molecule with a global quality below -1.2 does not
belong in a database of reliable structures.
The command NQACHK is similar to the NEWQUA option in the QUALITY
menu. It activates the second generation packing quality control. See
the chapter on QUALITY control for an explanation. In short:
Every residue with an "all-all" quality value below -2.5 is
suspicious. A sequence of residues with low quality scores is
"interesting".
Every molecule with a quality Z-score for all atoms below -5.0 is
guaranteed wrong. A molecule with a quality below -3.0 might be
misfolded or poorly refined. Every molecule with a global quality
below -2.0 does not belong in a database of reliable structures.
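The global cutoffs quoted for the two packing quality controls can be
written down as a small lookup, for example when post-processing results
in a script. This Python sketch is an illustration only; the names
OLDQUA_LIMITS, NEWQUA_LIMITS and judge_global_quality are hypothetical,
and the verdict texts simply repeat the wording used above.

```python
# (cutoff, verdict) pairs from the text, most severe first.
OLDQUA_LIMITS = [
    (-2.7, "guaranteed wrong"),
    (-2.0, "might be misfolded or poorly refined"),
    (-1.2, "does not belong in a database of reliable structures"),
]
NEWQUA_LIMITS = [
    (-5.0, "guaranteed wrong"),
    (-3.0, "might be misfolded or poorly refined"),
    (-2.0, "does not belong in a database of reliable structures"),
]


def judge_global_quality(score, limits):
    """Return the most severe verdict whose cutoff the score is below."""
    for cutoff, verdict in limits:
        if score < cutoff:
            return verdict
    return "no complaint"


print(judge_global_quality(-2.5, OLDQUA_LIMITS))
print(judge_global_quality(-2.5, NEWQUA_LIMITS))
```

The same score leads to different verdicts in the two schemes, so the
cutoffs must not be mixed.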
Most PHE residues are expected to be buried. Most LYS residues are
expected to be exposed. The INOCHK tests whether this protein is "normal"
in this aspect. It will report a normality RMS Z-score for the whole
structure. Inside-out structures, membrane proteins and misthreaded
structures will trigger this check.
The way it works: For each residue the accessibility is
calculated. These values are divided by the "vacuum accessibility" of
the residue type, resulting in an "accessibility fraction". These
numbers are now sorted from low to high. We then expect PHE residues
to appear in the beginning of the array. Using the mean and
standard deviation of the location in the array from the WHAT IF
database, a Z-score is calculated for each residue. (At this moment,
the average location of PHE is 0.301 into the array, with a
standard deviation of 0.197. For GLU these values are 0.593 and 0.209,
showing a much higher tendency to be relatively outside.) The Z-scores
for the residues are used to calculate an RMS Z-score for the
structure.
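The procedure can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the WHAT IF code: the function
name is hypothetical, the location in the array is taken to be rank divided
by (number of residues - 1), the vacuum accessibilities in the example are
invented, and only the PHE and GLU statistics quoted above are included.

```python
import math

# Mean and standard deviation of the location in the array (from the text).
POSITION_STATS = {"PHE": (0.301, 0.197), "GLU": (0.593, 0.209)}


def inside_outside_rms_z(residues, vacuum_acc, stats=POSITION_STATS):
    """residues: list of (residue_type, accessibility).

    Returns the RMS Z-score over all residues with known statistics.
    """
    # Accessibility fraction per residue, sorted from low to high.
    fractions = sorted(
        (acc / vacuum_acc[rtype], rtype) for rtype, acc in residues)
    n = len(fractions)
    z_values = []
    for rank, (_, rtype) in enumerate(fractions):
        if rtype not in stats:
            continue
        # Assumed definition of the fractional location in the array.
        location = rank / (n - 1) if n > 1 else 0.0
        mean, sd = stats[rtype]
        z_values.append((location - mean) / sd)
    if not z_values:
        raise ValueError("no residues with known statistics")
    return math.sqrt(sum(z * z for z in z_values) / len(z_values))


# Invented example: one buried PHE and one exposed GLU.
vacuum = {"PHE": 200.0, "GLU": 180.0}
print(inside_outside_rms_z([("PHE", 10.0), ("GLU", 90.0)], vacuum))
```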
Due to the way this check works, for NMR ensembles the result is slightly
different if a model is examined alone or in the context of other models.
We think the result in the context of other models might actually be a more
accurate result, so we will not "fix" this.
The command ACCCHK will calculate and evaluate accessible surfaces. It
will indicate whether the distribution of polar and apolar accessible
and buried atoms looks normal or not. At present I am not yet sure how
to interpret the numbers. This option is not used for the FULCHK report.
Please see INOCHK for an alternative where a better interpretation exists.
The command BPOCHK will cause WHAT IF to list all buried unsatisfied
hydrogen bond donors or acceptors. This check uses a very
straightforward definition of a hydrogen bond. A more sophisticated
check of unsatisfied hydrogen bond potential is part of the HNQCHK.
HNQCHK runs a series of commands from the HBONDS menu that have to
do with the HB2 options. For this a complete calculation is done of the
optimal hydrogen bond network in the protein. A number of warnings can
be generated from the result.
The optimization of the hydrogen bond network considers two
possibilities for the side-chain conformations of HIS, ASN and GLN
residues. The X-ray experiment can not see the difference between the
two conformations. If the orientation of the side chain of one of
these residues in the optimized H-bond network is different from the
orientation in the input file, a warning is given.
If any buried hydrogen bond donors do not have an acceptor, they are
listed. In high resolution structures these do not occur, because they
are energetically highly unfavourable!
If any polar side chain acceptor does not accept a hydrogen bond, the
atom is listed.
From the optimized hydrogen bond network the protonation state of the
HIS residues (HISD, HISE or HISH) can be deduced. Also, from the
geometry of the HIS ring it is often possible to see which Engh and
Huber parameters have been used for refinement. All these assignments
are printed in a table. If the two assignments for a residue differ it
is good to verify whether the correct parameters have been used for
the refinement.