Difference: ShiftSparta (1 vs. 10)

Revision 10, 16 Mar 2009 - Main.DavidCowburn

 

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
Yang Shen and Ad Bax
LIBRARY:ShenBax08.pdf

Local install – dl380://infotech/spartainstallPC

cygwin session ...
Script started on Mon Feb 4 12:21:52 2008

Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
./src/sparta -in test/ubiquitin.pdb

Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab

Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read
Predicting ...
N HA C CA CB H
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin

Running time: 20.343 seconds
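
The table printed above can be post-processed with a few lines of Python. This is only a sketch: it assumes the column order shown in the log header (N HA C CA CB H, then residue number, residue type and input name) and treats 9999.000 as a "no value" placeholder, which is what the log suggests for Gly CB and Pro H. The function name `parse_row` is made up for this note.

```python
COLUMNS = ["N", "HA", "C", "CA", "CB", "H"]
MISSING = 9999.000  # assumed placeholder for shifts that do not apply


def parse_row(line):
    """Parse one prediction row from the SPARTA screen output."""
    fields = line.split()
    shifts = {}
    for name, text in zip(COLUMNS, fields[:6]):
        value = float(text)
        # Map the placeholder to None so it cannot be mistaken for a shift.
        shifts[name] = None if value == MISSING else value
    return {"resid": int(fields[6]), "resname": fields[7], "shifts": shifts}


row = parse_row("110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin")
print(row["resid"], row["resname"], row["shifts"])
```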
>>/cygdrive/d/spartainstall/SPARTA#

Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...

Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax


DOWNLOAD

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The downloaded Unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

The Win32 archive can be unpacked with any standard Windows zip utility.

Users are encouraged to email the author to be informed about updates and related software.


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]


What is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from the protein structure to make quantitative predictions of the backbone chemical shifts.

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if one can find some triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of that database triplet will be useful predictors for the backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and those in the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.
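
A minimal sketch of such a score, for illustration only. The actual weights live in tab/weight.tab and the residue homology factors in tab/homology.tab; here the weights are passed in by the caller and the residue term is a crude 0/1 mismatch count, both of which are assumptions of this note and not SPARTA's values. Angle differences are wrapped so that 170 and -170 degrees count as 20 degrees apart.

```python
def angle_diff(a, b):
    """Smallest signed difference a - b between two angles, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def triplet_score(target, candidate, angle_weights, residue_weight):
    """Lower score means a closer match.

    target / candidate: dicts with "angles" (9 floats: phi, psi, chi1 for
    residues i-1, i, i+1) and "residues" (3 one-letter residue codes).
    """
    angle_term = sum(
        w * angle_diff(t, c) ** 2
        for w, t, c in zip(angle_weights, target["angles"], candidate["angles"])
    )
    # Hypothetical residue term: 0 for identical types, 1 otherwise.
    # SPARTA itself uses graded factors read from tab/homology.tab.
    residue_term = sum(
        0.0 if t == c else 1.0
        for t, c in zip(target["residues"], candidate["residues"])
    )
    return angle_term + residue_weight * residue_term


target = {"angles": [-60.0, -45.0, 180.0] * 3, "residues": "AKL"}
candidate = {"angles": [-50.0, -45.0, 180.0] + [-60.0, -45.0, 180.0] * 2,
             "residues": "AKV"}
print(triplet_score(target, candidate, [1.0] * 9, 0.5))
```

Ranking all database triplets by this score and keeping the 20 lowest would give the match list described in the next paragraph.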

In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts of the central residues of these 20 matches is used as the prediction for the secondary shift of the central residue (secondary shifts are obtained by subtracting the corresponding random coil shift values and the adjustment values arising from the effects of neighboring residues). The SPARTA database was constructed using the most well-defined parts of high-resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw predicted shifts from database searching. This database currently includes data from 200 proteins, representing 24,166 triplets.
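
The averaging step can be sketched as follows. How SPARTA derives the per-match weights is not stated in these notes, so the weights below are simply supplied by the caller (an assumption); the function names are made up for illustration. The weighted spread is returned as well, because the spread among the matches is used later as a reliability indicator.

```python
import math


def secondary_shift(observed, random_coil, neighbor_adjustment=0.0):
    """Secondary shift: observed minus random coil minus neighbor adjustments."""
    return observed - random_coil - neighbor_adjustment


def weighted_mean_and_std(values, weights):
    """Weighted mean and weighted standard deviation of the matched shifts."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(variance)


# Toy example with 4 matches instead of 20; the numbers are made up.
matched_secondary_shifts = [1.2, 0.8, 1.0, 1.4]
weights = [1.0, 1.0, 2.0, 0.5]  # hypothetical weights, not SPARTA's
mean, spread = weighted_mean_and_std(matched_secondary_shifts, weights)
print(round(mean, 3), round(spread, 3))
```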


Reliability of SPARTA

The reliability of the SPARTA approach was tested by a cross-validation procedure in which each protein was temporarily removed from the database and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. Similar prediction accuracies are obtained for proteins of known structure that are not contained in the database.
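
For local checks against one's own assignments, the same RMS deviation can be computed per nucleus in a few lines. This is a sketch, not part of SPARTA; missing values are represented as None (an assumption of this note) and are skipped.

```python
import math


def rmsd(predicted, experimental):
    """RMS deviation over residues where both values are available."""
    pairs = [
        (p, e)
        for p, e in zip(predicted, experimental)
        if p is not None and e is not None
    ]
    if not pairs:
        raise ValueError("no residues with both predicted and experimental shifts")
    return math.sqrt(sum((p - e) ** 2 for p, e in pairs) / len(pairs))


# Made-up CA shifts for three residues; the third has no prediction.
print(rmsd([55.1, 60.6, None], [55.9, 59.8, 54.0]))
```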

Importantly, the test also showed that the standard deviations of the shifts from the central residues of the 20 matches are correlated with the shift prediction errors. Checking the standard deviations in the prediction summary file (pred/pred.tab) therefore gives an idea of the prediction reliability.

It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also taken into account in SPARTA. The secondary shifts in the SPARTA database have been corrected for ring current shifts. As “compensation”, the SPARTA predicted shifts for the target protein are corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the predicted secondary shifts are further corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the above cross-validation. The accuracy of the coordinates of the target protein is therefore critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary file (/pred/protein_in.tab).

It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for residues in flexible regions, or for residues with very large ring current shift contributions, may be less reliable, as was also indicated by the test.


Components of the SPARTA Package

The SPARTA system is implemented in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options is printed when the executable or the starting script is run with the "-help" command-line argument, or without any command-line arguments.

Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; this is done automatically by the starting script ("$SPARTA_DIR/sparta" in Linux):

setenv SPARTA_DIR /disk1/SPARTA

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that the default "$SPARTA_DIR" is the current directory if not specified.
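
When SPARTA is driven from another program rather than from csh, the same setup can be reproduced in a small wrapper. The sketch below only builds the command line and environment; it is an illustration written for these notes, not part of the SPARTA distribution. The executable path and the "-in" / "-ref" options are the ones described in this document, while the function name `sparta_command` is hypothetical.

```python
import os
from pathlib import Path


def sparta_command(pdb_file, sparta_dir=None, ref_file=None):
    """Return (argv, env) for a SPARTA run; nothing is executed here."""
    # Mirror the documented default: fall back to the current directory.
    root = Path(sparta_dir) if sparta_dir else Path.cwd()
    env = dict(os.environ, SPARTA_DIR=str(root))
    argv = [str(root / "src" / "SPARTA"), "-in", pdb_file]
    if ref_file:
        argv += ["-ref", ref_file]
    return argv, env


argv, env = sparta_command("protein.pdb", "/disk1/SPARTA")
print(argv, env["SPARTA_DIR"])
```

The returned pair can be handed to `subprocess.run(argv, env=env)` from the prediction directory.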

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.
$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab The tables of random coil shifts and of adjustment values from neighboring residues used in the shift prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
$SPARTA_DIR/tab/homology.tab The residue type homology factors used in the prediction process; this table is similar to the one used by TALOS.
$SPARTA_DIR/tab/weight.tab The weighting factors for the PHI, PSI and CHI1 angles and for residue type homology used in the prediction process.
$SPARTA_DIR/tab/fitting.tab The fitting parameters relating prediction accuracy to precision, which are used after the prediction process to calculate the estimated prediction error.
$SPARTA_DIR/shifts/*.tab The files in this directory are the chemical shift tables for the proteins in the database. They are in the same format as the TALOS shift tables and must be exactly consistent with the corresponding structures in the SPARTA pdb directory. They are only used when compiling a new database, and only shift tables ending with the ".tab" extension are used.
$SPARTA_DIR/pdb/*.pdb The PDB coordinate files in this directory are only used, along with the files in the SPARTA shifts directory, when compiling a new database (e.g. adding new proteins to the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory.
$SPARTA_DIR/test/* The contents of this "test" directory include the input files and results for a sample SPARTA analysis.
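
Since the shift tables and PDB files must match by name, a quick consistency check before compiling a database can save some confusion. The sketch below only compares file names (not sequences or numbering) and is not part of SPARTA; the function name is made up for this note.

```python
from pathlib import Path


def unmatched_entries(sparta_dir):
    """Return (shift tables without a PDB file, PDB files without a shift table)."""
    root = Path(sparta_dir)
    shift_names = {p.stem for p in (root / "shifts").glob("*.tab")}
    pdb_names = {p.stem for p in (root / "pdb").glob("*.pdb")}
    return sorted(shift_names - pdb_names), sorted(pdb_names - shift_names)


# Usage: both lists should be empty before a database is compiled.
# no_pdb, no_shifts = unmatched_entries("/disk1/SPARTA")
```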


How to Use SPARTA

Use of SPARTA to predict backbone chemical shifts involves the following steps:

  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

  2. Prepare the input PDB coordinate file (for example "protein.pdb"), following the format requirements described under "Preparing the Input PDB Coordinates".

  3. Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:

    sparta -in protein.pdb

    SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi, chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, SPARTA also creates a file "pred.tab" in the "pred" directory, which contains a summary of the prediction results. The database search typically takes about 25 sec for a 100-residue protein on a Linux PC with a 2.8 GHz CPU.

  4. If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table, http://spin.niddk.nih.gov/NMRPipe/talos/), the SPARTA prediction can be performed with a command such as:

    sparta -in protein.pdb -ref ref.tab

    SPARTA will compare the predicted and experimental chemical shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
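
The referencing test in step 4 amounts to comparing the mean error with its standard error. A minimal sketch of that arithmetic, assuming the sample standard deviation is meant and that errors are taken as predicted minus experimental (neither detail is specified in these notes); the function name is hypothetical.

```python
import math
import statistics


def reference_check(errors, factor=3.0):
    """Return (mean error, expected error, whether a correction is indicated).

    errors: prediction errors for one nucleus type; needs at least 2 values.
    """
    n = len(errors)
    mean_error = sum(errors) / n
    # Expected error of the mean: std of the errors / sqrt(number of shifts).
    expected = statistics.stdev(errors) / math.sqrt(n)
    return mean_error, expected, abs(mean_error) > factor * expected


# Made-up errors with a systematic offset of about 1 ppm.
print(reference_check([1.0, 1.2, 0.8, 1.1, 0.9]))
```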


Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully, so that the file has the proper format and naming conventions. SPARTA accepts a standard PDB coordinate file, but uses ONLY the FIRST conformer/chain if more than one exists. For PDB coordinates without hydrogen atoms, hydrogens must be added (using DYNAMO, REDUCE, MOLMOL, or any similar program) in order to obtain the hydrogen bonding information and ring current shifts. For the HA atoms of Gly, please use the atom names "HA1/HA2".
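
A quick pre-flight check of these three points (number of chains, presence of hydrogens, Gly HA naming) can be sketched as below. It reads the standard fixed-column ATOM records and stops at the first ENDMDL; it is a convenience written for these notes, not a SPARTA tool, and the heuristic for recognizing hydrogen names is an assumption.

```python
def check_pdb_atoms(lines):
    """Summarize the first model of a PDB file for SPARTA input checks."""
    chains = []
    has_hydrogen = False
    gly_ha_misnamed = []
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # only look at the first model/conformer
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        name = line[12:16].strip()     # atom name, columns 13-16
        resname = line[17:20].strip()  # residue name, columns 18-20
        chain = line[21]               # chain identifier, column 22
        resid = int(line[22:26])       # residue number, columns 23-26
        if chain not in chains:
            chains.append(chain)
        # Hydrogen names start with H, or with a digit followed by H (e.g. 1HB).
        if name.startswith("H") or (
            len(name) > 1 and name[0].isdigit() and name[1] == "H"
        ):
            has_hydrogen = True
        # SPARTA expects Gly alpha protons to be named HA1 and HA2.
        if resname == "GLY" and "HA" in name and name not in ("HA1", "HA2"):
            gly_ha_misnamed.append((chain, resid, name))
    return {
        "chains": chains,
        "has_hydrogen": has_hydrogen,
        "gly_ha_misnamed": gly_ha_misnamed,
    }


# Usage:
# with open("protein.pdb") as handle:
#     print(check_pdb_atoms(handle))
```

More than one entry in "chains", a False "has_hydrogen", or a non-empty "gly_ha_misnamed" list each point to something that should be fixed before running SPARTA.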

Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.


Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. The following checks are suggested:

  1. Validate the chemical shift assignments of each candidate protein by running a SPARTA shift prediction using its PDB coordinates:

    sparta -in protein.pdb -ref ref.tab
  2. Check the prediction summary table (pred/pred.tab) and remove the experimental shifts from which the predicted shifts deviate by more than five standard deviations. Notably, HA shifts for which the ring current shift is > 1.5 ppm and the predicted shift deviates by more than three standard deviations are better removed.

  3. Chemical shifts should be referenced correctly. A quick check can be made by running the above SPARTA prediction for the protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA applies a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and stores the corrected shifts in a file "pred/ref.tab".
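
The outlier rule in check 2 can be expressed as a small filter. The sketch below assumes each shift comes with a per-shift standard deviation ("sigma") taken from the summary table; the dictionary keys and the function name are invented for this note, and reading pred/pred.tab into such rows is left out because its column layout is not described here.

```python
def shifts_to_remove(rows, n_sigma=5.0, ha_ring_cutoff=1.5, ha_n_sigma=3.0):
    """Return (resid, atom) pairs whose experimental shifts look unreliable.

    rows: dicts with keys resid, atom, predicted, experimental, sigma and
    ring_current (all key names are assumptions of this sketch).
    """
    flagged = []
    for row in rows:
        deviation = abs(row["predicted"] - row["experimental"])
        limit = n_sigma * row["sigma"]
        # Stricter limit for HA when the ring current contribution is large.
        if row["atom"] == "HA" and abs(row["ring_current"]) > ha_ring_cutoff:
            limit = ha_n_sigma * row["sigma"]
        if deviation > limit:
            flagged.append((row["resid"], row["atom"]))
    return flagged


# Made-up rows for illustration.
rows = [
    {"resid": 5, "atom": "CA", "predicted": 60.0, "experimental": 55.0,
     "sigma": 0.9, "ring_current": 0.0},
    {"resid": 6, "atom": "HA", "predicted": 4.0, "experimental": 5.0,
     "sigma": 0.25, "ring_current": 1.8},
    {"resid": 7, "atom": "HA", "predicted": 4.0, "experimental": 5.0,
     "sigma": 0.25, "ring_current": 0.2},
]
print(shifts_to_remove(rows))
```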

Given this, the procedure for adding new proteins to the SPARTA database is as simple as:

  1. Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

  2. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

  3. Prepare a table file, for example with a name of "list.tab", which only contains the names of proteins to be added into the database. This table must follow the example below:

    VARS   PDB_NAME
    FORMAT %24s

    bpti
    ubiquitin
    profilin
    ...

    Note that each "PDB_NAME" in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA shifts and pdb directories.

  4. In the "SPARTA" directory, execute the following command to compile a new database:

    sparta -compile -pdbDir ./pdb -pdbList list.tab
  5. A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database first, since it will be overwritten.
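
Steps 3 and 5 are easy to script. The sketch below writes a list table with the VARS/FORMAT header shown in step 3 and copies the existing database aside before a compile; the function names and the ".bak" suffix are choices made for this note, not SPARTA conventions.

```python
import shutil
from pathlib import Path


def write_list_table(names, path):
    """Write a protein list table with the VARS/FORMAT header from step 3."""
    lines = ["VARS   PDB_NAME", "FORMAT %24s", ""] + list(names)
    Path(path).write_text("\n".join(lines) + "\n")


def backup_database(sparta_dir, suffix=".bak"):
    """Copy tab/sparta.tab to tab/sparta.tab.bak and return the backup path."""
    database = Path(sparta_dir) / "tab" / "sparta.tab"
    backup = database.with_name(database.name + suffix)
    shutil.copy2(database, backup)
    return backup


# Usage, run from the SPARTA directory before "sparta -compile ...":
# write_list_table(["bpti", "ubiquitin", "profilin"], "list.tab")
# backup_database(".")
```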


Compile the Source Code

SPARTA is implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the /src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable is as simple as:

cd $SPARTA_DIR/src
make

Compilation of the SPARTA program has been tested on Windows (XP) and Linux (Linux 9 or newer). The compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are included in the distributed SPARTA package.


About the Name SPARTA

In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece.



_last updated: Apr 2007 / Webmaster_

Revision 9, 17 Sep 2008 - Main.DavidCowburn

 

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
Yang Shen and Ad Bax
LIBRARY:ShenBax08.pdf

Local install – dl380://infotech/spartainstallPC

cygwin session ...
Script started on Mon Feb 4 12:21:52 2008

>>Administrator@cowburn-pc #[33m/cygdrive/d/spartainstall/SPARTA
./src/sparta in test/ubiquitin.pdb

Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab

Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read
Predicting ...
N HA C CA CB H
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin

Running time: 20.343 seconds
>>/cygdrive/d/spartainstall/SPARTA#

Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...

Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax


DOWNLOAD

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The download unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

The win32 archive can be unpacked with a traditional Windows zip software.

Users are encouraged to email the author to be informed about updates and related software.


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]


What

is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from proteins structure in order to make quantitative predictions for the backbone chemical shifts

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if one can Trash.findDFdf some triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts for this protein will be useful predictors for the backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of squares differences between the torsion angles in the target protein and the database entries, so that lower scores indicated high similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.

In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted averages chemical chemical shifts (obtained by subtracting their corresponding random coil chemical shifts values and the adjustments values arising from the effects of neighboring residues) of the central residues of these 20 matches are used as a prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which would be used to quantitatively correct the raw predicted shifts from database searching. This database currently includes data from 200 proteins, representing 24,166 triplets.


Reliability

of SPARTA

The reliability of the SPARTA approach was tested by a cross-validation procedure where each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same shifts prediction accuracies are also obtained for the proteins with known structures which are not contained in the database.

Importantly, it is also found in the test that the standard deviation the shifts from the central residues of the 20 matches are correlated with the shifts prediction errors. By checking the standard deviations in the prediction summary file (pred/pred.tab) will provide an idea of the prediction reliability.

It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also taken into account in SPARTA. The secondary shifts in the SPARTA database have been corrected for ring current shifts. To compensate, the shifts SPARTA predicts for the target protein are corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the predicted secondary shifts are also corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the cross-validation above. The accuracy of the target protein coordinates is therefore critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary file (pred/protein_in.tab).

It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; SPARTA results for residues in flexible regions, or for residues with a very large ring current shift contribution, may therefore be less reliable, as the test also indicated.


Components of the SPARTA Package

The SPARTA system is implemented in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options is printed when the program is run with a "-help" command-line argument, or when the executable or starting script is run without any command-line arguments.

Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; this is done automatically by the starting script ("$SPARTA_DIR/sparta" in Linux):

setenv SPARTA_DIR /disk1/SPARTA

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that the default "$SPARTA_DIR" is the current directory if not specified.
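For scripted runs, the same lookup can be mirrored in a small wrapper. This is only a sketch: the function name and its `env` parameter are hypothetical, while the executable path and the "-in" and "-ref" options are the ones described in this document.

```python
import os


def sparta_command(pdb_file, ref_file=None, env=None):
    """Build the argument list for a SPARTA run (Linux executable path).

    env: mapping to read SPARTA_DIR from; defaults to os.environ.
    Falls back to the current directory when SPARTA_DIR is not set,
    matching the default described above.
    """
    env = os.environ if env is None else env
    sparta_dir = env.get("SPARTA_DIR", ".")
    cmd = [os.path.join(sparta_dir, "src", "SPARTA"), "-in", pdb_file]
    if ref_file is not None:
        cmd += ["-ref", ref_file]
    return cmd
```

The returned list could then be passed to `subprocess.run`, for example `subprocess.run(sparta_command("protein.pdb"), check=True)`.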

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.
$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab The tables of random coil shifts and of adjustment values from neighboring residues used in the shift prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
$SPARTA_DIR/tab/homology.tab The residue type homology factors used in the prediction process; this table is similar to the one used by TALOS.
$SPARTA_DIR/tab/weight.tab The weighting factors for the PHI, PSI and CHI1 angles and for residue type homology used in the prediction process.
$SPARTA_DIR/tab/fitting.tab The fitting parameters relating prediction accuracy to precision, which are used after the prediction process to calculate the estimated prediction error.
$SPARTA_DIR/shifts/*.tab The chemical shift tables for the proteins in the database. These files are only used when compiling a new database, and only shift tables ending with the ".tab" extension are used. They are in the same format as the TALOS shift tables and must be exactly consistent with the corresponding structures in the SPARTA pdb directory.
$SPARTA_DIR/pdb/*.pdb The PDB coordinate files, used together with the files in the SPARTA shifts directory only when compiling a new database (e.g. when adding new proteins to the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory, and the file names must match those of the corresponding chemical shift tables.
$SPARTA_DIR/test/* The input files and results for a sample SPARTA analysis.


How to Use SPARTA

Use of SPARTA to predict backbone chemical shifts involves the following steps:

  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

  2. Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format described under "Preparing the Input PDB Coordinates".

  3. Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:

    sparta -in protein.pdb

    SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, SPARTA also creates a file "pred.tab" in the "pred" directory, which contains a summary of the prediction results. The database search typically takes about 25 sec for a 100-residue protein on a Linux PC with a 2.8 GHz CPU.

  4. If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table, http://spin.niddk.nih.gov/NMRPipe/talos/), a SPARTA prediction can be performed with a command such as:

    sparta -in protein.pdb -ref ref.tab

    SPARTA will compare the predicted and experimental chemical shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
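The referencing criterion in step 4 can be written out as a short check. This is a sketch under stated assumptions: the function name is hypothetical, the error is taken as observed minus predicted, and the sample standard deviation is used, since the text does not say which convention SPARTA follows.

```python
import math
import statistics


def reference_offset(predicted, observed, threshold=3.0):
    """Return (mean_error, expected_error, needs_correction) for one nucleus.

    expected_error = standard deviation of errors / sqrt(number of shifts).
    needs_correction is True when |mean_error| exceeds threshold * expected_error.
    Requires at least two shifts.
    """
    errors = [o - p for p, o in zip(predicted, observed)]
    mean_error = statistics.mean(errors)
    expected_error = statistics.stdev(errors) / math.sqrt(len(errors))
    needs_correction = abs(mean_error) > threshold * expected_error
    return mean_error, expected_error, needs_correction
```

A systematic offset shows up as a mean error that is large relative to its expected statistical uncertainty, which is why the threshold scales with the standard error rather than with the standard deviation itself.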


Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully, so that they have the proper format and naming conventions. SPARTA accepts a standard PDB coordinate file, but reads ONLY the FIRST conformer/chain if more than one exists. For PDB coordinates without hydrogen atoms, the hydrogen atoms must be added (using DYNAMO, REDUCE, MOLMOL, or a similar program) in order to obtain the hydrogen bonding information and ring current shifts. For the HA atoms of Gly, please use the atom names "HA1/HA2".

Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.
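A quick pre-flight check for the two most common problems (missing hydrogens and Gly HA naming) might look like the sketch below. The function name is hypothetical, the column positions follow the standard fixed-width PDB ATOM record, and the hydrogen test is a simple heuristic based on atom names rather than anything SPARTA itself does.

```python
def check_pdb_for_sparta(lines):
    """Scan the first model of a PDB file for hydrogens and Gly HA names.

    Returns (has_hydrogen, bad_gly_ha), where bad_gly_ha lists
    (residue_number, atom_name) for Gly HA atoms not named HA1 or HA2.
    """
    has_hydrogen = False
    bad_gly_ha = []
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # only the first conformer is considered
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()      # atom name, columns 13-16
        res_name = line[17:20].strip()  # residue name, columns 18-20
        res_seq = line[22:26].strip()   # residue number, columns 23-26
        stem = name.lstrip("0123456789")  # handles names such as 1HA
        if stem.startswith("H"):
            has_hydrogen = True
        if res_name == "GLY" and stem.startswith("HA") and name not in ("HA1", "HA2"):
            bad_gly_ha.append((res_seq, name))
    return has_hydrogen, bad_gly_ha
```

For example, `check_pdb_for_sparta(open("protein.pdb"))` on a file using HA2/HA3 for glycine would report every HA3 atom.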


Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. It is suggested that:

  1. The chemical shift assignments for each candidate protein be validated by running a SPARTA shift prediction using its PDB coordinates.

    sparta -in protein.pdb -ref ref.tab
  2. The prediction summary table (pred/pred.tab) be checked, and experimental shifts from which the predicted shifts deviate by more than five standard deviations be removed. In particular, HA shifts for which the ring current shift is > 1.5 ppm and the predicted shift deviates by more than three standard deviations are better removed.

  3. The chemical shifts be referenced correctly. A quick check can be conducted by running the SPARTA prediction above for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab".

Given this, the procedure for adding new proteins to the SPARTA database is as follows:

  1. Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

  2. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

  3. Prepare a table file, for example named "list.tab", which contains only the names of the proteins to be added to the database. This table must follow the example below:

    VARS   PDB_NAME
    FORMAT %24s

    bpti
    ubiquitin
    profilin
    ...

    Note that each "PDB_NAME" in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.

  4. In the "SPARTA" directory, execute the following command to compile a new database:

    sparta -compile -pdbDir ./pdb -pdbList list.tab
  5. A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database first, because it will be overwritten.
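Because the file names in the pdb and shifts directories must match the list exactly, it can help to verify them before compiling. The sketch below is only an illustration: both function names are hypothetical, and the list layout it writes (VARS line, FORMAT line, then one protein name per line) is an assumption based on the usual NMRPipe table layout.

```python
from pathlib import Path


def missing_inputs(names, sparta_dir):
    """Return the paths of any pdb/<name>.pdb or shifts/<name>.tab files that are absent."""
    root = Path(sparta_dir)
    missing = []
    for name in names:
        for subdir, ext in (("pdb", ".pdb"), ("shifts", ".tab")):
            path = root / subdir / (name + ext)
            if not path.is_file():
                missing.append(str(path))
    return missing


def write_list_table(names, out_path):
    """Write a protein list table; the one-name-per-line layout is assumed."""
    lines = ["VARS   PDB_NAME", "FORMAT %24s", ""] + list(names)
    Path(out_path).write_text("\n".join(lines) + "\n")
```

Running `missing_inputs(["bpti", "ubiquitin"], "/disk1/SPARTA")` before the compile step should return an empty list; anything else names a file that needs to be added or renamed.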


Compile the Source Code

SPARTA is implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the /src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable is as simple as:

cd $SPARTA_DIR/src
make

Compilation of SPARTA has been tested on Windows (XP) and Linux (Linux 9 or newer). The compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are included in the distributed SPARTA package.


About the Name SPARTA

In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece.


_[ Home ] [ NIH ] [ NIDDK ] [ Disclaimer ] [ Copyright ]_

_last updated: Apr 2007 / Webmaster_

Revision 8 - 16 Jul 2008 - Main.DavidCowburn

 

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
Yang Shen and Ad Bax
LIBRARY:ShenBax08.pdf

Local install – dl380://infotech/spartainstallPC

cygwin session ...
Script started on Mon Feb 4 12:21:52 2008

>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
./src/sparta -in test/ubiquitin.pdb

Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab

Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read
Predicting ...
N HA C CA CB H
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin

Running time: 20.343 seconds
>>/cygdrive/d/spartainstall/SPARTA#
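The rows printed in the session above can be read back into a script for further analysis. The sketch below assumes the column layout shown in this transcript (six shifts, residue number, residue type, structure name) and treats 9999.000 as a placeholder for a shift that does not exist, which is an interpretation based on where it appears (CB of Gly, H of Pro). The function name is hypothetical.

```python
MISSING = 9999.0  # assumed placeholder for an undefined shift

HEADER = ["N", "HA", "C", "CA", "CB", "H"]


def parse_prediction_rows(text):
    """Parse prediction rows from captured SPARTA terminal output.

    Returns a list of (residue_number, residue_type, shifts) tuples, where
    shifts maps nucleus name to a float, or None for the placeholder value.
    """
    rows = []
    seen_header = False
    for line in text.splitlines():
        fields = line.split()
        if fields == HEADER:
            seen_header = True
            continue
        if not seen_header or len(fields) != len(HEADER) + 3:
            continue
        try:
            values = [float(f) for f in fields[:len(HEADER)]]
            res_num = int(fields[len(HEADER)])
        except ValueError:
            continue  # not a data row
        shifts = {n: (None if v == MISSING else v) for n, v in zip(HEADER, values)}
        rows.append((res_num, fields[len(HEADER) + 1], shifts))
    return rows
```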

Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...

Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax


DOWNLOAD

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The downloaded Unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

The win32 archive can be unpacked with any standard Windows zip utility.

Users are encouraged to email the author to be informed about updates and related software.


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]


What is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi and psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi and chi1 torsion angles and sequence information from a protein structure to make quantitative predictions for the backbone chemical shifts.

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if

Changed:
<
<
one can find some triplet of residues in a protein of known structure
>
>
one can Trash.findDFdf some triplet of residues in a protein of known structure
 with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts for this protein will be useful predictors for the backbone secondary chemical shifts in the target.


Revision 7 - 04 Feb 2008 - Main.DavidCowburn

 

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
Yang Shen and Ad Bax
LIBRARY:ShenBax08.pdf
Changed:
<
<
---++ Local install – dl380://infotech/spartainstallPC
>
>

Local install – dl380://infotech/spartainstallPC

 
cygwin session ...
Script started on Mon Feb 4 12:21:52 2008

>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
./src/sparta in test/ubiquitin.pdb

Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab

Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read
Predicting ...
N HA C CA CB H
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin

Running time: 20.343 seconds
>>/cygdrive/d/spartainstall/SPARTA#
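
The screen listing above can be post-processed with a short script. The following is only a sketch: it assumes each data row follows the header order shown in the session (N HA C CA CB H, then residue number, one-letter residue type and a name), and it treats 9999.000 as a "no value" placeholder because that is where it appears in the listing (Gly CB, Pro H). The function name parse_prediction_lines is hypothetical and is not part of SPARTA.

```python
from typing import Dict, List, Optional

# Value seen in the listing where no shift exists (e.g. Gly CB, Pro H).
# Treating it as "missing" is an assumption based on that listing.
MISSING = 9999.0


def parse_prediction_lines(lines: List[str]) -> List[Dict[str, object]]:
    """Parse rows laid out as: N HA C CA CB H resnum restype name."""
    atoms: Optional[List[str]] = None
    rows: List[Dict[str, object]] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank separator lines in the listing
        if atoms is None:
            # Everything before the header line is log chatter.
            if fields == ["N", "HA", "C", "CA", "CB", "H"]:
                atoms = fields
            continue
        if len(fields) != len(atoms) + 3:
            continue  # e.g. the "Running time" line
        try:
            values = [float(f) for f in fields[: len(atoms)]]
            resnum = int(fields[len(atoms)])
        except ValueError:
            continue
        shifts = {a: (None if v == MISSING else v) for a, v in zip(atoms, values)}
        rows.append(
            {"resnum": resnum, "restype": fields[len(atoms) + 1], "shifts": shifts}
        )
    return rows


sample = [
    "Predicting ...",
    "N HA C CA CB H",
    "124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin",
    "110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin",
    "Running time: 20.343 seconds",
]
parsed = parse_prediction_lines(sample)
```

Each parsed row keeps the residue number and type next to a dictionary of shifts, so the placeholder entries can be skipped when comparing against experimental data.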

Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...

Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax


DOWNLOAD

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The downloaded Unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

The win32 archive can be unpacked with any standard Windows zip utility.

Users are encouraged to email the author to be informed about updates and related software.


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]


What is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from a protein structure to make quantitative predictions for the backbone chemical shifts.

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if one can find a triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of that database triplet will be useful predictors for the backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.
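
As an illustration of the kind of score described above, here is a minimal sketch. It is not SPARTA's actual implementation: the weights, the residue-type penalty function and the wrap-around treatment of angle differences are all assumptions made for the example, and the names angle_diff and triplet_score are hypothetical.

```python
from typing import Callable, Sequence


def angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b between two angles in degrees."""
    # Wrapping into [-180, 180) is an assumption of this sketch.
    return (a - b + 180.0) % 360.0 - 180.0


def triplet_score(
    target_angles: Sequence[float],   # 9 angles: phi, psi, chi1 for 3 residues
    db_angles: Sequence[float],
    angle_weights: Sequence[float],   # hypothetical weights, one per angle
    target_types: Sequence[str],      # 3 one-letter residue types
    db_types: Sequence[str],
    type_penalty: Callable[[str, str], float],  # hypothetical residue-type term
) -> float:
    """Weighted sum of squared angle differences plus a residue-type term."""
    score = 0.0
    for t, d, w in zip(target_angles, db_angles, angle_weights):
        score += w * angle_diff(t, d) ** 2
    for t, d in zip(target_types, db_types):
        score += type_penalty(t, d)
    return score


# Toy example: one angle differs by 10 degrees and one residue type differs.
target = [-60.0, -45.0, 180.0] * 3
candidate = [-70.0, -45.0, 180.0] + [-60.0, -45.0, 180.0] * 2
weights = [1.0] * 9
score = triplet_score(
    target, candidate, weights, "AKE", "AKD",
    lambda a, b: 0.0 if a == b else 1.0,
)
```

A lower score means a closer match, so candidate triplets would be ranked in ascending order of this value.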

In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts of the central residues of these 20 matches (obtained by subtracting the corresponding random coil chemical shift values and the adjustment values arising from the effects of neighboring residues) is used as the prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high-resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw predicted shifts from database searching. This database currently includes data from 200 proteins, representing 24,166 triplets.
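
The averaging step can be sketched as follows. The text does not say how the match weights are derived, so they are simply passed in here as an assumption; the function names are hypothetical, and the hydrogen bond and ring current corrections are left out of this sketch.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the secondary shifts of the matched triplets."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(values, weights)) / total


def predicted_shift(
    match_secondary_shifts: Sequence[float],  # one value per database match
    match_weights: Sequence[float],           # hypothetical per-match weights
    random_coil: float,                       # random coil shift of the residue
    neighbor_adjustment: float,               # adjustment from neighboring residues
) -> float:
    """Add the random coil value and neighbor adjustment back to the
    predicted secondary shift to get a full chemical shift."""
    secondary = weighted_mean(match_secondary_shifts, match_weights)
    return random_coil + neighbor_adjustment + secondary


# Toy example with two matches instead of 20.
example = predicted_shift([1.0, 3.0], [1.0, 1.0], 56.0, 0.5)
```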


Reliability of SPARTA

The reliability of the SPARTA approach was tested by a cross-validation procedure in which each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same shift prediction accuracies are also obtained for proteins with known structures that are not contained in the database.
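
For reference, an RMS deviation of this kind can be computed for one atom type as sketched below. Skipping entries equal to 9999.0 is an assumption based on the placeholder seen in the SPARTA screen output; the function name is hypothetical.

```python
import math
from typing import Sequence


def rms_deviation(
    predicted: Sequence[float],
    experimental: Sequence[float],
    missing: float = 9999.0,  # assumed placeholder for "no value"
) -> float:
    """Root-mean-square deviation over residues where both values exist."""
    squared = [
        (p - e) ** 2
        for p, e in zip(predicted, experimental)
        if p != missing and e != missing
    ]
    if not squared:
        raise ValueError("no residues with both predicted and experimental shifts")
    return math.sqrt(sum(squared) / len(squared))


# Toy example: the third residue is skipped because its prediction is missing.
rmsd = rms_deviation([1.0, 2.0, 9999.0], [1.0, 4.0, 3.0])
```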

Importantly, the test also showed that the standard deviations of the shifts from the central residues of the 20 matches are correlated with the shift prediction errors. Checking the standard deviations in the prediction summary file (pred/pred.tab) therefore gives an idea of the prediction reliability.

It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also taken into account in SPARTA. The secondary shifts in the SPARTA database are shifts that have been corrected for ring current effects. As “compensation”, the SPARTA predicted shifts for the target protein are corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the predicted secondary shifts are additionally corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the cross-validation described above. The accuracy of the target protein coordinates is therefore critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary file (/pred/protein_in.tab).

It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for residues in flexible regions, or for residues with very large ring current shift contributions, may be less reliable, as was also indicated by the test.


Components of the SPARTA Package

The SPARTA system is implemented in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options is printed when the executable or starting script is run with the "-help" command-line argument, or without any command-line arguments.

Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; this is done automatically by the starting script ("$SPARTA_DIR/sparta" in Linux):

setenv SPARTA_DIR /disk1/SPARTA

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that the default "$SPARTA_DIR" is the current directory if not specified.

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.

$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab The tables of random coil shifts and of the adjustment values from neighboring residues used in the shift prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
$SPARTA_DIR/tab/homology.tab The residue type homology factors used in the prediction process; this table is similar to the one used by TALOS.
$SPARTA_DIR/tab/weight.tab The weighting factors of PHI, PSI and CHI1 angles, and residue type homology used in the prediction process.
$SPARTA_DIR/tab/fitting.tab The fitting parameters between prediction accuracy and precision, which are used after the prediction process to calculate the estimated prediction error.

$SPARTA_DIR/shifts/*.tab The chemical shift tables for the proteins in the database. They are in the same format as the TALOS shift tables and must be exactly consistent with the corresponding structures in the SPARTA pdb directory. These files are only used when compiling a new database, and only shift tables ending with the ".tab" extension are used.
$SPARTA_DIR/pdb/*.pdb The PDB coordinate files in this directory are only used, along with the files in the SPARTA shifts directory, when compiling a new database (e.g. adding new proteins to the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory.
$SPARTA_DIR/test/* The contents of this "test" directory include the input files and results for a sample SPARTA analysis.


How to Use SPARTA

Use of SPARTA to predict backbone chemical shifts involves the following steps:

  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

  2. Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format described in the section on preparing the input PDB coordinates.

  3. Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:

    sparta -in protein.pdb

    SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, SPARTA also creates a file "pred.tab" in the "pred" directory, which contains a summary of the prediction results. The database search typically takes about 25 sec for a 100-residue protein on a Linux PC with a 2.8GHz CPU.

  4. If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), a SPARTA prediction can be performed with a command such as:

    sparta -in protein.pdb -ref ref.tab

    SPARTA will compare the predicted and experimental chemical shifts before exiting, and the prediction summary file "pred/pred.tab" will store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
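
The two command forms in the steps above can be wrapped in a small helper when many structures need to be processed. This is only a sketch: it uses nothing beyond the "-in" and "-ref" arguments shown above, and the helper names, the default executable name "sparta" and the working-directory handling are assumptions.

```python
import subprocess
from typing import List, Optional


def build_sparta_command(
    pdb_file: str,
    ref_table: Optional[str] = None,
    executable: str = "sparta",  # assumed to be on PATH; adjust to your install
) -> List[str]:
    """Build the argument list for a SPARTA prediction run."""
    cmd = [executable, "-in", pdb_file]
    if ref_table is not None:
        cmd += ["-ref", ref_table]
    return cmd


def run_sparta(
    pdb_file: str,
    ref_table: Optional[str] = None,
    workdir: str = ".",
    executable: str = "sparta",
) -> subprocess.CompletedProcess:
    """Run SPARTA from workdir, where its output files are expected to be
    written; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(
        build_sparta_command(pdb_file, ref_table, executable),
        cwd=workdir,
        check=True,
    )


plain = build_sparta_command("protein.pdb")
with_ref = build_sparta_command("protein.pdb", "ref.tab")
```

Keeping the command construction separate from the call makes it easy to print or log the exact command before running it.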


Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully, so that they have the proper format and naming conventions. SPARTA accepts standard PDB coordinate files, but uses ONLY the FIRST conformer/chain if more than one exists. For PDB coordinates without hydrogen atoms, hydrogen atoms must be added (using DYNAMO, REDUCE, MOLMOL, or a similar program) in order to obtain the hydrogen bonding information and ring current shifts. For the HA atoms of Gly, please use the atom names "HA1/HA2".

Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.


Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. The following is suggested:

  1. The chemical shift assignments for each candidate protein are best validated by running a SPARTA shift prediction using its PDB coordinates:

    sparta -in protein.pdb -ref ref.tab
  2. Check the prediction summary table (pred/pred.tab) and remove the experimental shifts for which the predicted shifts deviate by five standard deviations. Notably, HA shifts for which the ring current shift is > 1.5 ppm and the predicted shift deviates by three standard deviations are better removed.

  3. Chemical shifts should be referenced correctly. A quick check can be conducted by running the above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab".
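
The referencing criterion in step 3 can be written out as a short check. This sketch makes two assumptions that the text does not settle: the absolute value of the average error is compared, and the sample standard deviation is used. The function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def needs_reference_correction(
    errors: Sequence[float],  # predicted minus experimental, one atom type
    factor: float = 3.0,      # "3 times the expected error" from the text
) -> Tuple[bool, float]:
    """Return (flag, mean_error); flag is True when the average error
    exceeds factor * (standard deviation / sqrt(number of shifts))."""
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two shifts to estimate a deviation")
    mean_error = statistics.fmean(errors)
    expected_error = statistics.stdev(errors) / math.sqrt(n)
    return abs(mean_error) > factor * expected_error, mean_error


# A consistent offset of about 1 ppm is flagged ...
offset_flag, offset_mean = needs_reference_correction([1.0, 1.2, 0.8, 1.1, 0.9])
# ... while errors scattered around zero are not.
centered_flag, centered_mean = needs_reference_correction([-0.5, 0.5, -0.4, 0.4])
```

If the flag is set, the mean error gives a rough size for the referencing offset, which can be compared with the correction SPARTA reports.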

Given this, the procedure for adding new proteins to the SPARTA database is as follows:

  1. Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

  2. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

  3. Prepare a table file, for example with a name of "list.tab", which only contains the names of proteins to be added into the database. This table must follow the example below:

    VARS   PDB_NAME
    FORMAT %24s

    bpti
    ubiquitin
    profilin
    ...

    Note that each "PDB_NAME" entry in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.

  4. In the "SPARTA" directory, execute the following command to compile a new database:

    sparta -compile -pdbDir ./pdb -pdbList list.tab
  5. A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database first, because it will be overwritten.
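
The name-consistency requirement in the steps above is easy to get wrong by hand, so the list table can be generated from the directories themselves. This is a sketch only: it assumes the usual table layout with the VARS and FORMAT lines on separate lines followed by one protein name per line, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def common_protein_names(pdb_dir: str, shifts_dir: str) -> Tuple[List[str], List[str]]:
    """Return (names present in both directories, names present in only one)."""
    pdb_names = {p.stem for p in Path(pdb_dir).glob("*.pdb")}
    shift_names = {p.stem for p in Path(shifts_dir).glob("*.tab")}
    return sorted(pdb_names & shift_names), sorted(pdb_names ^ shift_names)


def format_list_table(names: List[str]) -> str:
    """Format a list table with one protein name per line (assumed layout)."""
    lines = ["VARS   PDB_NAME", "FORMAT %24s", ""]
    lines += names
    return "\n".join(lines) + "\n"


table_text = format_list_table(["bpti", "ubiquitin"])
# Example use (paths are placeholders):
#   names, unmatched = common_protein_names("./pdb", "./shifts")
#   Path("list.tab").write_text(format_list_table(names))
```

Any names reported as unmatched have a structure without a shift table or the reverse, and should be fixed before compiling the database.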


Compile the Source Code

SPARTA is implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the /src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable is as simple as:

cd $SPARTA_DIR/src
make

Compilation of SPARTA has been tested on Windows (XP) and Linux (Linux 9 or newer). Compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are included in the distributed SPARTA package.


About the Name SPARTA

In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece.



_last updated: Apr 2007 / Webmaster_

Revision 6 - 04 Feb 2008 - Main.DavidCowburn

 

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
Yang Shen and Ad Bax
LIBRARY:ShenBax08.pdf
Deleted:
<
<
 
Changed:
<
<
>
>
---++ Local install – dl380://infotech/spartainstallPC
Deleted:
<
<
---++Local install – dl380://infotech/spartainstallPC
 
cygwin session ...
Script started on Mon Feb 4 12:21:52 2008

>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
./src/sparta in test/ubiquitin.pdb

Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab

Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read
Predicting ...
N HA C CA CB H
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin

Running time: 20.343 seconds
>>/cygdrive/d/spartainstall/SPARTA#

Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...

Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax


DOWNLOAD

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The downloaded Unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

The win32 archive can be unpacked with any standard Windows zip utility.

Users are encouraged to email the author to be informed about updates and related software.


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]


What is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from a protein structure to make quantitative predictions for the backbone chemical shifts.

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if one can find a triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of that database triplet will be useful predictors for the backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.

In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts of the central residues of these 20 matches (obtained by subtracting the corresponding random coil chemical shift values and the adjustment values arising from the effects of neighboring residues) is used as the prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high-resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw predicted shifts from database searching. This database currently includes data from 200 proteins, representing 24,166 triplets.


Reliability of SPARTA

The reliability of the SPARTA approach was tested by a cross-validation procedure in which each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same shift prediction accuracies are also obtained for proteins with known structures that are not contained in the database.

Importantly, the test also showed that the standard deviations of the shifts from the central residues of the 20 matches are correlated with the shift prediction errors. Checking the standard deviations in the prediction summary file (pred/pred.tab) therefore gives an idea of the prediction reliability.

It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also taken into account in SPARTA. The secondary shifts in the SPARTA database are shifts that have been corrected for ring current effects. As “compensation”, the SPARTA predicted shifts for the target protein are corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the predicted secondary shifts are additionally corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the cross-validation described above. The accuracy of the target protein coordinates is therefore critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary file (/pred/protein_in.tab).

It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; SPARTA results for residues in flexible regions, or for residues with very large ring current shift contributions, may therefore be less reliable, as was also indicated by the test.


Components of the SPARTA Package

The SPARTA system is implemented in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options is printed when the "-help" command-line argument is given, or when the executable or starting script is run without any command-line arguments.

Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; this is done automatically by the starting script ("$SPARTA_DIR/sparta" on Linux):

setenv SPARTA_DIR /disk1/SPARTA

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that "$SPARTA_DIR" defaults to the current directory if it is not specified.

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.

$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab The tables of random coil shifts and of the adjustment values from neighboring residues used in the shift prediction process. (These are the same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)

$SPARTA_DIR/tab/homology.tab The residue type homology factors used in the prediction process; this table is similar to the one used by TALOS.

$SPARTA_DIR/tab/weight.tab The weighting factors for the PHI, PSI and CHI1 angles and for the residue type homology used in the prediction process.

$SPARTA_DIR/tab/fitting.tab The fitting parameters relating prediction accuracy to precision, which are used after the prediction process to calculate the estimated prediction error.

$SPARTA_DIR/shifts/*.tab The chemical shift tables for the proteins in the database, in the same format as the TALOS shift tables. These files are only used when compiling a new database, and only shift tables ending with the ".tab" extension are used. They must be exactly consistent with the corresponding structures in the SPARTA pdb directory.

$SPARTA_DIR/pdb/*.pdb The PDB coordinate files, which are only used together with the files in the SPARTA shifts directory when compiling a new database (e.g. when adding new proteins to the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory.

$SPARTA_DIR/test/* The input files and results for a sample SPARTA analysis.


How to Use SPARTA

Use of SPARTA to predict backbone chemical shifts involves the following steps:

  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

  2. Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format described under "Preparing the Input PDB Coordinates".

  3. Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:

    sparta -in protein.pdb

    SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, SPARTA also creates a file "pred.tab" in the "pred" directory, which contains a summary of the prediction results. The database search typically takes about 25 sec for a 100-residue protein on a Linux PC with a 2.8 GHz CPU.

  4. If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), the SPARTA prediction can be performed with a command such as:

    sparta -in protein.pdb -ref ref.tab

    SPARTA then compares the predicted and experimental chemical shifts before exiting, and the prediction summary file "pred/pred.tab" stores the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
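The referencing criterion described in step 4 can be sketched as follows. This is only an illustration of the stated rule: the function name is hypothetical, and the use of the sample standard deviation and of the absolute value of the mean error are assumptions, since the text does not spell them out.

```python
import math
import statistics
from typing import Sequence


def needs_reference_correction(errors: Sequence[float], factor: float = 3.0) -> bool:
    """True when the mean prediction error exceeds `factor` times the expected
    error, taken as stdev(errors) / sqrt(number of shifts)."""
    n = len(errors)
    if n < 2:
        return False  # not enough data to estimate a standard deviation
    mean_error = statistics.fmean(errors)
    expected_error = statistics.stdev(errors) / math.sqrt(n)
    return abs(mean_error) > factor * expected_error


# Hypothetical prediction errors (ppm) for one nucleus type.
offset_errors = [0.5, 0.6, 0.4, 0.5]      # consistently shifted: likely misreferenced
centered_errors = [0.5, -0.5, 0.3, -0.3]  # scattered around zero: no correction
print(needs_reference_correction(offset_errors))
print(needs_reference_correction(centered_errors))
```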


Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully, so that they have the proper format and naming conventions. SPARTA accepts the standard PDB coordinate file, but uses ONLY the FIRST conformer/chain if more than one exists. For PDB coordinates without hydrogen atoms, the hydrogen atoms must be added (using DYNAMO, REDUCE, MOLMOL, or a similar program) in order to obtain the hydrogen bonding information and ring current shifts. For the HA atoms of Gly, please use the atom names "HA1/HA2".
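A quick pre-flight check along these lines can catch the most common problems before running SPARTA. The sketch below is not part of the SPARTA package: the function names are hypothetical, it relies only on the standard fixed-column layout of PDB ATOM records, and the hydrogen test (atom name starting with "H" after any leading digit) is a simple heuristic.

```python
from typing import Iterable, List


def make_atom_line(serial: int, name: str, res: str, chain: str, resseq: int) -> str:
    """Build the leading fixed-width columns of a PDB ATOM record (demo helper)."""
    return f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}"


def check_pdb_lines(lines: Iterable[str]) -> List[str]:
    """Return a list of warnings; an empty list means no issue was found."""
    has_hydrogen = False
    chains: List[str] = []
    bad_gly_names = set()
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # look at the first conformer only
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()  # PDB columns 13-16
        res_name = line[17:20].strip()   # PDB columns 18-20
        chain_id = line[21:22]           # PDB column 22
        if chain_id not in chains:
            chains.append(chain_id)
        base_name = atom_name.lstrip("0123456789")
        if base_name.startswith("H"):
            has_hydrogen = True
        if (res_name == "GLY" and base_name.startswith("HA")
                and atom_name not in ("HA1", "HA2")):
            bad_gly_names.add(atom_name)
    warnings = []
    if not has_hydrogen:
        warnings.append("no hydrogen atoms found; add them before running SPARTA")
    if bad_gly_names:
        warnings.append("Gly HA atoms should be named HA1/HA2, found: "
                        + ", ".join(sorted(bad_gly_names)))
    if len(chains) > 1:
        warnings.append("more than one chain present; only the first is used")
    return warnings


good = [make_atom_line(1, "N", "GLY", "A", 1),
        make_atom_line(2, "HA1", "GLY", "A", 1),
        make_atom_line(3, "HA2", "GLY", "A", 1)]
bad = [make_atom_line(1, "N", "GLY", "A", 1),
       make_atom_line(2, "CA", "ALA", "B", 2)]
print(check_pdb_lines(good))
print(check_pdb_lines(bad))
```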

Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.


Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. It is suggested that:

  1. The chemical shift assignments for each candidate protein should be validated by running a SPARTA shift prediction using its PDB coordinates:

    sparta -in protein.pdb -ref ref.tab
  2. Check the prediction summary table (pred/pred.tab) and remove the experimental shifts that deviate from the predicted shifts by more than five standard deviations. In particular, HA shifts for which the ring current shift is > 1.5 ppm and the deviation from the predicted shift exceeds three standard deviations are better removed.

  3. Chemical shifts should be referenced correctly. A quick check can be made by running the above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA applies a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and stores the corrected shifts in a file "pred/ref.tab".
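The filtering rule in point 2 can be expressed as a small predicate. The sketch below is hypothetical throughout: the record fields do not correspond to the actual column names of pred/pred.tab, "standard deviation" is taken to mean a per-shift estimated prediction error, and the ring current threshold is applied to the magnitude of the ring current shift.

```python
from dataclasses import dataclass


@dataclass
class ShiftRecord:
    atom: str                   # e.g. "HA", "CA"
    experimental: float         # ppm
    predicted: float            # ppm
    sigma: float                # estimated prediction error, ppm (assumed field)
    ring_current: float = 0.0   # calculated ring current shift, ppm (assumed field)


def keep_shift(rec: ShiftRecord) -> bool:
    """Apply the two suggested exclusion rules to one experimental shift."""
    deviation = abs(rec.predicted - rec.experimental)
    if deviation > 5.0 * rec.sigma:
        return False  # general outlier rule
    if rec.atom == "HA" and abs(rec.ring_current) > 1.5 and deviation > 3.0 * rec.sigma:
        return False  # stricter rule for HA with large ring current shifts
    return True


records = [
    ShiftRecord("CA", experimental=58.0, predicted=58.5, sigma=0.9),
    ShiftRecord("CA", experimental=50.0, predicted=58.5, sigma=0.9),
    ShiftRecord("HA", experimental=3.0, predicted=4.0, sigma=0.25, ring_current=-1.8),
]
kept = [r for r in records if keep_shift(r)]
print(len(kept))
```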

Given this, the procedure for adding new proteins to the SPARTA database is as follows:

  1. Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

  2. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

  3. Prepare a table file, for example with a name of "list.tab", which only contains the names of proteins to be added into the database. This table must follow the example below:

    VARS   PDB_NAME
    FORMAT %24s

    bpti
    ubiquitin
    profilin
    ...

    Note that the "PDB_NAME" entries in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.

  4. In the "SPARTA" directory, execute the following command to compile a new database:

    sparta -compile -pdbDir ./pdb -pdbList list.tab
  5. A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database first, as it will be overwritten.
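Steps 1 to 3 lend themselves to a small helper that writes the list table and verifies that every listed protein has both input files. The sketch below is an illustration only: the function names are hypothetical, and it assumes the table layout has the VARS and FORMAT lines on their own lines followed by one protein name per line.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence


def build_list_table(names: Sequence[str]) -> str:
    """Return the text of a list table naming the proteins to compile."""
    lines = ["VARS   PDB_NAME", "FORMAT %24s", ""]
    lines.extend(names)
    return "\n".join(lines) + "\n"


def missing_inputs(sparta_dir: str, names: Sequence[str]) -> List[str]:
    """List the expected pdb/<name>.pdb and shifts/<name>.tab files that are absent."""
    root = Path(sparta_dir)
    missing = []
    for name in names:
        for subdir, ext in (("pdb", ".pdb"), ("shifts", ".tab")):
            path = root / subdir / (name + ext)
            if not path.is_file():
                missing.append(str(path))
    return missing


# Demonstration in a throwaway directory standing in for $SPARTA_DIR.
with tempfile.TemporaryDirectory() as tmp:
    for subdir, ext in (("pdb", ".pdb"), ("shifts", ".tab")):
        (Path(tmp) / subdir).mkdir()
        (Path(tmp) / subdir / ("bpti" + ext)).write_text("")
    table_text = build_list_table(["bpti", "ubiquitin"])
    missing = missing_inputs(tmp, ["bpti", "ubiquitin"])

print(table_text)
print(missing)
```

Running such a check before "sparta -compile" avoids overwriting the database with an incomplete set of proteins.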


Compile the Source Code

SPARTA is implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the "src" directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable is as simple as:

cd $SPARTA_DIR/src
make

Compilation of the SPARTA program has been tested on Windows (XP) and Linux (Linux 9 or newer). Compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are included in the distributed SPARTA package.


About the Name SPARTA

In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece.


_last updated: Apr 2007 / Webmaster_

Revision 5 - 04 Feb 2008 - Main.DavidCowburn

 

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
Yang Shen and Ad Bax
LIBRARY:ShenBax08.pdf

Changed:
<
<
Local install –
>
>
---++Local install – dl380://infotech/spartainstallPC
Deleted:
<
<
dl380://infotech/spartainstallPC
 
cygwin session ...
Script started on Mon Feb 4 12:21:52 2008

>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
./src/sparta in test/ubiquitin.pdb

Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab

Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read
Predicting ...
N HA C CA CB H
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin

Running time: 20.343 seconds
>>/cygdrive/d/spartainstall/SPARTA#
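The per-residue lines printed in the session above are easy to post-process. The sketch below parses one such line into a dictionary; the function name is hypothetical, the column order is taken from the "N HA C CA CB H" header in the transcript, and treating 9999.000 as "no value" is an inference from the rows for Gly (CB) and Pro (H), not something stated in the output itself.

```python
from typing import Dict, Optional

ATOMS = ("N", "HA", "C", "CA", "CB", "H")
MISSING = 9999.0  # assumed placeholder for a shift that is not predicted


def parse_prediction_line(line: str) -> Optional[Dict[str, object]]:
    """Parse '<6 shifts> <residue number> <residue type> <name>'; None if no match."""
    fields = line.split()
    if len(fields) != 9:
        return None
    try:
        values = [float(x) for x in fields[:6]]
        residue_number = int(fields[6])
    except ValueError:
        return None
    shifts = {atom: (None if value == MISSING else value)
              for atom, value in zip(ATOMS, values)}
    return {"residue": residue_number, "type": fields[7],
            "name": fields[8], "shifts": shifts}


gly10 = parse_prediction_line(
    "110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin")
header = parse_prediction_line("N HA C CA CB H")
print(gly10)
print(header)
```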

Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...

Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax


DOWNLOAD

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The downloaded Unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

The win32 archive can be unpacked with any standard Windows zip utility.

Users are encouraged to email the author to be informed about updates and related software.


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]


What is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from a protein structure in order to make quantitative predictions for the backbone chemical shifts.

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if one can find a triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of that database triplet will be useful predictors for the backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and in the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.
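The scoring idea can be sketched as below. This is a simplified illustration, not the SPARTA implementation: the function names, the weights and the residue penalty are hypothetical stand-ins for the values held in weight.tab and homology.tab, and the example uses three angles rather than the nine of a full triplet.

```python
from typing import Callable, Sequence


def angle_difference(a: float, b: float) -> float:
    """Signed difference a - b in degrees, wrapped into (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def similarity_score(target_angles: Sequence[float],
                     candidate_angles: Sequence[float],
                     angle_weights: Sequence[float],
                     target_residues: str,
                     candidate_residues: str,
                     residue_penalty: Callable[[str, str], float]) -> float:
    """Weighted sum of squared torsion angle differences plus a residue type term.
    Lower scores mean higher similarity."""
    score = sum(w * angle_difference(t, c) ** 2
                for w, t, c in zip(angle_weights, target_angles, candidate_angles))
    score += sum(residue_penalty(t, c)
                 for t, c in zip(target_residues, candidate_residues))
    return score


def simple_penalty(a: str, b: str) -> float:
    """Hypothetical residue term: 0 for identical types, 1 otherwise."""
    return 0.0 if a == b else 1.0


score = similarity_score([-60.0, -45.0, 170.0], [-65.0, -40.0, -170.0],
                         [1.0, 1.0, 0.5], "AKL", "AKV", simple_penalty)
print(score)
```

Wrapping the angle difference matters: 170 and -170 degrees are 20 degrees apart, not 340.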


Revision 4 - 04 Feb 2008 - Main.DavidCowburn

 

Changed:
<
<
*SPARTA:
>
>

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

Deleted:
<
<
Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes*
 As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
Yang Shen and Ad Bax
LIBRARY:ShenBax08.pdf

Local install – dl380://infotech/spartainstallPC

cygwin session ...
Script started on Mon Feb 4 12:21:52 2008

>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
./src/sparta in test/ubiquitin.pdb

Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab

Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read
Predicting ...
N HA C CA CB H
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin

Running time: 20.343 seconds
>>/cygdrive/d/spartainstall/SPARTA#
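The rows printed in the session above can be read back with a few lines of Python. This is a minimal sketch, not part of the SPARTA package: the function name and the returned dictionary layout are hypothetical, the column order is taken from the "N HA C CA CB H" header line, and treating 9999.000 as a "no value" placeholder is an assumption based on where it appears (CB of Gly, H of Pro).

```python
# Hypothetical helper for reading the screen output of a SPARTA run.
MISSING = 9999.0  # assumed placeholder for "no prediction" (e.g. Gly CB, Pro H)
COLUMNS = ("N", "HA", "C", "CA", "CB", "H")  # order of the header line


def parse_prediction_line(line):
    """Return a dict for one prediction row, or None for any other line."""
    fields = line.split()
    if len(fields) != 9:
        return None
    try:
        values = [float(x) for x in fields[:6]]
        resnum = int(fields[6])
    except ValueError:
        return None
    shifts = {
        name: (None if value == MISSING else value)
        for name, value in zip(COLUMNS, values)
    }
    return {"resnum": resnum, "residue": fields[7], "name": fields[8], "shifts": shifts}


if __name__ == "__main__":
    row = parse_prediction_line(
        "110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin"
    )
    print(row["resnum"], row["residue"], row["shifts"])
```

Log lines such as "Reading PDB Coordinates ..." or the header itself do not have nine fields with six leading numbers, so they are skipped by returning None.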

Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...

Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov
Web: http://spin.niddk.nih.gov/bax


DOWNLOAD

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The downloaded Unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

The win32 archive can be unpacked with any standard Windows zip utility.

Users are encouraged to email the author to be informed about updates and related software.


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]


What is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from a protein structure to make quantitative predictions for the backbone chemical shifts.

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if one can find a triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of that database triplet will be useful predictors for the backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.
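The scoring idea described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not SPARTA's actual implementation: the function names, the way the residue type term is combined with the angle term, and all numerical weights are hypothetical (the real values live in weight.tab and homology.tab).

```python
# Illustrative triplet similarity score; all names and weights are hypothetical.

def angle_diff(a, b):
    """Smallest signed difference between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def residue_penalty(a, b, table):
    """Assumed dissimilarity term: 0 for identical residue types, else a table lookup."""
    if a == b:
        return 0.0
    return table.get((a, b), table.get((b, a), 1.0))


def triplet_score(target, candidate, angle_weights, homology_weight, table):
    """Score two triplets; each residue is (res_type, phi, psi, chi1) in degrees.

    angle_weights holds one (w_phi, w_psi, w_chi1) tuple per triplet position.
    Lower scores mean higher similarity.
    """
    score = 0.0
    for t_res, c_res, weights in zip(target, candidate, angle_weights):
        for t_angle, c_angle, w in zip(t_res[1:], c_res[1:], weights):
            score += w * angle_diff(t_angle, c_angle) ** 2
        score += homology_weight * residue_penalty(t_res[0], c_res[0], table)
    return score


if __name__ == "__main__":
    weights = [(0.5, 0.5, 0.1), (1.0, 1.0, 0.2), (0.5, 0.5, 0.1)]  # made-up values
    target = [("K", -60.0, -45.0, 180.0), ("A", -65.0, -40.0, 60.0), ("E", -62.0, -41.0, -60.0)]
    other = [("R", -58.0, -47.0, 175.0), ("A", -70.0, -38.0, 60.0), ("D", -60.0, -44.0, -65.0)]
    print(triplet_score(target, other, weights, 10.0, {("K", "R"): 0.2}))
```

Angle differences are wrapped into [-180, 180) so that, for example, 170 and -170 degrees count as 20 degrees apart rather than 340.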

In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts of the central residues of these 20 matches (obtained by subtracting the corresponding random coil chemical shift values and the adjustment values arising from the effects of neighboring residues) is used as the prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw predicted shifts from database searching. This database currently includes data from 200 proteins, representing 24,166 triplets.
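As a rough illustration of the averaging step, the sketch below keeps the 20 lowest-scoring matches, forms a weighted mean of their secondary shifts, and adds back the random coil value. The weighting function 1 / (1 + score) is an assumption chosen for the example; the text above does not state how SPARTA weights the matches, and the function names are hypothetical.

```python
import math

# Illustrative only: the weighting scheme and function names are assumptions.

def predict_secondary_shift(matches, n_best=20):
    """matches: iterable of (score, secondary_shift) pairs for one target triplet.

    Returns (weighted mean, weighted standard deviation) over the n_best
    lowest-scoring matches.
    """
    best = sorted(matches, key=lambda m: m[0])[:n_best]
    if not best:
        raise ValueError("no database matches supplied")
    weights = [1.0 / (1.0 + score) for score, _ in best]
    total = sum(weights)
    mean = sum(w * shift for w, (_, shift) in zip(weights, best)) / total
    variance = sum(w * (shift - mean) ** 2 for w, (_, shift) in zip(weights, best)) / total
    return mean, math.sqrt(variance)


def predicted_shift(secondary, random_coil, neighbor_adjustment=0.0):
    """Convert a secondary shift back to an absolute chemical shift (ppm)."""
    return random_coil + neighbor_adjustment + secondary


if __name__ == "__main__":
    demo = [(0.5 * i, 1.0 + 0.05 * i) for i in range(30)]  # made-up matches
    mean, spread = predict_secondary_shift(demo)
    print(mean, spread, predicted_shift(mean, random_coil=56.0))
```

The spread of the retained matches is returned alongside the mean because the standard deviation of the 20 matches is what the text relates to prediction reliability.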


Reliability of SPARTA

The reliability of the SPARTA approach was tested by a cross-validation procedure where each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same prediction accuracy is also obtained for proteins of known structure that are not contained in the database.

Importantly, the test also showed that the standard deviation of the shifts from the central residues of the 20 matches is correlated with the prediction error. Checking the standard deviations in the prediction summary file (pred/pred.tab) will therefore give an idea of the prediction reliability.

It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also considered in SPARTA. The secondary shifts in the SPARTA database are corrected for ring current shifts. To compensate, the ring current shifts calculated for the target protein are added to its SPARTA predicted shifts. For HA and HN, the predicted secondary shifts are further corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the cross-validation above. Accurate coordinates for the target protein are therefore critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary file (pred/protein_in.tab).

It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for residues in flexible regions, or for residues with very large ring current shift contributions, may be less reliable, as was also indicated by the test.


Components of the SPARTA Package

The SPARTA system is implemented in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options is printed when the executable or starting script is run with a "-help" command-line argument, or without any command-line arguments.

Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; this is done automatically by the starting script ("$SPARTA_DIR/sparta" in Linux):

setenv SPARTA_DIR /disk1/SPARTA

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that if "$SPARTA_DIR" is not specified, it defaults to the current directory.

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.

$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab The tables of random coil shifts and of adjustment values from neighboring residues used in the shift prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
$SPARTA_DIR/tab/homology.tab The residue type homology factors used in the prediction process, similar to the table used by TALOS.
$SPARTA_DIR/tab/weight.tab The weighting factors of PHI, PSI and CHI1 angles, and of residue type homology, used in the prediction process.
$SPARTA_DIR/tab/fitting.tab The fitting parameters relating prediction accuracy to precision, which are used after the prediction process to calculate the estimated prediction error.

$SPARTA_DIR/shifts/*.tab The chemical shift tables for the proteins in the database, in the same format as the TALOS shift tables. These files are only used when compiling a new database, and only shift tables ending with the ".tab" extension are read. Each table must be exactly consistent with the corresponding structure in the SPARTA pdb directory.
$SPARTA_DIR/pdb/*.pdb The PDB coordinate files, which are only used together with the files in the SPARTA shifts directory when compiling a new database (e.g. adding new proteins to the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory.
$SPARTA_DIR/test/* The input files and results for a sample SPARTA analysis.


How to Use SPARTA

Use of SPARTA to predict backbone chemical shifts involves the following steps:

  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

  2. Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format described under "Preparing the Input PDB Coordinates".

  3. Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:

    sparta -in protein.pdb

    SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, SPARTA also creates a file "pred.tab" in the "pred" directory, which contains a summary of the prediction results. The database search typically takes about 25 sec for a 100-residue protein on a Linux PC with a 2.8GHz CPU.

  4. If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), the SPARTA prediction can be performed with a command such as:

    sparta -in protein.pdb -ref ref.tab

    SPARTA will compare the predicted and experimental chemical shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
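The referencing test described in step 4 amounts to comparing the mean prediction error with its standard error. The sketch below shows that arithmetic for one nucleus type. It is an illustration, not SPARTA's code: the function names are hypothetical, the sign convention (experimental minus predicted) is an assumption, and so is the use of the sample standard deviation.

```python
import math
import statistics

# Illustrative referencing check; names and conventions are assumptions.

def referencing_check(predicted, experimental, threshold=3.0):
    """Return (mean_error, expected_error, needs_correction) for one nucleus type.

    expected_error = standard deviation of the errors / sqrt(number of shifts).
    """
    errors = [e - p for p, e in zip(predicted, experimental)]
    if len(errors) < 2:
        raise ValueError("need at least two shifts to estimate the spread")
    mean_error = statistics.mean(errors)
    expected_error = statistics.stdev(errors) / math.sqrt(len(errors))
    needs_correction = abs(mean_error) > threshold * expected_error
    return mean_error, expected_error, needs_correction


def apply_reference_correction(experimental, mean_error):
    """Shift the experimental values so that the mean error becomes zero."""
    return [e - mean_error for e in experimental]


if __name__ == "__main__":
    pred = [120.0, 118.5, 123.1, 109.9]   # made-up 15N predictions (ppm)
    expt = [122.1, 120.4, 125.1, 111.9]   # same residues, offset by about 2 ppm
    offset, expected, flag = referencing_check(pred, expt)
    print(offset, expected, flag)
    if flag:
        print(apply_reference_correction(expt, offset))
```

A systematic offset shared by all shifts of one nucleus produces a large mean error with a small spread, which is what trips the 3 times expected error criterion.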


Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully, so that they have the proper format and naming conventions. SPARTA accepts a standard PDB coordinate file, but uses ONLY the FIRST conformer/chain if more than one exists. For PDB coordinates without hydrogen atoms, the hydrogen atoms must be added (using DYNAMO, REDUCE, MOLMOL, or a similar program) in order to obtain the hydrogen bonding information and ring current shifts. For the HA atoms of Gly, please use the atom names "HA1/HA2".

Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.


Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. It is suggested that:

  1. The chemical shift assignments for each candidate protein be validated by conducting a SPARTA shift prediction using its PDB coordinates.

    sparta -in protein.pdb -ref ref.tab
  2. The prediction summary table (pred/pred.tab) be checked, and the experimental shifts that deviate from the predicted shifts by more than five standard deviations be removed. Notably, HA shifts for which the ring current shift is > 1.5 ppm and which deviate from the predicted shift by more than three standard deviations are better removed.

  3. The chemical shifts be referenced correctly. A quick check can be conducted by running the above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab".
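The screening rule in step 2 can be written as a small predicate. This is a sketch under assumptions rather than a tool shipped with SPARTA: the function name is hypothetical, "sigma" is taken to be the expected prediction error for the nucleus in question (for example the cross-validation RMS deviation), and the ring current test is applied to the absolute value of the ring current shift.

```python
# Illustrative outlier screen for candidate database shifts; names are hypothetical.

def keep_shift(nucleus, predicted, experimental, sigma, ring_current=0.0):
    """Return True if an experimental shift passes the screening rule.

    sigma        assumed expected prediction error for this nucleus (ppm)
    ring_current calculated ring current shift for this atom (ppm)
    """
    deviation = abs(experimental - predicted)
    if deviation > 5.0 * sigma:
        return False
    # Stricter rule for HA atoms with large ring current contributions.
    if nucleus == "HA" and abs(ring_current) > 1.5 and deviation > 3.0 * sigma:
        return False
    return True


if __name__ == "__main__":
    # (nucleus, predicted, experimental, sigma, ring current) - made-up values
    candidates = [
        ("HA", 4.50, 4.60, 0.25, 0.1),
        ("HA", 4.50, 5.40, 0.25, 1.8),
        ("CA", 55.0, 60.5, 0.88, 0.0),
    ]
    for entry in candidates:
        print(entry, keep_shift(*entry))
```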

Given this, the procedure for adding new proteins to the SPARTA database is as simple as:

  1. Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

  2. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

  3. Prepare a table file, for example named "list.tab", which contains only the names of the proteins to be added to the database. This table must follow the example below:

    VARS   PDB_NAME
    FORMAT %24s

    bpti
    ubiquitin
    profilin
    ...

    Note that each "PDB_NAME" in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.

  4. In the "SPARTA" directory, execute the following command to compile a new database:

    sparta -compile -pdbDir ./pdb -pdbList list.tab
  5. A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database beforehand, as it will be overwritten.
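The list file from step 3 and the file-name consistency requirement from steps 1 and 2 can be prepared with a short script. The sketch below is hypothetical helper code, not part of SPARTA: it assumes the table consists of the VARS and FORMAT header lines followed by one protein name per line, and that the pdb and shifts directories sit directly under the SPARTA directory.

```python
from pathlib import Path

# Hypothetical helpers for preparing "list.tab"; the table layout is an assumption.

def format_pdb_list(names):
    """Return the text of a list table: header lines, then one name per line."""
    lines = ["VARS   PDB_NAME", "FORMAT %24s", ""]
    lines.extend(names)
    return "\n".join(lines) + "\n"


def missing_inputs(names, sparta_dir):
    """List the pdb/<name>.pdb and shifts/<name>.tab files that do not exist."""
    root = Path(sparta_dir)
    missing = []
    for name in names:
        for relative in (f"pdb/{name}.pdb", f"shifts/{name}.tab"):
            if not (root / relative).is_file():
                missing.append(relative)
    return missing


if __name__ == "__main__":
    proteins = ["bpti", "ubiquitin", "profilin"]
    problems = missing_inputs(proteins, ".")
    if problems:
        print("missing files:", ", ".join(problems))
    else:
        Path("list.tab").write_text(format_pdb_list(proteins))
```

Checking for both files before writing the list avoids compiling a database from a protein that has a structure but no shift table, or the reverse.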


Compile the Source Code

SPARTA is implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the /src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable is as simple as:

cd $SPARTA_DIR/src
make

Compilation of SPARTA has been tested on Windows (XP) and Linux (Linux 9 or newer). Compiled executables ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are included in the distributed SPARTA package.


About the Name SPARTA

In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece.



_last updated: Apr 2007 / Webmaster_

Revision 3 - 04 Feb 2008 - Main.DavidCowburn

Changed:
<
<
SPARTA Protein Backbone Chemical Shifts Prediction Program
>
>

Deleted:
<
<
 
Changed:
<
<

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

 
Changed:
<
<

As described in the paper:
 

Protein backbone chemical
>
>
*SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes*
Added:
>
>
As described in the paper:

Protein backbone chemical
 shifts predicted from searching a database for torsion angle and
Changed:
<
<
sequence homology
>
>
sequence homology
 
Changed:
<
<
Yang Shen and Ad Bax
>
>
Yang Shen and Ad Bax
 
Changed:
<
<
LIBRARY:ShenBax08.pdf


>
>
LIBRARY:ShenBax08.pdf
Added:
>
>
 
Changed:
<
<
Local install – dl380://infotech/spartainstallPC
>
>
Local install – dl380://infotech/spartainstallPC
 
Changed:
<
<
cygwin session ...
Script started on Mon Feb 4 12:21:52 2008


>
>
cygwin session ...
Script started on Mon Feb 4 12:21:52 2008
Added:
>
>
 
Changed:
<
<
>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
./src/sparta in test/ubiquitin.pdb
>
>
>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
./src/sparta in test/ubiquitin.pdb
 
Changed:
<
<


>
>
Added:
>
>
 
Changed:
<
<
Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab
>
>
Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab
 
Changed:
<
<
Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab
>
>
Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab
 
Changed:
<
<
Analyzing test/ubiquitin.pdb 76 residues read
>
>
Analyzing test/ubiquitin.pdb 76 residues read
 
Changed:
<
<
Predicting ...
N HA C CA CB H
>
>
Predicting ...
N HA C CA CB H
 
Changed:
<
<
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin
>
>
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin
 
Changed:
<
<
122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin
>
>
122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin
 
Changed:
<
<
122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin
>
>
122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin
 
Changed:
<
<
118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin
>
>
118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin
 
Changed:
<
<
122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin
>
>
122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin
 
Changed:
<
<
121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin
>
>
121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin
 
Changed:
<
<
109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin
>
>
109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin
 
Changed:
<
<
118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin
>
>
118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin
 
Changed:
<
<
103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin
>
>
103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin
 
Changed:
<
<
107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin
>
>
107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin
 
Changed:
<
<
116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin
>
>
116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin
 
Changed:
<
<
115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin
>
>
115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin
 
Changed:
<
<
124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin


>
>
124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin
Added:
>
>
 
Changed:
<
<
Running time: 20.343 seconds
>>/cygdrive/d/spartainstall/SPARTA#


>
>
Running time: 20.343 seconds
>>/cygdrive/d/spartainstall/SPARTA#
Added:
>
>
 
Changed:
<
<
Using a standard MS shell with the infotech drive mounted ..


>
>
Using a standard MS shell with the infotech drive mounted ..
Added:
>
>
 
Changed:
<
<
src\sparta -in test\ubiquitin.pdb ...


>
>
src\sparta -in test\ubiquitin.pdb ...
 
Changed:
<
<
Original text --
 
>
>
Original text --
Deleted:
<
<

Contact:       shenyang@niddk.nih.gov; bax@nih.gov
Web:       http://spin.niddk.nih.gov/bax


DOWNLOAD

 
Changed:
<
<

RedHat Linux /Fedora Core version
Win32 version

The download unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

>
>
Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax

DOWNLOAD

Deleted:
<
<
The win32 archive can be unpacked with a traditional Windows zip software.

Users are encouraged to email the author to be informed about updates and related software.


What is SPARTA?
Reliability of SPARTA
Components of the SPARTA Package
How to Use SPARTA
Preparing the PDB Coordinates

 
Changed:
<
<

Adding New Proteins to the Database
Compile the Source Code
About the Name SPARTA


What is SPARTA?

SPARTA is a database system for

>
>
[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The download unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

Added:
>
>

The win32 archive can be unpacked with a traditional Windows zip software.

Users are encouraged to email the author to be informed about updates and related software.


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]


What is SPARTA?

SPARTA is a database system for

 empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from the protein structure in order to make
Changed:
<
<
quantitative predictions for the backbone chemical shifts

SPARTA uses the phi, psi and chi1

>
>
quantitative predictions for the backbone chemical shifts

SPARTA uses the phi, psi and chi1

 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make
Changed:
<
<
predictions for the central residue in a triplet.

The idea behind SPARTA is that if

>
>
predictions for the central residue in a triplet.

The idea behind SPARTA is that if

 one can find some triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts for this protein will be useful predictors for the backbone secondary chemical shifts in the
Changed:
<
<
target.
>
>
target.
 
Changed:
<
<

>
>
The similarity is measured with a
Deleted:
<
<

The similarity is measured with a

 score based on the weighted sum of squared differences between the torsion angles in the target protein and the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type
Changed:
<
<
term which biases the matching towards roughly similar sequences.

In practice, SPARTA searches a

>
>
term which biases the matching towards roughly similar sequences.

In practice, SPARTA searches a

 database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts (obtained by subtracting the corresponding random coil chemical shift values and the adjustment values arising from the effects of neighboring residues) of the central residues of these 20 matches is used as the prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high-resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw predicted shifts from the database search. This database currently includes data from 200
Changed:
<
<
proteins, representing 24,166 triplets.


Reliability of SPARTA

The reliability of the SPARTA

>
>
proteins, representing 24,166 triplets.
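
The search-and-average step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SPARTA source: the function names, the inverse-score weighting of the matches, and the `eps` guard are hypothetical, and the residue-type term of the score is left out for brevity.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def similarity_score(target, entry, weights):
    """Weighted sum of squared torsion-angle differences; lower is more similar."""
    return sum(w * angle_diff(t, e) ** 2 for t, e, w in zip(target, entry, weights))


def predict_secondary_shift(target, database, weights, n_best=20, eps=1e-6):
    """Average the secondary shifts of the n_best most similar database triplets.

    database: iterable of (angles, secondary_shift) pairs, where angles holds
    the 9 torsion angles of a triplet in the same order as `target`.
    """
    scored = sorted(
        ((similarity_score(target, angles, weights), shift) for angles, shift in database),
        key=lambda pair: pair[0],
    )[:n_best]
    if not scored:
        raise ValueError("empty database")
    # Assumption: each match is weighted by the inverse of its score.
    # The weighting actually used by SPARTA is not stated in this text.
    match_weights = [1.0 / (score + eps) for score, _ in scored]
    total = sum(match_weights)
    return sum(w * shift for w, (_, shift) in zip(match_weights, scored)) / total
```

The value returned is a secondary shift only; as the text explains, the random coil value and the neighbor, ring current and hydrogen bond corrections still have to be applied to obtain a final predicted shift.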

Reliability of SPARTA

The reliability of the SPARTA

 approach was tested by a cross-validation procedure where each protein was temporarily removed from the database, and its backbone
Changed:
<
<
chemical shifts (N, HN, HA, CA, CB and C’) were predicted using
>
>
chemical shifts (N, HN, HA, CA, CB and C’) were predicted using
 the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same shift prediction accuracies are also obtained for proteins with known structures which are not
Changed:
<
<
contained in the database.

Importantly, it is also found in the

>
>
contained in the database.

Importantly, it is also found in the

 test that the standard deviation of the shifts from the central residues of the 20 matches is correlated with the shift prediction errors. Checking the standard deviations in the prediction summary file
Changed:
<
<
(pred/pred.tab) will provide an idea of the prediction reliability.
>
>
(pred/pred.tab) will provide an idea of the prediction reliability.
Deleted:
<
<

 
Changed:
<
<

It should be noted that the global

>
>
It should be noted that the global
 structural information, such as ring current shifts and hydrogen bonding, was also carefully considered in SPARTA. The secondary shifts in the SPARTA database are actually shifts that have been corrected using the
Changed:
<
<
ring current shifts. As “compensation”, the SPARTA
>
>
ring current shifts. As “compensation”, the SPARTA
 predicted shifts for the target protein are also corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the predicted secondary shifts are also corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the above cross-validation. Therefore, the accuracy of the coordinates of the target protein is critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary
Changed:
<
<
file (/pred/protein_in.tab).

It should also be noted that the

>
>
file (/pred/protein_in.tab).

It should also be noted that the

 protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for residues in flexible regions or those with very large ring current shift contributions may be less reliable, which was also indicated by the
Changed:
<
<
test.


Components of the SPARTA Package

The SPARTA system is implemented

>
>
test.

Components of the SPARTA Package

The SPARTA system is implemented

 using C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or starting script
Changed:
<
<
("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options can be generated with a "-help"
>
>
("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options can be generated with a "-help"
 command-line argument or by simply running the executable files or
Changed:
<
<
starting script without any command-line arguments.

Running SPARTA requires definition of the environment variable "SPARTA_DIR";

>
>
starting script without any command-line arguments.

Running SPARTA requires definition of the environment variable "SPARTA_DIR";

 this will be established automatically by the starting script
Changed:
<
<
("$SPARTA_DIR/sparta" in Linux):

setenv SPARTA_DIR /disk1/SPARTA

>
>
("$SPARTA_DIR/sparta" in Linux):
setenv SPARTA_DIR /disk1/SPARTA

Added:
>
>

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]

 
Changed:
<
<
$SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that the default "$SPARTA_DIR" is the current directory if not specified.

Other files of the SPARTA package include:

>
>
Note that the default "$SPARTA_DIR" is the current directory if not specified.

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab The
Deleted:
<
<
$SPARTA_DIR/tab/sparta.tab
The
 compiled database of residue triplets with their corresponding
Changed:
<
<
PHI/PSI/CHI1 angles and secondary shifts.
>
>
PHI/PSI/CHI1 angles and secondary shifts.
 
Changed:
<
<
$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab
The
>
>
*$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab*
Added:
>
>
The
 tables of random coil shifts and adjustment values from neighboring residues used in the shift prediction process. (The same tables as
Changed:
<
<
used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
>
>
used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
Deleted:
<
<
 
Changed:
<
<
$SPARTA_DIR/tab/homology.tab
The
>
>
$SPARTA_DIR/tab/homology.tab
Added:
>
>
The
 residue type homology factors used in the prediction process, which
Changed:
<
<
is similar to the table used by TALOS.
>
>
is similar to the table used by TALOS.
 
Changed:
<
<
$SPARTA_DIR/tab/weight.tab
The
>
>
$SPARTA_DIR/tab/weight.tab
Added:
>
>
The
 weighting factors of PHI, PSI and CHI1 angles, and residue type
Changed:
<
<
homology used in the prediction process.
>
>
homology used in the prediction process.
 
Changed:
<
<
$SPARTA_DIR/tab/fitting.tab
The
>
>
$SPARTA_DIR/tab/fitting.tab
Added:
>
>
The
 fitting parameters between prediction accuracy and precision, which will be used after the prediction process to calculate the estimated
Changed:
<
<
prediction error.
>
>
prediction error.
 
Changed:
<
<
$SPARTA_DIR/shifts/*.tab
The
>
>
$SPARTA_DIR/shifts/*.tab
Added:
>
>
The
 files in this directory are only used when compiling a new database. When compiling a new database, only shift tables ending with the
Changed:
<
<
".tab" extension will be used. The files in this directory
>
>
".tab" extension will be used. The files in this directory
 are the chemical shift tables for the proteins in the database, which are in the same format as the TALOS shifts tables and must be exactly consistent with the corresponding structures in the SPARTA pdb
Changed:
<
<
directory.
>
>
directory.
 
Changed:
<
<
$SPARTA_DIR/pdb/*.pdb
The
>
>
$SPARTA_DIR/pdb/*.pdb
Added:
>
>
The
 PDB coordinates files in this directory are only used along with the files in the SPARTA shifts directory when compiling a new database (e.g. adding new proteins into the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding
Changed:
<
<
chemical shift tables in the SPARTA shifts directory.
>
>
chemical shift tables in the SPARTA shifts directory.
 
Changed:
<
<
$SPARTA_DIR/test/*
The contents of this "test" directory include the input files and results for a sample SPARTA analysis.
>
>
$SPARTA_DIR/test/* The contents of this "test" directory include the input files
Added:
>
>
and results for a sample SPARTA analysis.
 
Changed:
<
<

How to Use SPARTA

>
>

How to Use SPARTA
Deleted:
<
<

Use of SPARTA to predict backbone chemical shifts involves the following steps:

  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

  2. Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format given above.

 
Changed:
<
<

  • Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:

  • >
    >
    Use of SPARTA to predict backbone chemical shifts involves the following steps:
    1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

    2. Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format given above.

    3. Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:

      sparta -in protein.pdb

      SPARTA will first generate an input "pred/protein_in.tab" file from the PDB coordinates, which contains the phi, psi, chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each one of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created in the "pred" directory, which includes a summary of the prediction results. The database search will typically take about 25 sec for a 100-residue protein on a Linux PC with a 2.8GHz CPU.

    4. If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), SAELDI prediction can be performed by a command such as:

      sparta -in protein.pdb -ref ref.tab

      SPARTA will compare the predicted and experimental chemical shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction will be applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
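
The referencing check described in step 4 can be illustrated with a short sketch. It assumes the errors are predicted-minus-experimental differences for a single nucleus, uses the sample standard deviation, and compares the absolute mean error against the threshold; these choices and the function name `reference_check` are assumptions for illustration, not taken from the SPARTA source.

```python
import math
import statistics


def reference_check(errors, factor=3.0):
    """Return (needs_correction, mean_error, expected_error) for one nucleus.

    errors: prediction errors in ppm (assumed: predicted minus experimental).
    expected_error = standard deviation of errors / sqrt(number of shifts),
    as quoted in the text; `factor` defaults to the quoted value of 3.
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two shifts")
    mean_error = statistics.fmean(errors)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    expected_error = statistics.stdev(errors) / math.sqrt(n)
    return abs(mean_error) > factor * expected_error, mean_error, expected_error
```

A systematic offset such as errors of 0.9, 1.1, 1.0, 1.2 and 0.8 ppm would be flagged, whereas errors scattered symmetrically around zero would not.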


    Added:
    >
    >

    Preparing the Input PDB Coordinates
     
    Changed:
    <
    <
    sparta -in protein.pdb

    >
    >
    The input PDB coordinates should be
    Deleted:
    <
    <
    SPARTA will first generate an input "pred/protein_in.tab" file from the PDB coordinates, which contains the phi, psi, chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each one of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created in the "pred" directory, which includes a summary of the prediction results. The database search will typically take about 25 sec for a 100-residue protein on a Linux PC with a 2.8GHz CPU.

  • If experimental chemical shifts for target protein are available (with a name "ref.tab", for example, and the same format as typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), SAELDI prediction can be performed by a command such as:

    sparta -in protein.pdb -ref ref.tab

    SPARTA will compare the predicted and experimental chemical shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction will be applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".


    Preparing the Input PDB Coordinates

    The input PDB coordinates should be

  •  prepared carefully, so that they have the proper format and naming conventions. SPARTA accepts standard PDB coordinate files, but uses
    Changed:
    <
    <
    ONLY the FIRST conformer/chain if more than one exist. For PDB
    >
    >
    ONLY the FIRST conformer/chain if more than one exist. For PDB
     coordinates without hydrogen atoms, the hydrogen atoms must be added (using programs such as DYNAMO, REDUCE, MOLMOL, or similar) in order to obtain the hydrogen bonding information and ring current shifts. For the HA atoms of Gly, please use atom names
    Changed:
    <
    <
    of "HA1/HA2"
    >
    >
    of "HA1/HA2"
    Deleted:
    <
    <

    Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and

     
    Changed:
    <
    <
    "$SPARTA_DIR/test" directories.


    Adding New Proteins to the Database

    New protein chemical shift and

    >
    >
    Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and

    "$SPARTA_DIR/test" directories.


    Adding

    Added:
    >
    >
    New Proteins to the Database

    New protein chemical shift and

     structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct
    Changed:
    <
    <
    chemical shifts are included. It is suggested that:

    1. The chemical shift assignments

    >
    >
    chemical shifts are included. It is suggested that:
    1. The chemical shift assignments for each candidate protein should be validated by conducting a SPARTA shift prediction using its PDB coordinates.

      sparta -in protein.pdb -ref ref.tab
    2. Check the prediction summary table (pred/pred.tab) and remove the experimental shifts for which the predicted shifts deviate by five standard deviations. Notably, HA shifts for which the ring current shifts are > 1.5 ppm and the predicted shifts deviate by three standard deviations are better removed.

    3. Chemical shifts should be referenced correctly. A quick check can be conducted by running the above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab".

    Deleted:
    <
    <
    for each candidate protein should be validated by conducting a SPARTA shift prediction using its PDB coordinates.

     
    Changed:
    <
    <
    sparta -in protein.pdb -ref ref.tab
  • Check the prediction summary table (pred/pred.tab) and remove the experimental shifts for which the predicted shifts deviate by five standard deviations. Notably, HA shifts for which the ring current shifts are > 1.5 ppm and the predicted shifts deviate by three standard deviations are better removed.

  • Chemical shifts should be referenced correctly. A quick check can be conducted by running the above

  • >
    >
    Given this, the procedure for adding new proteins to the SPARTA database is as simple as:
    1. Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

    2. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

    3. Prepare a table file, for example with a name of "list.tab", which only contains the names of proteins to be added into the database. This table must follow the example below:

      VARS   PDB_NAME
      FORMAT %24s
      bpti
      ubiquitin
      profilin
      ...

      Note that the "PDB_NAME" entries in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.

    4. In the "SPARTA" directory, execute the following command to compile a new database:

      sparta -compile -pdbDir ./pdb -pdbList list.tab
    5. A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database, which will be overwritten.
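
As a hedged convenience sketch (not part of the SPARTA package), the list table from step 3 could be generated by pairing the files in the pdb and shifts directories. The helper name `write_pdb_list` is hypothetical, it lists every protein that has both files rather than only newly added ones, and it is an assumption that SPARTA accepts the exact whitespace written here.

```python
from pathlib import Path


def write_pdb_list(sparta_dir, out_name="list.tab"):
    """Write a list table naming every protein that has both a PDB and a shift file.

    Returns (names, unmatched): names written to the table, and names that
    have only one of the two required files and were therefore skipped.
    """
    root = Path(sparta_dir)
    pdb_names = {p.stem for p in (root / "pdb").glob("*.pdb")}
    shift_names = {p.stem for p in (root / "shifts").glob("*.tab")}
    names = sorted(pdb_names & shift_names)
    unmatched = sorted(pdb_names ^ shift_names)
    # Header lines follow the example table shown in step 3.
    lines = ["VARS   PDB_NAME", "FORMAT %24s", *names]
    (root / out_name).write_text("\n".join(lines) + "\n")
    return names, unmatched
```

Reviewing the returned `unmatched` names before running the compile step is a cheap way to catch file-name mismatches between the two directories.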


    Compile the Source Code
    Deleted:
    <
    <
    SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab".

     
    Changed:
    <
    <
    >
    >
    SPARTA was implemented with standard
    Deleted:
    <
    <

    Given this, the procedure for adding new proteins to the SPARTA database is as simple as:

    1. Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

    2. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

    3. Prepare a table file, for example with a name of "list.tab", which only contains the names of proteins to be added into the database. This table must follow the example below:

      VARS   PDB_NAME
      FORMAT %24s
      bpti
      ubiquitin
      profilin
      ...

      Note that the "PDB_NAME" entries in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.

    4. In the "SPARTA" directory, execute the following command to compile a new database:

      sparta -compile -pdbDir ./pdb -pdbList list.tab
    5. A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database, which will be overwritten.


    Compile the Source Code

    SPARTA was implemented with standard

     C++ using the Standard Template Library (STL). To compile the source code (in the src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA
    Changed:
    <
    <
    executable file is as simple as:
    >
    >
    executable file is as simple as:
     
    Changed:
    <
    <

    cd $SPARTA_DIR/src
    make

    The compiling of the SPARTA program has

    >
    >
    cd $SPARTA_DIR/src
    make
    
    Added:
    >
    >
    The compiling of the SPARTA program has
     been tested on Windows (XP) and Linux (Linux 9 or newer). The
    Changed:
    <
    <
    compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are contained in the distributed SPARTA package.


    About the

    >
    >
    compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are contained in the distributed SPARTA package.

    About the

    Name SPARTA
    Deleted:
    <
    <
    Name SPARTA
     
    Changed:
    <
    <

    o


    In antiquity Sparta was a Dorian

    >
    >
    o

    In antiquity Sparta was a Dorian

     Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector
    Changed:
    <
    <
    of Greece.

    >
    >
    of Greece.
    Deleted:
    <
    <


    [ Home ] [ NIH ] [ NIDDK ] [ Disclaimer ] [ Copyright ]

     
    Changed:
    <
    <

    last updated:  Apr 2007 / Webmaster

    >
    >
    _[ Home ] [ NIH ] [ NIDDK ] [ Disclaimer ] [ Copyright ]_
    Added:
    >
    >
    _last updated: Apr 2007 / Webmaster_
     

    Revision 204 Feb 2008 - Main.DavidCowburn

    Added:
    >
    >
    SPARTA Protein Backbone Chemical Shifts Prediction Program

    SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

    As described in the paper:
     

    Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
    Yang Shen and Ad Bax
    LIBRARY:ShenBax08.pdf


    Local install – dl380://infotech/spartainstallPC

    cygwin session ...
    Script started on Mon Feb 4 12:21:52 2008


    >>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
    ./src/sparta in test/ubiquitin.pdb


    Reading PDB Coordinates from test/ubiquitin.pdb
    Reading Random Coil Shifts from .\tab\randcoil.tab

    Reading RC Adjustments from .\tab\rcadj.tab
    Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
    Reading Next Residue RC Adjustments from .\tab\rcnext.tab
    Reading Weighting Factors from .\tab\weight.tab
    Reading Residue Homology Table from .\tab\homology.tab
    Reading Fitting Parameter Table from .\tab\fitting.tab
    Reading .\tab\sparta.tab, 24166 Triplets
    Can't save file pred\test/ubiquitin_in.tab

    Analyzing test/ubiquitin.pdb 76 residues read
    Predicting ...
    N HA C CA CB H
    124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
    116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin
    119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

    122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin
    128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin
    116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin
    122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin
    106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin
    110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

    122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin
    121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin
    128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin
    122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin
    125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin
    123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

    118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin
    120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin
    139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
    104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin
    124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin
    109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

    122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin
    121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin
    121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin
    122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin
    119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin
    124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

    121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin
    122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin
    124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin
    120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin
    116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin
    115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

    109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin
    121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin
    142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin
    139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin
    114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin
    117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

    118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin
    123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin
    125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin
    123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin
    126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin
    133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

    103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin
    122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin
    123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin
    126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin
    124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin
    121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

    107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin
    120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin
    109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin
    119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin
    114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin
    125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

    116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin
    117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin
    119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin
    125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin
    121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin
    115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

    115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin
    118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin
    128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin
    119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin
    125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin
    128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

    124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin
    124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin
    128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin
    124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin
    112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin


    Running time: 20.343 seconds
    >>/cygdrive/d/spartainstall/SPARTA#
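
For post-processing, the screen output above could be read with a small parser such as the sketch below. It assumes the column order given by the header line (N HA C CA CB H, then residue number, residue type and input name) and treats 9999.000 as a placeholder for "no value", which is inferred from where it appears in the listing (CB of Gly, H of Pro) rather than from any SPARTA documentation. The function name is hypothetical.

```python
MISSING = 9999.000  # assumed placeholder for shifts that do not exist
COLUMNS = ("N", "HA", "C", "CA", "CB", "H")


def parse_prediction_line(line):
    """Parse one prediction row; return None for headers and other lines."""
    fields = line.split()
    if len(fields) != 9:
        return None
    try:
        values = [float(v) for v in fields[:6]]
        resid = int(fields[6])
    except ValueError:
        return None
    shifts = {
        name: (None if value == MISSING else value)
        for name, value in zip(COLUMNS, values)
    }
    return {"resid": resid, "resname": fields[7], "source": fields[8], "shifts": shifts}
```

Lines that do not have nine whitespace-separated fields, such as the header row or the "Running time" line, are skipped by returning None.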


    Using a standard MS shell with the infotech drive mounted ..


    src\sparta -in test\ubiquitin.pdb ...


    Original text --
     

    Contact:       shenyang@niddk.nih.gov; bax@nih.gov
    Web:       http://spin.niddk.nih.gov/bax


    DOWNLOAD

    RedHat Linux /Fedora Core version
    Win32 version

    The downloaded Unix archive can be unpacked with a command like the following:

       zcat sparta.linux.tar.Z | tar xvf -

    The win32 archive can be unpacked with any standard Windows zip utility.

    Users are encouraged to email the author to be informed about updates and related software.


    What is SPARTA?
    Reliability of SPARTA
    Components of the SPARTA Package
    How to Use SPARTA
    Preparing the Input PDB Coordinates
    Adding New Proteins to the Database
    Compile the Source Code
    About the Name SPARTA


    What is SPARTA?

    SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi and chi1 torsion angles and sequence information from a protein structure to make quantitative predictions of the backbone chemical shifts.

    SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

    The idea behind SPARTA is that if one can find some triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of that database triplet will be useful predictors of the backbone secondary chemical shifts in the target.

    The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and those in the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.

    In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts (obtained by subtracting the corresponding random coil chemical shift values and the adjustment values arising from the effects of neighboring residues) of the central residues of these 20 matches is used as the prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high-resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw predicted shifts from the database search. This database currently includes data from 200 proteins, representing 24,166 triplets.
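The matching and averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the SPARTA implementation: the angle weights and the residue-type penalty are placeholders (the real values live in weight.tab and homology.tab), angle differences are wrapped into [-180, 180) degrees, and the averaging weight 1/(1 + score) is an assumption, since the text does not say how the 20 matches are weighted. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical database entry: (9 torsion angles of the triplet in degrees,
# residue-type penalty relative to the target, secondary shift of the
# central residue in ppm).
Entry = Tuple[Sequence[float], float, float]


def angle_diff(a: float, b: float) -> float:
    """Signed difference a - b wrapped into [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def triplet_score(target: Sequence[float], entry_angles: Sequence[float],
                  weights: Sequence[float], residue_penalty: float = 0.0) -> float:
    """Weighted sum of squared torsion differences plus a residue-type term.

    Lower scores mean higher similarity.
    """
    score = residue_penalty
    for t, e, w in zip(target, entry_angles, weights):
        d = angle_diff(t, e)
        score += w * d * d
    return score


def predict_secondary_shift(target: Sequence[float], database: List[Entry],
                            weights: Sequence[float], n_best: int = 20) -> float:
    """Average the secondary shifts of the n_best lowest-scoring entries."""
    scored = sorted(
        ((triplet_score(target, angles, weights, penalty), shift)
         for angles, penalty, shift in database),
        key=lambda pair: pair[0],
    )[:n_best]
    if not scored:
        raise ValueError("empty database")
    # Assumption: better matches (lower score) count more in the average.
    avg_w = [1.0 / (1.0 + s) for s, _ in scored]
    return sum(w * shift for w, (_, shift) in zip(avg_w, scored)) / sum(avg_w)
```

The predicted secondary shift would then be added back to the random coil value and the neighbor adjustments to give an absolute shift.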


    Reliability of SPARTA

    The reliability of the SPARTA approach was tested by a cross-validation procedure where each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same prediction accuracies are also obtained for proteins of known structure that are not contained in the database.
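For reference, the RMS deviation quoted for each atom type can be computed as in the sketch below. It assumes predicted and experimental shifts are supplied as parallel lists for one atom type, with None marking a missing value; the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def rms_deviation(predicted: Sequence[Optional[float]],
                  observed: Sequence[Optional[float]]) -> float:
    """Root-mean-square deviation over residues where both values exist."""
    pairs = [(p, o) for p, o in zip(predicted, observed)
             if p is not None and o is not None]
    if not pairs:
        raise ValueError("no residue has both a predicted and an observed shift")
    return math.sqrt(sum((p - o) ** 2 for p, o in pairs) / len(pairs))
```

In a leave-one-out test, this function would be applied to the pooled predictions of all proteins, each predicted with its own entries removed from the database.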

    Importantly, the test also showed that the standard deviation of the shifts from the central residues of the 20 matches is correlated with the shift prediction error. Checking the standard deviations in the prediction summary file (pred/pred.tab) therefore gives an idea of the prediction reliability.

    It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also considered in SPARTA. The secondary shifts in the SPARTA database have been corrected for ring current shifts. To compensate, the SPARTA-predicted shifts for the target protein are corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the predicted secondary shifts are also corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the cross-validation above. Therefore, accurate coordinates for the target protein are critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary file (pred/protein_in.tab).

    It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for residues in flexible regions, or for residues with very large ring current shift contributions, may be less reliable, as the test also indicated.


    Components of the SPARTA Package

    The SPARTA system is implemented in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options is printed when the "-help" command-line argument is given, or when the executable or starting script is run without any command-line arguments.

    Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; the starting script ("$SPARTA_DIR/sparta" in Linux) sets it automatically:

    setenv SPARTA_DIR /disk1/SPARTA
    
    $SPARTA_DIR/src/SPARTA $argv[1-$#argv]

    Note that the default "$SPARTA_DIR" is the current directory if not specified.
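Where SPARTA is driven from another program instead of the csh starting script, the same setup can be sketched in Python. This is only an illustration: the helper name `sparta_command` is hypothetical, the "-in" and "-ref" options are the ones shown elsewhere in these notes, and the fallback to the current directory mirrors the default described above.

```python
import os
from typing import List, Optional


def sparta_command(pdb_file: str, ref_file: Optional[str] = None,
                   sparta_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for the Linux executable.

    The install directory is taken from the argument, then from the
    SPARTA_DIR environment variable, then from the current directory.
    On Windows the executable is SPARTA.exe instead.
    """
    base = sparta_dir or os.environ.get("SPARTA_DIR") or os.getcwd()
    cmd = [os.path.join(base, "src", "SPARTA"), "-in", pdb_file]
    if ref_file:
        cmd += ["-ref", ref_file]
    return cmd


if __name__ == "__main__":
    print(sparta_command("protein.pdb", sparta_dir="/disk1/SPARTA"))
```

The returned list can be passed to `subprocess.run`, with SPARTA_DIR set in the child environment so that the program finds its tab directory.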

    Other files of the SPARTA package include:

    $SPARTA_DIR/tab/sparta.tab
    The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.

    $SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab
    The tables of random coil shifts and of adjustment values from neighboring residues used in the shift prediction process. (These are the same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
    $SPARTA_DIR/tab/homology.tab
    The residue type homology factors used in the prediction process, which is similar to the table used by TALOS.
    $SPARTA_DIR/tab/weight.tab
    The weighting factors of PHI, PSI and CHI1 angles, and residue type homology used in the prediction process.
    $SPARTA_DIR/tab/fitting.tab
    The fitting parameters between prediction accuracy and precision, which will be used after the prediction process to calculate the estimated prediction error.

    $SPARTA_DIR/shifts/*.tab
    The files in this directory are only used when compiling a new database. When compiling a new database, only shift tables ending with the ".tab" extension will be used. The files in this directory are the chemical shift tables for the proteins in the database, which are in the same format as the TALOS shifts tables and must be exactly consistent with the corresponding structures in the SPARTA pdb directory.
    $SPARTA_DIR/pdb/*.pdb
    The PDB coordinates files in this directory are only used along with the files in the SPARTA shifts directory when compiling a new database (e.g. adding new proteins into the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory.
    $SPARTA_DIR/test/*
    The contents of this "test" directory include the input files and results for a sample SPARTA analysis.


    How to Use SPARTA

    Use of SPARTA to predict backbone chemical shifts involves the following steps:

    1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

    2. Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format described under "Preparing the Input PDB Coordinates".

    3. Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:

      sparta -in protein.pdb

      SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created in the "pred" directory, which includes a summary of the prediction results. The database search will typically take about 25 sec for a 100-residue protein on a Linux PC with a 2.8GHz CPU.

    4. If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table, http://spin.niddk.nih.gov/NMRPipe/talos/), the SPARTA prediction can be performed with a command such as:

      sparta -in protein.pdb -ref ref.tab

      SPARTA compares the predicted and experimental chemical shifts before exiting, and the prediction summary file "pred/pred.tab" then also stores the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
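The referencing test described above can be written out as a short sketch. It assumes the prediction errors for one atom type are given as a list in ppm, uses the sample standard deviation (the text does not say which estimator SPARTA uses), and returns the average error as the offset to apply; the sign convention and the function name are assumptions.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def reference_offset(errors: Sequence[float], threshold: float = 3.0) -> float:
    """Return the referencing offset to apply, or 0.0 if none is warranted.

    A correction is triggered when |average error| exceeds
    threshold * (standard deviation of errors / sqrt(number of shifts)).
    """
    n = len(errors)
    if n < 2:
        return 0.0  # not enough data to estimate the spread
    avg = mean(errors)
    expected = stdev(errors) / math.sqrt(n)
    return avg if abs(avg) > threshold * expected else 0.0


if __name__ == "__main__":
    # Errors clustered around +1 ppm indicate a systematic referencing offset.
    print(reference_offset([1.0, 1.1, 0.9, 1.0]))
    # Errors scattered around zero do not trigger a correction.
    print(reference_offset([-1.0, 1.0, -1.0, 1.0]))
```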


    Preparing the Input PDB Coordinates

    The input PDB coordinates should be prepared carefully so that they have the proper format and naming conventions. SPARTA accepts standard PDB coordinate files, but uses ONLY the FIRST conformer/chain if more than one exists. For PDB coordinates without hydrogen atoms, the hydrogen atoms must be added (using DYNAMO, REDUCE, MOLMOL or a similar program) so that the hydrogen bonding information and ring current shifts can be calculated. For the HA atoms of Gly, please use the atom names "HA1/HA2".
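A quick pre-flight check of these requirements can be sketched as below. This is not part of SPARTA: it only reads ATOM records using the standard fixed PDB columns (atom name in columns 13-16, residue name in 18-20, chain ID in 22), stops at the first ENDMDL, and reports the three conditions mentioned above. The function name and message texts are hypothetical.

```python
from typing import Iterable, List


def check_pdb_lines(lines: Iterable[str]) -> List[str]:
    """Return a list of human-readable warnings for a PDB file's lines."""
    chains: List[str] = []
    has_hydrogen = False
    gly_ha = set()
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # look at the first conformer only
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        resname = line[17:20].strip()
        chain = line[21:22]
        if chain not in chains:
            chains.append(chain)
        # Hydrogen names start with H, optionally after a leading digit (e.g. 1HB).
        if name.lstrip("0123456789").startswith("H"):
            has_hydrogen = True
            if resname == "GLY" and "HA" in name:
                gly_ha.add(name)
    issues: List[str] = []
    if not has_hydrogen:
        issues.append("no hydrogen atoms found; add them before running SPARTA")
    if gly_ha - {"HA1", "HA2"}:
        issues.append("Gly HA atoms should be named HA1/HA2, found: "
                      + ", ".join(sorted(gly_ha)))
    if len(chains) > 1:
        issues.append("multiple chains present; only the first (%r) is used"
                      % chains[0])
    return issues
```

A typical use would be `check_pdb_lines(open("protein.pdb"))`, printing any returned warnings before starting the prediction.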

    Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.


    Adding New Proteins to the Database

    New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. The following checks are suggested:

    1. Validate the chemical shift assignments for each candidate protein by running a SPARTA shift prediction using its PDB coordinates:

      sparta -in protein.pdb -ref ref.tab

    2. Check the prediction summary table (pred/pred.tab) and remove the experimental shifts for which the predicted shift deviates by more than five standard deviations. Notably, HA shifts for which the ring current shift is > 1.5 ppm and the predicted shift deviates by more than three standard deviations are better removed.

    3. Chemical shifts should be referenced correctly. A quick check can be made by running the SPARTA prediction above for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA applies a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and stores the corrected shifts in a file "pred/ref.tab".
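The removal rule in step 2 can be expressed as a small filter. This is a sketch under assumptions: the prediction error, its standard deviation and the ring current shift are taken to have been read already from the summary and input tables (the column layout of those files is not reproduced here), and the function name is hypothetical.

```python
def keep_shift(atom: str, error: float, sigma: float,
               ring_current: float = 0.0) -> bool:
    """Decide whether an experimental shift should stay in the shift table.

    atom         -- atom type, e.g. "HA" or "CA"
    error        -- predicted minus experimental shift, in ppm
    sigma        -- standard deviation used to judge the error, in ppm
    ring_current -- calculated ring current shift for this atom, in ppm
    """
    if sigma <= 0.0:
        return True  # cannot judge without a spread estimate
    deviation = abs(error) / sigma
    if deviation > 5.0:
        return False
    # Stricter rule for HA atoms with large ring current contributions.
    if atom == "HA" and abs(ring_current) > 1.5 and deviation > 3.0:
        return False
    return True


if __name__ == "__main__":
    print(keep_shift("CA", error=6.0, sigma=1.0))
    print(keep_shift("HA", error=0.8, sigma=0.25, ring_current=1.6))
```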

    Given this, the procedure for adding new proteins to the SPARTA database is as simple as:

    1. Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

    2. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

    3. Prepare a table file, for example with a name of "list.tab", which only contains the names of proteins to be added into the database. This table must follow the example below:

      VARS   PDB_NAME
      FORMAT %24s
      bpti
      ubiquitin
      profilin
      ...

      Note that the "PDB_NAME" in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.

    4. In the "SPARTA" directory, execute the following command to compile a new database:

      sparta -compile -pdbDir ./pdb -pdbList list.tab

    5. A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database first, as it will be overwritten.
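Steps 1 to 3 above depend on the pdb and shifts directories staying in step with the list file. A helper like the following sketch can build "list.tab" from the names that have both a ".pdb" and a ".tab" file and report the rest. The function name is hypothetical, and the header lines are copied from the example table above.

```python
import os
from typing import List, Tuple


def write_protein_list(sparta_dir: str,
                       list_name: str = "list.tab") -> Tuple[List[str], List[str]]:
    """Write the list file and return (listed names, unmatched names)."""
    def stems(subdir: str, ext: str) -> set:
        folder = os.path.join(sparta_dir, subdir)
        return {os.path.splitext(f)[0] for f in os.listdir(folder)
                if f.endswith(ext)}

    pdb_names = stems("pdb", ".pdb")
    shift_names = stems("shifts", ".tab")
    names = sorted(pdb_names & shift_names)      # present in both directories
    unmatched = sorted(pdb_names ^ shift_names)  # present in only one of them

    with open(os.path.join(sparta_dir, list_name), "w") as fh:
        fh.write("VARS   PDB_NAME\n")
        fh.write("FORMAT %24s\n")
        for name in names:
            fh.write(name + "\n")
    return names, unmatched
```

Any name returned in the second list has a structure without a shift table or the reverse, and should be resolved before compiling the database. The helper only compares file names; sequence and residue numbering still have to be checked separately.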


    Compile the Source Code

    SPARTA was implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the /src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable is as simple as:

    cd $SPARTA_DIR/src
    make

    The compiling of the SPARTA program has been tested on Windows (XP) and Linux (Linux 9 or newer). The compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are contained in the distributed SPARTA package.


    About the Name SPARTA


    In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece.





    last updated:  Apr 2007 / Webmaster

     

    Revision 1, 04 Feb 2008 - Main.DavidCowburn

     
     
    Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.