Difference: ShiftSparta (1 vs. 10)

Revision 1016 Mar 2009 - Main.DavidCowburn

	SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes As described in the paper: Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology Yang Shen and Ad Bax LIBRARY:ShenBax08.pdf Local install – dl380://infotech/spartainstallPC cygwin session ... Script started on Mon Feb 4 12:21:52 2008
Changed:
< <	>>Administrator@cowburn-pc
> >	&
	#[33m/cygdrive/d/spartainstall/SPARTA ./src/sparta in test/ubiquitin.pdb Reading PDB Coordinates from test/ubiquitin.pdb Reading Random Coil Shifts from .\tab\randcoil.tab Reading RC Adjustments from .\tab\rcadj.tab Reading Previous Residue RC Adjustments from .\tab\rcprev.tab Reading Next Residue RC Adjustments from .\tab\rcnext.tab Reading Weighting Factors from .\tab\weight.tab Reading Residue Homology Table from .\tab\homology.tab Reading Fitting Parameter Table from .\tab\fitting.tab Reading .\tab\sparta.tab, 24166 Triplets Can't save file pred\test/ubiquitin_in.tab Analyzing test/ubiquitin.pdb 76 residues read Predicting ... N HA C CA CB H 124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin 116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin 119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin 122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin 128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin 116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin 122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin 106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin 110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin 122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin 121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin 128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin 122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin 125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin 123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin 118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin 120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin 139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin 104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin 124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin 109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin 122.323 3.657 179.040 62.260 34.350 8.688 23 
I test/ubiquitin 121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin 121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin 122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin 119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin 124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin 121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin 122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin 124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin 120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin 116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin 115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin 109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin 121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin 142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin 139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin 114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin 117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin 118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin 123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin 125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin 123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin 126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin 133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin 103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin 122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin 123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin 126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin 124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin 121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin 107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin 120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin 109.533 5.508 176.560 59.690 72.260 8.799 55 T 
test/ubiquitin 119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin 114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin 125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin 116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin 117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin 119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin 125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin 121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin 115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin 115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin 118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin 128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin 119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin 125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin 128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin 124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin 124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin 128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin 124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin 112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin Running time: 20.343 seconds >>/cygdrive/d/spartainstall/SPARTA# Using a standard MS shell with the infotech drive mounted .. src\sparta -in test\ubiquitin.pdb ... Original text -- Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax DOWNLOAD [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]] The download unix archive can be unpacked with a command like the following: zcat sparta.linux.tar.Z \| tar xvf - The win32 archive can be unpacked with a traditional Windows zip software. Users are encouraged to email the author to be informed about updates and related software. 
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ What is SPARTA? ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ Reliability of SPARTA ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ Components of the SPARTA Package ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ How to Use SPARTA ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ Preparing the PDB Coordinates ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ Adding New Proteins to the Database ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ Compile the Source Code ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ About the Name SPARTA ]]

What is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from a protein structure to make quantitative predictions for the backbone chemical shifts.

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if one can find some triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of that known triplet will be useful predictors for the backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.

In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts (obtained by subtracting the corresponding random coil chemical shift values and the adjustment values arising from the effects of neighboring residues) of the central residues of these 20 matches is used as the prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw predicted shifts from database searching. This database currently includes data from 200 proteins, representing 24,166 triplets.

Reliability of SPARTA

The reliability of the SPARTA approach was tested by a cross-validation procedure in which each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data.
The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same prediction accuracy is also obtained for proteins of known structure that are not contained in the database.

Importantly, the test also showed that the standard deviation of the shifts from the central residues of the 20 matches is correlated with the shift prediction error. Checking the standard deviations in the prediction summary file (pred/pred.tab) therefore gives an idea of the prediction reliability.

It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also taken into account in SPARTA. The secondary shifts in the SPARTA database have been corrected for ring current shifts. To compensate, the SPARTA predicted shifts for the target protein are corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the predicted secondary shifts are additionally corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the above cross-validation. The accuracy of the target protein coordinates is therefore critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary file (pred/protein_in.tab).

It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for residues in flexible regions, or for residues with very large ring current shift contributions, may be less reliable, as was also indicated by the test.

Components of the SPARTA Package

The SPARTA system is implemented in C++.
The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options is printed when the "-help" command-line argument is given, or when the executable or starting script is run without any command-line arguments.

Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; the starting script ("$SPARTA_DIR/sparta" in Linux) sets it automatically:

   setenv SPARTA_DIR /disk1/SPARTA
   $SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that "$SPARTA_DIR" defaults to the current directory if it is not specified.

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab
The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.

$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab
The tables of random coil shifts and of the adjustment values from neighboring residues used in the shift prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)

$SPARTA_DIR/tab/homology.tab
The residue type homology factors used in the prediction process, similar to the table used by TALOS.

$SPARTA_DIR/tab/weight.tab
The weighting factors of the PHI, PSI and CHI1 angles and of the residue type homology used in the prediction process.

$SPARTA_DIR/tab/fitting.tab
The fitting parameters relating prediction accuracy and precision, which are used after the prediction process to calculate the estimated prediction error.

$SPARTA_DIR/shifts/*.tab
The files in this directory are only used when compiling a new database. When compiling a new database, only shift tables ending with the ".tab" extension will be used.
The files in this directory are the chemical shift tables for the proteins in the database; they use the same format as the TALOS shift tables and must be exactly consistent with the corresponding structures in the SPARTA pdb directory.

$SPARTA_DIR/pdb/*.pdb
The PDB coordinate files in this directory are only used, along with the files in the SPARTA shifts directory, when compiling a new database (e.g. adding new proteins to the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory.

$SPARTA_DIR/test/*
The contents of this "test" directory include the input files and results for a sample SPARTA analysis.

How to Use SPARTA

Use of SPARTA to predict backbone chemical shifts involves the following steps:

Create a directory for the prediction session; all subsequent commands will be executed from this directory.

Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format described under "Preparing the Input PDB Coordinates".

Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:

   sparta -in protein.pdb

SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi, chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created in the "pred" directory, which includes a summary of the prediction results.
The database search will typically take about 25 sec for a 100-residue protein on a Linux PC with a 2.8 GHz CPU.

If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table, http://spin.niddk.nih.gov/NMRPipe/talos/), a SPARTA prediction can be performed with a command such as:

   sparta -in protein.pdb -ref ref.tab

SPARTA will compare the predicted and experimental chemical shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".

Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully, so that the file has the proper format and naming conventions.

SPARTA accepts standard PDB coordinate files, but uses ONLY the FIRST conformer/chain if more than one exists.

For PDB coordinates without hydrogen atoms, hydrogen atoms must be added (using programs such as DYNAMO, REDUCE, MOLMOL, or any similar program) in order to obtain the hydrogen bonding information and ring current shifts.

For HA atoms of Gly, please use the atom names "HA1/HA2".

Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.

Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included.
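
The referencing criterion quoted above (an average prediction error larger than 3 times the expected error, where the expected error is the standard deviation of the prediction errors divided by the square root of the number of shifts) can be sketched in Python as follows. This is only an illustration of the arithmetic: the function name, the use of the sample standard deviation, and the choice to return the average error as the offset are assumptions, and the sketch does not describe how SPARTA itself applies the correction.

```python
import math
import statistics


def referencing_offset(errors, threshold=3.0):
    """Return the average error if it looks like a referencing problem, else 0.0.

    errors: per-residue differences between predicted and experimental shifts
    for one nucleus (the sign convention is an assumption of this sketch).
    """
    n = len(errors)
    if n < 2:
        return 0.0  # too few shifts to estimate a standard deviation
    mean_error = statistics.mean(errors)
    # Expected error of the mean: standard deviation / sqrt(number of shifts).
    expected_error = statistics.stdev(errors) / math.sqrt(n)
    if abs(mean_error) > threshold * expected_error:
        return mean_error
    return 0.0


# Made-up example: errors scattered around +1.0 ppm suggest a systematic offset.
offset = referencing_offset([1.0, 1.1, 0.9, 1.0, 1.2, 0.8])
print(offset)
```

A non-zero return value flags a nucleus whose experimental shifts may need re-referencing before they are trusted.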
It is suggested that the chemical shift assignments for each candidate protein be validated by running a SPARTA shift prediction using its PDB coordinates:

   sparta -in protein.pdb -ref ref.tab

Check the prediction summary table (pred/pred.tab) and remove the experimental shifts for which the predicted shifts deviate by five standard deviations. Notably, HA shifts for which the ring current shift is > 1.5 ppm and the predicted shift deviates by three standard deviations are better removed.

Chemical shifts should be referenced correctly. A quick check can be conducted by running the above SPARTA prediction for the protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab".

Given this, the procedure for adding new proteins to the SPARTA database is as simple as:

Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/).

Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

Prepare a table file, for example named "list.tab", which contains only the names of the proteins to be added to the database. This table must follow the example below:

   VARS PDB_NAME
   FORMAT %24s
   bpti
   ubiquitin
   profilin
   ...

Note that the "PDB_NAME" entries in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.
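
The list file and the naming rule above could be produced by a small helper such as the following Python sketch. The function name and the exact line layout of the table (one keyword line each for VARS and FORMAT, then one protein name per line) are assumptions based on the example in the text; check the result against a table that SPARTA is known to accept.

```python
import os


def write_protein_list(names, sparta_dir, list_path="list.tab"):
    """Write a list table after checking that each protein has both input files."""
    missing = []
    for name in names:
        # Each name needs pdb/<name>.pdb and shifts/<name>.tab under sparta_dir.
        for subdir, ext in (("pdb", ".pdb"), ("shifts", ".tab")):
            path = os.path.join(sparta_dir, subdir, name + ext)
            if not os.path.isfile(path):
                missing.append(path)
    if missing:
        raise FileNotFoundError("missing input files: " + ", ".join(missing))
    with open(list_path, "w") as handle:
        handle.write("VARS PDB_NAME\n")
        handle.write("FORMAT %24s\n")
        for name in names:
            handle.write(name + "\n")
    return list_path


# Example call (hypothetical directory and protein names):
# write_protein_list(["bpti", "ubiquitin"], "/disk1/SPARTA")
```

Failing early on a missing ".pdb" or ".tab" file avoids compiling a database from an incomplete set of inputs.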
In the "SPARTA" directory, execute the following command to compile a new database:

   sparta -compile -pdbDir ./pdb -pdbList list.tab

A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database, which will be overwritten.

Compile the Source Code

SPARTA was implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the /src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable is as simple as:

   cd $SPARTA_DIR/src
   make

Compilation of the SPARTA program has been tested on Windows (XP) and Linux (Linux 9 or newer). The compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are contained in the distributed SPARTA package.

About the Name SPARTA

In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece.

_last updated: Apr 2007 / Webmaster_

Revision 917 Sep 2008 - Main.DavidCowburn

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology

Yang Shen and Ad Bax

LIBRARY:ShenBax08.pdf

Local install – dl380://infotech/spartainstallPC

cygwin session ...

Script started on Mon Feb 4 12:21:52 2008

>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA

./src/sparta -in test/ubiquitin.pdb

Reading PDB Coordinates from test/ubiquitin.pdb

Reading Random Coil Shifts from .\tab\randcoil.tab

Reading RC Adjustments from .\tab\rcadj.tab

Reading Previous Residue RC Adjustments from .\tab\rcprev.tab

Reading Next Residue RC Adjustments from .\tab\rcnext.tab

Reading Weighting Factors from .\tab\weight.tab

Reading Residue Homology Table from .\tab\homology.tab

Reading Fitting Parameter Table from .\tab\fitting.tab

Reading .\tab\sparta.tab, 24166 Triplets

Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read

Predicting ...

N HA C CA CB H

124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin

116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin

119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin

128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin

116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin

122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin

106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin

110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin

121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin

128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin

122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin

125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin

123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin

120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin

139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin

104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin

124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin

109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin

121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin

121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin

122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin

119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin

124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin

122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin

124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin

120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin

116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin

115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin

121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin

142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin

139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin

114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin

117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin

123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin

125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin

123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin

126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin

133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin

122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin

123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin

126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin

124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin

121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin

120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin

109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin

119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin

114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin

125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin

117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin

119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin

125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin

121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin

115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin

118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin

128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin

119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin

125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin

128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin

124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin

128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin

124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin

112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin

Running time: 20.343 seconds

>>/cygdrive/d/spartainstall/SPARTA#

Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...

Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax

DOWNLOAD

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The downloaded Unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

The win32 archive can be unpacked with any standard Windows zip utility.

Users are encouraged to email the author to be informed about updates and related software.

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]

What is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from a protein structure to make quantitative predictions for the backbone chemical shifts.

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if one can find some triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of that known triplet will be useful predictors for the backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.
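
A score of this kind can be sketched in Python as shown here. This is a minimal illustration under stated assumptions, not SPARTA's actual scoring function: the function names, the weights, the residue penalty table and its default values are all hypothetical (the real weighting factors live in tab/weight.tab and tab/homology.tab), and missing angles such as chi1 of Gly are not handled.

```python
def angle_diff(a, b):
    """Smallest signed difference between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def triplet_score(target, entry, angle_weights, residue_penalty, residue_weight=1.0):
    """Weighted sum of squared torsion differences plus a residue-type term.

    target, entry: dicts with "angles" (9 values: phi, psi, chi1 for each of
    three residues) and "residues" (three one-letter codes).
    residue_penalty: hypothetical dict mapping (res_a, res_b) to a penalty.
    Lower scores mean higher similarity.
    """
    score = 0.0
    for weight, a, b in zip(angle_weights, target["angles"], entry["angles"]):
        score += weight * angle_diff(a, b) ** 2
    for ra, rb in zip(target["residues"], entry["residues"]):
        default = 0.0 if ra == rb else 1.0  # assumed fallback penalty
        penalty = residue_penalty.get((ra, rb), residue_penalty.get((rb, ra), default))
        score += residue_weight * penalty
    return score


# Made-up angles in degrees, uniform weights.
target = {"angles": [-60.0, -45.0, -65.0] * 3, "residues": "AKE"}
entry = {"angles": [-62.0, -41.0, -70.0] * 3, "residues": "AKD"}
score = triplet_score(target, entry, [1.0] * 9, {("E", "D"): 0.5})
print(score)
```

The periodic difference in `angle_diff` matters because torsion angles wrap around: 170 and -170 degrees are 20 degrees apart, not 340.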

In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts (obtained by subtracting the corresponding random coil chemical shift values and the adjustment values arising from the effects of neighboring residues) of the central residues of these 20 matches is used as the prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw predicted shifts from database searching. This database currently includes data from 200 proteins, representing 24,166 triplets.
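
The averaging step can be illustrated with a short Python sketch. The text does not say how the 20 matches are weighted, so the inverse-score weights used here are purely an assumption, as are the function names; the ring current and hydrogen bond corrections are left out.

```python
def predict_secondary_shift(matches, n_best=20):
    """Weighted mean and spread of the secondary shifts of the best matches.

    matches: list of (score, secondary_shift) pairs; lower score is better.
    Returns (weighted_mean, weighted_standard_deviation).
    """
    if not matches:
        raise ValueError("at least one database match is required")
    best = sorted(matches, key=lambda m: m[0])[:n_best]
    # Assumed weighting: better (lower) scores contribute more.
    weights = [1.0 / (1.0 + score) for score, _ in best]
    total = sum(weights)
    mean = sum(w * shift for w, (_, shift) in zip(weights, best)) / total
    variance = sum(w * (shift - mean) ** 2 for w, (_, shift) in zip(weights, best)) / total
    return mean, variance ** 0.5


def predicted_shift(random_coil, neighbor_adjustment, secondary_shift):
    """Undo the subtraction described in the text to get an absolute shift."""
    return random_coil + neighbor_adjustment + secondary_shift


# Made-up matches: (similarity score, secondary shift in ppm).
mean, spread = predict_secondary_shift([(0.0, 1.0), (1.0, 3.0)])
print(mean, spread)
```

Returning the spread alongside the mean is convenient because the scatter among the matches is what the summary file reports as a guide to reliability.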

Reliability of SPARTA

The reliability of the SPARTA approach was tested by a cross-validation procedure in which each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same prediction accuracy is also obtained for proteins of known structure that are not contained in the database.

Importantly, the test also showed that the standard deviation of the shifts from the central residues of the 20 matches is correlated with the shift prediction error. Checking the standard deviations in the prediction summary file (pred/pred.tab) therefore gives an idea of the prediction reliability.

It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also taken into account in SPARTA. The secondary shifts stored in the SPARTA database have been corrected for ring current shifts. As “compensation”, the shifts predicted by SPARTA for the target protein are corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the predicted secondary shifts are further corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the cross-validation described above. The accuracy of the coordinates of the target protein is therefore critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary file (/pred/protein_in.tab).

It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for residues in flexible regions, or for residues with a very large ring current shift contribution, may be less reliable, as was also indicated by the test.

Components of the SPARTA Package

The SPARTA system is implemented in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line arguments. A complete list of options is printed when the executable or starting script is run with the "-help" command-line argument, or without any command-line arguments.

Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; this is done automatically by the starting script ("$SPARTA_DIR/sparta" in Linux):

setenv SPARTA_DIR /disk1/SPARTA

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that the default "$SPARTA_DIR" is the current directory if not specified.

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.

$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab The tables of random coil shifts and of the adjustment values from neighboring residues used in the shift prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)

$SPARTA_DIR/tab/homology.tab The residue type homology factors used in the prediction process, which is similar to the table used by TALOS.

$SPARTA_DIR/tab/weight.tab The weighting factors of PHI, PSI and CHI1 angles, and residue type homology used in the prediction process.

$SPARTA_DIR/tab/fitting.tab The fitting parameters relating prediction accuracy and precision, which are used after the prediction process to calculate the estimated prediction error.

$SPARTA_DIR/shifts/*.tab The files in this directory are only used when compiling a new database. When compiling a new database, only shift tables ending with the ".tab" extension will be used. The files in this directory are the chemical shift tables for the proteins in the database, which are in the same format as the TALOS shift tables and must be exactly consistent with the corresponding structures in the SPARTA pdb directory.

$SPARTA_DIR/pdb/*.pdb The PDB coordinate files in this directory are only used along with the files in the SPARTA shifts directory when compiling a new database (e.g. adding new proteins into the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory.

$SPARTA_DIR/test/* The contents of this "test" directory include the input files and results for a sample SPARTA analysis.

How to Use SPARTA

Use of SPARTA to predict backbone chemical shifts involves the following steps:

Create a directory for the prediction session; all subsequent commands will be executed from this directory.
Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format described under "Preparing the Input PDB Coordinates".
Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:
```
sparta -in protein.pdb
```
SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created in the "pred" directory, which contains a summary of the prediction results. The database search typically takes about 25 sec for a 100-residue protein on a Linux PC with a 2.8 GHz CPU.
If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), the SPARTA prediction can be performed with a command such as:
```
sparta -in protein.pdb -ref ref.tab
```
SPARTA will compare the predicted and experimental chemical shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
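The referencing test described above can be sketched in Python. This is an illustration, not SPARTA's implementation: the sign convention (error = experimental minus predicted), the use of the sample standard deviation, and the function names are assumptions.

```python
import math
import statistics


def reference_offset(predicted, experimental, factor=3.0):
    """Return (mean_error, expected_error, needs_correction).

    expected_error is the standard deviation of the prediction errors
    divided by the square root of the number of shifts.
    """
    errors = [e - p for p, e in zip(predicted, experimental)]
    mean_error = statistics.mean(errors)
    expected_error = statistics.stdev(errors) / math.sqrt(len(errors))
    needs_correction = abs(mean_error) > factor * expected_error
    return mean_error, expected_error, needs_correction


def apply_correction(experimental, offset):
    """Subtract a constant referencing offset from the experimental shifts."""
    return [e - offset for e in experimental]
```

In this sketch the test would be run separately for each nucleus type, since a referencing error affects all shifts of one nucleus by the same constant.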

Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully, so that they have the proper format and naming conventions. SPARTA accepts standard PDB coordinate files, but uses ONLY the FIRST conformer/chain if more than one exists. If the PDB coordinates lack hydrogen atoms, these must be added (using DYNAMO, REDUCE, MOLMOL, or a similar program) in order to obtain the hydrogen bonding information and ring current shifts. For the HA atoms of Gly, please use the atom names "HA1/HA2".

Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.

Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. It is suggested that:

The chemical shift assignments for each candidate protein are best validated by running a SPARTA shift prediction using its PDB coordinates:
```
sparta -in protein.pdb -ref ref.tab
```
Check the prediction summary table (pred/pred.tab) and remove the experimental shifts that deviate from the predicted shifts by more than five standard deviations. Notably, HA shifts for which the ring current shift is > 1.5 ppm and which deviate from the predicted shift by more than three standard deviations are better removed.
Chemical shifts should be referenced correctly. A quick check can be conducted by running the above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab".
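The outlier rule in the first suggestion can be written as a small filter. The following Python sketch is illustrative only: it assumes that the "standard deviation" is the per-shift estimated prediction error, and the function and argument names are hypothetical rather than taken from the pred.tab format.

```python
def keep_shift(atom, predicted, experimental, sigma, ring_current=0.0):
    """Return True if an experimental shift should be kept.

    A shift is dropped when it deviates from the prediction by more than
    five standard deviations, or by more than three for HA atoms whose
    ring current shift exceeds 1.5 ppm in magnitude.
    """
    limit = 5.0
    if atom == "HA" and abs(ring_current) > 1.5:
        limit = 3.0
    return abs(experimental - predicted) <= limit * sigma
```

For example, an HA shift that is 1.0 ppm away from its prediction with sigma = 0.25 ppm is kept under the five-sigma rule, but dropped if its ring current shift is 2.0 ppm.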

Given this, the procedure for adding new proteins to the SPARTA database is as simple as:

Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.
Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.
Prepare a table file, for example with a name of "list.tab", which only contains the names of proteins to be added into the database. This table must follow the example below:
```
VARS   PDB_NAME
FORMAT %24s

bpti
ubiquitin
profilin
...
```
Note that the "PDB_NAME" entries in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.
In the "SPARTA" directory, execute the following command to compile a new database:
```
sparta -compile -pdbDir ./pdb -pdbList list.tab
```
A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database first, as it will be overwritten.
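Because the names in the list file, the pdb directory and the shifts directory must all agree, it can help to check them before compiling. The following Python sketch is a hypothetical helper, not part of the SPARTA package; it assumes one protein name per line in the list file and that header lines begin with VARS or FORMAT (REMARK and DATA lines are also skipped).

```python
from pathlib import Path

HEADER_KEYWORDS = {"VARS", "FORMAT", "REMARK", "DATA"}


def read_names(list_file):
    """Return the protein names from a list.tab-style file."""
    names = []
    for line in Path(list_file).read_text().splitlines():
        tokens = line.split()
        # Skip blank lines and table header lines.
        if not tokens or tokens[0] in HEADER_KEYWORDS:
            continue
        names.append(tokens[0])
    return names


def missing_files(names, pdb_dir, shifts_dir):
    """List the expected .pdb and .tab files that do not exist."""
    missing = []
    for name in names:
        for path in (Path(pdb_dir) / f"{name}.pdb",
                     Path(shifts_dir) / f"{name}.tab"):
            if not path.is_file():
                missing.append(str(path))
    return missing
```

A typical use would be `missing_files(read_names("list.tab"), "pdb", "shifts")`, run from the SPARTA directory; an empty result means every listed protein has both files. The sketch checks file names only, not sequence or residue numbering.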

Compile the Source Code

SPARTA was implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the /src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable is as simple as:

cd $SPARTA_DIR/src
make

The compiling of the SPARTA program has been tested on Windows (XP) and Linux (Linux 9 or newer). The compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are contained in the distributed SPARTA package.

About the Name SPARTA

In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece.

_last updated: Apr 2007 / Webmaster_

Revision 8 16 Jul 2008 - Main.DavidCowburn

	SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes As described in the paper: Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology Yang Shen and Ad Bax LIBRARY:ShenBax08.pdf Local install – dl380://infotech/spartainstallPC cygwin session ... Script started on Mon Feb 4 12:21:52 2008 >>Administrator@cowburn-pc #[33m/cygdrive/d/spartainstall/SPARTA ./src/sparta in test/ubiquitin.pdb Reading PDB Coordinates from test/ubiquitin.pdb Reading Random Coil Shifts from .\tab\randcoil.tab Reading RC Adjustments from .\tab\rcadj.tab Reading Previous Residue RC Adjustments from .\tab\rcprev.tab Reading Next Residue RC Adjustments from .\tab\rcnext.tab Reading Weighting Factors from .\tab\weight.tab Reading Residue Homology Table from .\tab\homology.tab Reading Fitting Parameter Table from .\tab\fitting.tab Reading .\tab\sparta.tab, 24166 Triplets Can't save file pred\test/ubiquitin_in.tab Analyzing test/ubiquitin.pdb 76 residues read Predicting ... 
N HA C CA CB H 124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin 116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin 119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin 122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin 128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin 116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin 122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin 106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin 110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin 122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin 121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin 128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin 122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin 125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin 123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin 118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin 120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin 139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin 104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin 124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin 109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin 122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin 121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin 121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin 122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin 119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin 124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin 121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin 122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin 124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin 120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin 116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin 
115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin 109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin 121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin 142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin 139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin 114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin 117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin 118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin 123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin 125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin 123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin 126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin 133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin 103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin 122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin 123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin 126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin 124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin 121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin 107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin 120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin 109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin 119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin 114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin 125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin 116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin 117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin 119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin 125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin 121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin 115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin 115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin 
118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin 128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin 119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin 125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin 128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin 124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin 124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin 128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin 124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin 112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin Running time: 20.343 seconds >>/cygdrive/d/spartainstall/SPARTA# Using a standard MS shell with the infotech drive mounted .. src\sparta -in test\ubiquitin.pdb ... Original text -- Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax DOWNLOAD [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]] The download unix archive can be unpacked with a command like the following: zcat sparta.linux.tar.Z \| tar xvf - The win32 archive can be unpacked with a traditional Windows zip software. Users are encouraged to email the author to be informed about updates and related software. [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ What is SPARTA? 
]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ Reliability of SPARTA ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ Components of the SPARTA Package ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ How to Use SPARTA ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ Preparing the PDB Coordinates ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ Adding New Proteins to the Database ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ Compile the Source Code ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ About the Name SPARTA ]] What is SPARTA? SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from proteins structure in order to make quantitative predictions for the backbone chemical shifts SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet. The idea behind SPARTA is that if
Changed:
< <	one can find some triplet of residues in a protein of known structure
> >	one can Trash.findDFdf some triplet of residues in a protein of known structure
	with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts for this protein will be useful predictors for the backbone secondary chemical shifts in the target. The similarity is measured with a score based on the weighted sum of squares differences between the torsion angles in the target protein and the database entries, so that lower scores indicated high similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences. In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted averages chemical chemical shifts (obtained by subtracting their corresponding random coil chemical shifts values and the adjustments values arising from the effects of neighboring residues) of the central residues of these 20 matches are used as a prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which would be used to quantitatively correct the raw predicted shifts from database searching. This database currently includes data from 200 proteins, representing 24,166 triplets. Reliability of SPARTA The reliability of the SPARTA approach was tested by a cross-validation procedure where each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. 
The same shifts prediction accuracies are also obtained for the proteins with known structures which are not contained in the database. Importantly, it is also found in the test that the standard deviation the shifts from the central residues of the 20 matches are correlated with the shifts prediction errors. By checking the standard deviations in the prediction summary file (pred/pred.tab) will provide an idea of the prediction reliability. It should be noted that the global structural information, such as ring current shifts and hydrogen bonding, was also carefully considered in SPARTA. The secondary shifts in SPARTA database are actually the corrected shifts using the ring current shifts. As “compensation”, the SPARTA predicted shifts for target protein are also corrected by adding the calculated ring current shifts from target protein. For HA and HN, the predicted secondary shifts are also corrected by using the hydrogen bond length and their relationship with the prediction errors, which were derived from above cross-validation. Therefore, the accuracy of the coordinates of the target protein is critical to obtain the reliable hydrogen bond information and ring current shifts, and the final predicted shifts. The calculated hydrogen bond and ring current shifts information is stored in the input summary file (/pred/protein_in.tab). It should also be noted that the protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for the residues in the flexible region or the with very large ring current shifts contribution may be less reliable, which was also indicated by the test. Components of the SPARTA Package The SPARTA system is implemented using C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line argument. 
A complete list of options can be invoked and generated with a "-help" command-line argument or simply typing in the executive files or starting script without any command-line arguments. Running SPARTA requires definition of the environment variables " SPARTA_DIR "; this will be established automatically by the starting script ("$SPARTA_DIR/sparta" in Linux): setenv SPARTA_DIR /disk1/SPARTA $SPARTA_DIR/src/SPARTA $argv[1-$#argv] Note that the default "$SPARTA_DIR" is the current directory if not specified. Other files of the SPARTA package include: $SPARTA_DIR/tab/sparta.tab The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts. $SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab The table of random coil shifts, adjustments values from neighboring residues used in the shifts prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/) $SPARTA_DIR/tab/homology.tab The residue type homology factors used in the prediction process, which is similar to the table used by TALOS. $SPARTA_DIR.tab/weight.tab The weighting factors of PHI, PSI and CHI1 angles, and residue type homology used in the prediction process. $SPARTA_DIR.tab/fitting.tab The fitting parameters between prediction accuracy and precision, which will be used after the prediction process to calculate the estimated prediction error. $SPARTA_DIR/shifts/.tab* The files in this directory are only used when compiling a new database. When compiling a new database, only shift tables ending with the ".tab" extension will be used. The files in this directory are the chemical shift tables for the proteins in the database, which are in the same format as the TALOS shifts tables and must be exactly consistent with the corresponding structures in the SPARTA pdb directory. 
$SPARTA_DIR/pdb/.pdb* The PDB coordinates files in this directory are only used along with the files in the SPARTA shifts directory when compiling a new database (e.g. adding new proteins into the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory. $SPARTA_DIR/test/* The contents of this "test" directory include the input files and results for a sample SPARTA analysis. How to Use SPARTA Use of SPARTA to predict backbone chemical shifts involves the following steps: Create a directory for the prediction session; all subsequent commands will be executed from this directory. Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format given above. Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as: sparta -in protein.pdb SPARTA will first generate an input "pred/protein_in.tab" file from PDB coordinates, which contains of the phi, psi, chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res.tab" (X = N, H, HA, CA, CB and C) will be created. Each one of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created in "pred" directory, which includes a summary of the prediction results. The database search will typically take about 25 sec for a 100-residue protein on a Linux PC with a 2.8GHz CPU. 
If experimental chemical shifts for target protein are available (with a name "ref.tab", for example, and the same format as typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), SAELDI prediction can be performed by a command such as: sparta -in protein.pdb -ref ref.tab SPARTA would compare the predicted chemical shifts and experimental shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error larger than 3 times of the expected errors (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction will be applied to the experimental chemical shifts. The corrected reference chemical shifts are stored into a new file "pred/ref.tab" Preparing the Input PDB Coordinates The input PDB coordinates should be prepared carefully, so that it has the proper format, naming conventions. SPARTA accept the standard PDB coordinates file, but ONLY the FIRST* conformer/chain if more than one exist. For PDB coordinates without hydrogen atoms, the hydrogen atoms are required to be added (by using programs DYNAMO, REDUCE, MOLMOL, or any other similar programs) in order to get the hydrogen bonding information and ring current shifts. For HA atoms of Gly, please use atom names of "HA1/HA2" Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories. Adding New Proteins to the Database New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. It suggests that The chemical shifts assignments for each candidate protein are better validated by conducting a SPARTA shift prediction using its PDB coordinates. 
sparta -in protein.pdb -ref ref.tab Check the prediction summary table (pred/pred.tab) files, remove the experimental shifts for which the predicted shifts deviated five standard deviations. Notably, HAs, for which ring current shifts are > 1.5ppm and the predicted shifts deviate the three standard deviations, are better removed. Chemical shifts shoule be referenced correctly. A quick check can be conduct by runing above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab" Given this, the procedure for adding new proteins to the SPARTA database is simple as: Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table. Prepare a table file, for example with a name of "list.tab", which only contains the names of proteins to be added into the database. This table must follow the example below: VARS PDB_NAME FORMAT %24s bpti ubiquitin profilin ... Note that the "PDB_NAME" in the table file must consistent with the files names (with ".tab" and ".pdb" extension) in the SPARTA pdb and shifts directories. 
In the "SPARTA" directory, execute the following command to compile a new database: sparta -compile -pdbDir ./pdb -pdbList list.tab A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in SPARTA pdb and shifts directories. Please backup the old database, which will be overwritten. Compile the Source Code SPARTA was implemented with standard C++ using Standard Template Library (STL). To compile the source codes (in /src directory), your system must have a compatible C++ compiler and STL library. Given this, the compiling of SPARTA executable file is simple as: cd $SPARTA_DIR/src make The compiling of the SPARTA program has been tested on Windows (XP) and Linux (Linux 9 or newer). The compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are contained in the distributed SPARTA package. About the Name SPARTA In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece. _[ Home ] [ NIH ] [ NIDDK ] [ Disclaimer ] [ Copyright ]_ _last updated: Apr 2007 / Webmaster_

Revision 7 04 Feb 2008 - Main.DavidCowburn

	SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes As described in the paper: Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology Yang Shen and Ad Bax LIBRARY:ShenBax08.pdf
Changed:
< <	---++ Local install – dl380://infotech/spartainstallPC
> >	Local install – dl380://infotech/spartainstallPC
	cygwin session ... Script started on Mon Feb 4 12:21:52 2008 >>Administrator@cowburn-pc #[33m/cygdrive/d/spartainstall/SPARTA ./src/sparta in test/ubiquitin.pdb Reading PDB Coordinates from test/ubiquitin.pdb Reading Random Coil Shifts from .\tab\randcoil.tab Reading RC Adjustments from .\tab\rcadj.tab Reading Previous Residue RC Adjustments from .\tab\rcprev.tab Reading Next Residue RC Adjustments from .\tab\rcnext.tab Reading Weighting Factors from .\tab\weight.tab Reading Residue Homology Table from .\tab\homology.tab Reading Fitting Parameter Table from .\tab\fitting.tab Reading .\tab\sparta.tab, 24166 Triplets Can't save file pred\test/ubiquitin_in.tab Analyzing test/ubiquitin.pdb 76 residues read Predicting ... N HA C CA CB H 124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin 116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin 119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin 122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin 128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin 116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin 122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin 106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin 110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin 122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin 121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin 128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin 122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin 125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin 123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin 118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin 120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin 139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin 104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin 124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin 109.934 5.147 
176.750 59.690 71.200 7.948 22 T test/ubiquitin 122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin 121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin 121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin 122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin 119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin 124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin 121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin 122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin 124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin 120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin 116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin 115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin 109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin 121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin 142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin 139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin 114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin 117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin 118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin 123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin 125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin 123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin 126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin 133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin 103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin 122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin 123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin 126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin 124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin 121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin 107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin 120.183 4.695 
175.350 54.390 32.650 7.288 54 R test/ubiquitin 109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin 119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin 114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin 125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin 116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin 117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin 119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin 125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin 121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin 115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin 115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin 118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin 128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin 119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin 125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin 128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin 124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin 124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin 128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin 124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin 112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin Running time: 20.343 seconds >>/cygdrive/d/spartainstall/SPARTA# Using a standard MS shell with the infotech drive mounted .. src\sparta -in test\ubiquitin.pdb ... Original text -- Contact: shenyang@niddk.nih.gov; bax@nih.gov Web: http://spin.niddk.nih.gov/bax DOWNLOAD [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux /Fedora Core version ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]] The download unix archive can be unpacked with a command like the following: zcat sparta.linux.tar.Z \| tar xvf - The win32 archive can be unpacked with a traditional Windows zip software. 
Users are encouraged to email the author to be informed about updates and related software. [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ What is SPARTA? ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ Reliability of SPARTA ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ Components of the SPARTA Package ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ How to Use SPARTA ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ Preparing the PDB Coordinates ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ Adding New Proteins to the Database ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ Compile the Source Code ]] [[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ About the Name SPARTA ]] What is SPARTA? SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from proteins structure in order to make quantitative predictions for the backbone chemical shifts SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 
9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet. The idea behind SPARTA is that if one can find some triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts for this protein will be useful predictors for the backbone secondary chemical shifts in the target. The similarity is measured with a score based on the weighted sum of squares differences between the torsion angles in the target protein and the database entries, so that lower scores indicated high similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences. In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted averages chemical chemical shifts (obtained by subtracting their corresponding random coil chemical shifts values and the adjustments values arising from the effects of neighboring residues) of the central residues of these 20 matches are used as a prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which would be used to quantitatively correct the raw predicted shifts from database searching. This database currently includes data from 200 proteins, representing 24,166 triplets. Reliability of SPARTA The reliability of the SPARTA approach was tested by a cross-validation procedure where each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. 
The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same shifts prediction accuracies are also obtained for the proteins with known structures which are not contained in the database. Importantly, it is also found in the test that the standard deviation the shifts from the central residues of the 20 matches are correlated with the shifts prediction errors. By checking the standard deviations in the prediction summary file (pred/pred.tab) will provide an idea of the prediction reliability. It should be noted that the global structural information, such as ring current shifts and hydrogen bonding, was also carefully considered in SPARTA. The secondary shifts in SPARTA database are actually the corrected shifts using the ring current shifts. As “compensation”, the SPARTA predicted shifts for target protein are also corrected by adding the calculated ring current shifts from target protein. For HA and HN, the predicted secondary shifts are also corrected by using the hydrogen bond length and their relationship with the prediction errors, which were derived from above cross-validation. Therefore, the accuracy of the coordinates of the target protein is critical to obtain the reliable hydrogen bond information and ring current shifts, and the final predicted shifts. The calculated hydrogen bond and ring current shifts information is stored in the input summary file (/pred/protein_in.tab). It should also be noted that the protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for the residues in the flexible region or the with very large ring current shifts contribution may be less reliable, which was also indicated by the test. Components of the SPARTA Package The SPARTA system is implemented using C++. 
The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or starting script ("$SPARTA_DIR/sparta" for Linux) can be invoked with "TALOS-like" command-line argument. A complete list of options can be invoked and generated with a "-help" command-line argument or simply typing in the executive files or starting script without any command-line arguments. Running SPARTA requires definition of the environment variables " SPARTA_DIR "; this will be established automatically by the starting script ("$SPARTA_DIR/sparta" in Linux): setenv SPARTA_DIR /disk1/SPARTA $SPARTA_DIR/src/SPARTA $argv[1-$#argv] Note that the default "$SPARTA_DIR" is the current directory if not specified. Other files of the SPARTA package include: $SPARTA_DIR/tab/sparta.tab The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts. $SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab The table of random coil shifts, adjustments values from neighboring residues used in the shifts prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/) $SPARTA_DIR/tab/homology.tab The residue type homology factors used in the prediction process, which is similar to the table used by TALOS. $SPARTA_DIR.tab/weight.tab The weighting factors of PHI, PSI and CHI1 angles, and residue type homology used in the prediction process. $SPARTA_DIR.tab/fitting.tab The fitting parameters between prediction accuracy and precision, which will be used after the prediction process to calculate the estimated prediction error. $SPARTA_DIR/shifts/.tab* The files in this directory are only used when compiling a new database. When compiling a new database, only shift tables ending with the ".tab" extension will be used. 
The files in this directory are the chemical shift tables for the proteins in the database, which are in the same format as the TALOS shifts tables and must be exactly consistent with the corresponding structures in the SPARTA pdb directory. $SPARTA_DIR/pdb/.pdb* The PDB coordinates files in this directory are only used along with the files in the SPARTA shifts directory when compiling a new database (e.g. adding new proteins into the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory. $SPARTA_DIR/test/* The contents of this "test" directory include the input files and results for a sample SPARTA analysis. How to Use SPARTA Use of SPARTA to predict backbone chemical shifts involves the following steps: Create a directory for the prediction session; all subsequent commands will be executed from this directory. Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format given above. Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as: sparta -in protein.pdb SPARTA will first generate an input "pred/protein_in.tab" file from PDB coordinates, which contains of the phi, psi, chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res.tab" (X = N, H, HA, CA, CB and C) will be created. Each one of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created in "pred" directory, which includes a summary of the prediction results. 
The database search typically takes about 25 seconds for a 100-residue protein on a Linux PC with a 2.8 GHz CPU.

If experimental chemical shifts for the target protein are available (for example in a file named "ref.tab", in the same format as a typical TALOS shift table, http://spin.niddk.nih.gov/NMRPipe/talos/), a SPARTA prediction can be performed with a command such as:

   sparta -in protein.pdb -ref ref.tab

SPARTA then compares the predicted and experimental chemical shifts before exiting, and writes a prediction summary file "pred/pred.tab" that stores the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of the prediction errors / square root of the number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file, "pred/ref.tab".

Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully, so that they have the proper format and naming conventions. SPARTA accepts a standard PDB coordinate file, but uses ONLY the FIRST conformer/chain if more than one exists. For PDB coordinates without hydrogen atoms, the hydrogen atoms must be added (using DYNAMO, REDUCE, MOLMOL, or a similar program) in order to obtain the hydrogen bonding information and ring current shifts. For the HA atoms of Gly, please use the atom names "HA1/HA2". Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.

Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included.
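The referencing check described under "How to Use SPARTA" (average prediction error larger than 3 times the standard deviation of the errors divided by the square root of the number of shifts) can be sketched in a few lines of Python. This is only an illustration of the stated rule, not SPARTA's own code: the function name is hypothetical, and the use of the sample standard deviation and of the absolute value of the mean are assumptions.

```python
import math
import statistics


def needs_reference_correction(errors, factor=3.0):
    """Apply the documented rule to a list of (predicted - experimental) errors.

    Assumptions: sample standard deviation, and the absolute value of the
    mean error is compared with the threshold.
    """
    n = len(errors)
    if n < 2:
        return False  # not enough shifts to estimate a spread
    mean_error = statistics.mean(errors)
    expected_error = statistics.stdev(errors) / math.sqrt(n)
    return abs(mean_error) > factor * expected_error


# Made-up error lists, for illustration only.
offset_errors = [1.0, 1.1, 0.9, 1.0]      # consistent offset of about 1 ppm
centred_errors = [-1.0, 1.0, -1.0, 1.0]   # scattered around zero

print(needs_reference_correction(offset_errors))
print(needs_reference_correction(centred_errors))
```

A consistent offset with little scatter triggers the check, while errors scattered around zero do not.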
It is suggested that:

The chemical shift assignments for each candidate protein be validated by running a SPARTA shift prediction using its PDB coordinates:

   sparta -in protein.pdb -ref ref.tab

Check the prediction summary table (pred/pred.tab) and remove the experimental shifts for which the predicted shifts deviate by five standard deviations. Notably, HA shifts for which the ring current shifts are > 1.5 ppm and the predicted shifts deviate by three standard deviations are better removed.

Chemical shifts should be referenced correctly. A quick check can be conducted by running the above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of the prediction errors / square root of the number of shifts), and store the corrected shifts in a file "pred/ref.tab".

Given this, the procedure for adding new proteins to the SPARTA database is simply:

Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/).

Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

Prepare a table file, for example named "list.tab", which contains only the names of the proteins to be added to the database. This table must follow the example below:

   VARS PDB_NAME
   FORMAT %24s
   bpti
   ubiquitin
   profilin
   ...

Note that the "PDB_NAME" entries in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.
In the "SPARTA" directory, execute the following command to compile a new database:

   sparta -compile -pdbDir ./pdb -pdbList list.tab

A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database first, as it will be overwritten.

Compile the Source Code

SPARTA is implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the /src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable is as simple as:

   cd $SPARTA_DIR/src
   make

Compilation of the SPARTA program has been tested on Windows (XP) and Linux (Linux 9 or newer). The compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are included in the distributed SPARTA package.

About the Name SPARTA

In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece.

_last updated: Apr 2007_

Revision 604 Feb 2008 - Main.DavidCowburn

  SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes 

As described in the paper:

Protein backbone chemical
shifts predicted from searching a database for torsion angle and
sequence homology 

Yang Shen and Ad Bax 

LIBRARY:ShenBax08.pdf
+ ---++ Local install – dl380://infotech/spartainstallPC
 cygwin session ...
Script started on Mon Feb 4 12:21:52 2008

>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA
 ./src/sparta -in test/ubiquitin.pdb




Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab
Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
 Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read

Predicting ...
       N       HA        C       CA       CB        H
 124.353    5.462  175.920   55.080   30.759    8.947    2    Q test/ubiquitin
 116.472    4.213  172.450   59.570   42.210    8.342    3    I test/ubiquitin
 119.243    5.693  175.320   55.210   41.480    8.871    4    F test/ubiquitin
 122.133    4.870  174.870   60.621   34.230    9.693    5    V test/ubiquitin
 128.653    5.367  177.140   54.519   35.050    9.096    6    K test/ubiquitin
 116.533    4.970  176.909   60.470   70.630    8.925    7    T test/ubiquitin
 122.463    4.310  178.800   57.580   41.970    9.037    8    L test/ubiquitin
 106.723    4.428  175.520   61.400   69.140    7.386    9    T test/ubiquitin
 110.023    3.978  174.070   45.460 9999.000    7.522   10    G test/ubiquitin
 122.734    4.361  175.940   56.280   33.200    6.915   11    K test/ubiquitin
 121.573    5.264  174.320   62.390   69.910    8.627   12    T test/ubiquitin
 128.243    4.545  175.220   59.980   40.950    9.852   13    I test/ubiquitin
 122.653    5.067  173.789   61.940   69.650    8.696   14    T test/ubiquitin
 125.933    4.779  174.670   52.830   47.070    8.760   15    L test/ubiquitin
 123.293    5.045  175.860   54.820   29.450    8.177   16    E test/ubiquitin
 118.342    4.713  174.160   58.431   36.400    9.226   17    V test/ubiquitin
 120.123    5.078  176.161   52.720   30.310    8.723   18    E test/ubiquitin
 139.146    4.141  175.310   65.470   31.950 9999.000   19    P test/ubiquitin
 104.533    4.370  174.660   57.400   63.370    7.137   20    S test/ubiquitin
 124.613    4.695  176.360   55.700   40.800    8.351   21    D test/ubiquitin
 109.934    5.147  176.750   59.690   71.200    7.948   22    T test/ubiquitin
 122.323    3.657  179.040   62.260   34.350    8.688   23    I test/ubiquitin
 121.963    3.917  178.640   60.220   28.280    9.795   24    E test/ubiquitin
 121.703    4.525  178.379   56.060   38.449    7.723   25    N test/ubiquitin
 122.843    3.397  177.950   67.660   30.840    7.978   26    V test/ubiquitin
 119.993    4.648  180.550   59.249   33.730    8.617   27    K test/ubiquitin
 124.573    4.161  180.300   55.370   17.710    7.904   28    A test/ubiquitin
 121.073    4.207  180.320   59.650   33.290    7.933   29    K test/ubiquitin
 122.213    3.507  178.310   66.150   36.800    8.326   30    I test/ubiquitin
 124.623    3.829  178.890   60.000   27.720    8.622   31    Q test/ubiquitin
 120.493    4.354  177.250   57.190   40.580    8.231   32    D test/ubiquitin
 116.263    4.337  177.870   58.050   34.170    7.521   33    K test/ubiquitin
 115.003    4.625  177.840   55.170   32.661    8.995   34    E test/ubiquitin
 109.782    4.035  173.960   46.080 9999.000    8.741   35    G test/ubiquitin
 121.013    4.446  173.590   57.750   40.580    6.297   36    I test/ubiquitin
 142.438    4.634  176.940   61.660   31.850 9999.000   37    P test/ubiquitin
 139.608    4.117  178.320   66.260   32.890 9999.000   38    P test/ubiquitin
 114.512    4.430  177.090   55.640   39.540    8.617   39    D test/ubiquitin
 117.913    4.583  175.381   55.640   30.140    7.924   40    Q test/ubiquitin
 118.853    4.244  176.300   56.470   31.650    7.307   41    Q test/ubiquitin
 123.813    4.499  174.050   55.050   31.750    8.520   42    R test/ubiquitin
 125.173    5.353  175.290   52.980   45.790    8.867   43    L test/ubiquitin
 123.053    5.216  176.060   58.980   41.420    9.487   44    I test/ubiquitin
 126.523    5.045  174.470   57.020   43.760    8.869   45    F test/ubiquitin
 133.333    3.690  177.289   52.540   16.570    8.897   46    A test/ubiquitin
 103.473    3.791  173.810   45.350 9999.000    8.087   47    G test/ubiquitin
 122.702    4.623  174.700   54.550   34.530    8.284   48    K test/ubiquitin
 123.543    4.666  175.670   55.740   29.000    8.667   49    Q test/ubiquitin
 126.653    4.090  176.659   54.240   41.570    8.872   50    L test/ubiquitin
 124.073    4.488  175.870   55.960   31.570    8.442   51    E test/ubiquitin
 121.163    4.360  177.330   56.959   40.850    8.187   52    D test/ubiquitin
 107.793    4.045  174.870   45.170 9999.000    9.567   53    G test/ubiquitin
 120.183    4.695  175.350   54.390   32.650    7.288   54    R test/ubiquitin
 109.533    5.508  176.560   59.690   72.260    8.799   55    T test/ubiquitin
 119.053    4.060  180.810   58.710   40.370    8.176   56    L test/ubiquitin
 114.463    4.370  178.310   61.080   62.530    8.585   57    S test/ubiquitin
 125.323    4.296  177.400   57.180   40.100    7.654   58    D test/ubiquitin
 116.642    4.670  174.700   58.250   40.070    7.124   59    Y test/ubiquitin
 117.033    4.355  174.341   54.120   37.410    8.329   60    N test/ubiquitin
 119.733    3.393  174.610   62.420   36.740    6.970   61    I test/ubiquitin
 125.874    4.506  175.970   53.660   31.650    7.647   62    Q test/ubiquitin
 121.433    4.001  175.810   57.791   32.649    8.487   63    K test/ubiquitin
 115.083    3.465  175.250   57.890   25.900    9.591   64    E test/ubiquitin
 115.863    4.640  172.160   60.890   64.910    7.383   65    S test/ubiquitin
 118.242    5.614  173.950   62.340   70.080    8.737   66    T test/ubiquitin
 128.243    5.060  175.770   53.900   44.260    9.801   67    L test/ubiquitin
 119.513    5.292  173.150   55.000   30.531    9.633   68    H test/ubiquitin
 125.592    5.282  175.270   53.890   44.380    8.533   69    L test/ubiquitin
 128.073    4.351  173.999   60.800   34.910    9.490   70    V test/ubiquitin
 124.262    5.361  177.830   53.940   42.851    8.067   71    L test/ubiquitin
 124.244    4.921  174.953   54.777   32.225    9.169   72    R test/ubiquitin
 128.176    4.628  176.270   54.090   42.511    8.881   73    L test/ubiquitin
 124.343    4.706  175.048   54.919   31.176    8.588   74    R test/ubiquitin
 112.599    4.156  173.001   44.721 9999.000    8.348   75    G test/ubiquitin




 Running time: 20.343 seconds
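For further processing, the prediction listing above can be read with a few lines of Python. The sketch below is not part of SPARTA; it assumes that each residue's six values, residue number, residue type and file label sit on a single line, and it treats 9999.000 as a "no value" placeholder. That last point is an inference from the listing, where 9999.000 appears only for glycine CB and proline H. The function and variable names are hypothetical.

```python
# Placeholder value: an assumption inferred from the listing, where
# 9999.000 shows up only for glycine CB and proline H.
MISSING = 9999.0
COLUMNS = ("N", "HA", "C", "CA", "CB", "H")


def parse_prediction_line(line):
    """Return a dict for one prediction row, or None for any other line."""
    fields = line.split()
    if len(fields) != 9:
        return None
    try:
        values = [float(text) for text in fields[:6]]
        resid = int(fields[6])
    except ValueError:
        return None
    shifts = {
        atom: (None if value == MISSING else value)
        for atom, value in zip(COLUMNS, values)
    }
    return {"resid": resid, "resname": fields[7],
            "source": fields[8], "shifts": shifts}


# A few lines copied from the session above.
listing = """\
Predicting ...
       N       HA        C       CA       CB        H
 124.353    5.462  175.920   55.080   30.759    8.947    2    Q test/ubiquitin
 110.023    3.978  174.070   45.460 9999.000    7.522   10    G test/ubiquitin
 139.146    4.141  175.310   65.470   31.950 9999.000   19    P test/ubiquitin
 Running time: 20.343 seconds
"""

rows = [row for row in map(parse_prediction_line, listing.splitlines()) if row]
for row in rows:
    print(row["resid"], row["resname"], row["shifts"]["CB"], row["shifts"]["H"])
```

Header, progress and timing lines are skipped because they do not have nine whitespace-separated fields with numeric values in the first seven.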

>>/cygdrive/d/spartainstall/SPARTA#



Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...




Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov
Web: http://spin.niddk.nih.gov/bax

 DOWNLOAD 

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux / Fedora Core version]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The downloaded Unix archive can be unpacked with a command like the
following:
   zcat sparta.linux.tar.Z | tar xvf -

The Win32 archive can be unpacked with any standard Windows zip
utility.

Users are encouraged to email the authors to be informed about
updates and related software.


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]

 What is SPARTA?

SPARTA is a database system for empirical prediction of backbone
chemical shifts (N, HN, HA, CA, CB, CO) using a combination of
backbone phi, psi torsion angles and sidechain chi1 angles from a
given protein with known PDB coordinates. The SPARTA approach is an
extension of the well-known observation that many kinds of secondary
chemical shifts (i.e. differences between chemical shifts and their
corresponding random coil values) are highly correlated with aspects
of protein secondary structure. The goal of SPARTA is to use the phi,
psi and chi1 torsion angles and the sequence information from a
protein structure to make quantitative predictions for the backbone
chemical shifts.
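The definition of a secondary chemical shift given above is a simple difference, sketched here in Python. The function name is hypothetical and the numbers in the example are made up for illustration; they are not taken from SPARTA's randcoil.tab.

```python
def secondary_shift(observed_ppm, random_coil_ppm):
    """Secondary shift: observed chemical shift minus its random coil value."""
    return observed_ppm - random_coil_ppm


# Made-up illustrative values in ppm, not values from randcoil.tab.
observed_ca = 55.08
random_coil_ca = 56.00
print(round(secondary_shift(observed_ca, random_coil_ca), 2))
```

A negative result means the nucleus resonates upfield of its random coil position; a positive result means downfield.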

SPARTA uses the phi, psi and chi1
angles of a given residue to predict secondary shifts for that
residue. SPARTA also includes the information from the next and
previous residues when making predictions for a given residue. So, in
practice, SPARTA uses data for three consecutive residues
simultaneously (i.e. 9 torsion angles and 3 residue types) to make
predictions for the central residue in a triplet. 

The idea behind SPARTA is that if one can find a triplet of residues
in a protein of known structure that is similar in structure and
sequence to a triplet in a target protein, then the backbone secondary
chemical shifts of the known triplet will be useful predictors for the
backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of
squared differences between the torsion angles in the target protein
and those of the database entries, so that lower scores indicate
higher similarity. In order to take advantage of the correlations
between residue type and secondary structure, the score also includes
a small, qualitative residue type term which biases the matching
towards roughly similar sequences.
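A score of this general shape can be sketched in Python as follows. This is a minimal illustration, not SPARTA's scoring function: the function names are hypothetical, the weights are placeholders rather than values from weight.tab, and the residue type term is simplified to a count of mismatched positions instead of the homology table that SPARTA reads. Angle differences are wrapped into the range -180 to 180 degrees so that 179 and -179 count as 2 degrees apart.

```python
def angle_difference(a_deg, b_deg):
    """Smallest signed difference between two angles, in degrees."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def triplet_score(target_angles, entry_angles, angle_weights,
                  target_residues, entry_residues, residue_weight=1.0):
    """Weighted sum of squared torsion angle differences plus a residue term.

    Each angle sequence holds the 9 torsion angles of a triplet
    (phi, psi, chi1 for three residues). Lower scores mean closer matches.
    """
    angle_term = sum(
        weight * angle_difference(target, entry) ** 2
        for weight, target, entry in zip(angle_weights, target_angles, entry_angles)
    )
    # Placeholder residue term (assumption): number of mismatched positions.
    mismatches = sum(t != e for t, e in zip(target_residues, entry_residues))
    return angle_term + residue_weight * mismatches


# Made-up example triplets, for illustration only.
weights = [1.0] * 9
target = [-60.0, -45.0, 179.0, -65.0, -40.0, 60.0, -120.0, 130.0, -60.0]
entry = [-60.0, -45.0, -179.0, -65.0, -40.0, 60.0, -120.0, 130.0, -60.0]

print(triplet_score(target, target, weights, "MQI", "MQI"))
print(triplet_score(target, entry, weights, "MQI", "MQV"))
```

In the second call one chi1 angle differs by 2 degrees across the 180 degree boundary and one residue type differs, so both terms contribute to the score.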

In practice, SPARTA searches a database for the 20 best matches to a
given triplet in the target protein. The weighted average of the
secondary chemical shifts of the central residues of these 20 matches
(obtained by subtracting the corresponding random coil chemical shift
values and the adjustment values arising from the effects of
neighboring residues) is used as the prediction for the secondary
shift of the central residue. The SPARTA database was constructed
using the most well-defined parts of high-resolution (2.4 Angstroms
or better) X-ray crystal structures to define the phi, psi and chi1
angles, as well as other structural information, such as hydrogen
bonding and ring current shifts, which is used to quantitatively
correct the raw predicted shifts from database searching. This
database currently includes data from 200 proteins, representing
24,166 triplets.
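The selection and averaging step can be sketched as below. The text does not say how the 20 matches are weighted, so the inverse-score weighting used here is purely an assumption for illustration, as are the function name and the example numbers.

```python
import heapq


def predict_secondary_shift(matches, n_best=20, epsilon=1e-6):
    """Weighted average of the secondary shifts of the best-scoring matches.

    matches: iterable of (score, secondary_shift_ppm) pairs, lower score = better.
    Weighting by 1 / (score + epsilon) is an assumption, not SPARTA's scheme.
    """
    best = heapq.nsmallest(n_best, matches, key=lambda match: match[0])
    if not best:
        raise ValueError("no database matches supplied")
    weights = [1.0 / (score + epsilon) for score, _ in best]
    weighted_sum = sum(w * shift for w, (_, shift) in zip(weights, best))
    return weighted_sum / sum(weights)


# Made-up (score, secondary shift) pairs, for illustration only.
candidates = [(1.0, 2.0), (1.0, 4.0), (50.0, -3.0)]
print(predict_secondary_shift(candidates, n_best=2))
```

With `n_best=2` the poorly scoring third candidate is ignored and the two equally scored matches are averaged with equal weight.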

 Reliability of SPARTA

The reliability of the SPARTA approach was tested by a
cross-validation procedure in which each protein was temporarily
removed from the database, and its backbone chemical shifts (N, HN,
HA, CA, CB and C’) were predicted using the remaining protein data.
The RMS deviations between the predicted and experimental shifts are
2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same
prediction accuracies are also obtained for proteins of known
structure which are not contained in the database.
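The RMS deviation quoted above is computed per nucleus type over all residues with both a predicted and an experimental value. A minimal Python sketch follows; the function name is hypothetical and representing missing values as None is an assumption.

```python
import math


def rms_deviation(predicted, experimental):
    """Root-mean-square deviation over positions where both values exist."""
    pairs = [
        (p, e) for p, e in zip(predicted, experimental)
        if p is not None and e is not None
    ]
    if not pairs:
        raise ValueError("no residues with both predicted and experimental shifts")
    return math.sqrt(sum((p - e) ** 2 for p, e in pairs) / len(pairs))


# Made-up shifts in ppm, for illustration only; None marks a missing value.
predicted = [8.9, 8.3, None, 9.7]
experimental = [8.7, 8.3, 8.6, 9.9]
print(round(rms_deviation(predicted, experimental), 3))
```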

Importantly, the test also showed that the standard deviation of the
shifts from the central residues of the 20 matches is correlated with
the shift prediction error. Checking the standard deviations in the
prediction summary file (pred/pred.tab) therefore gives an idea of
the prediction reliability.

It should be noted that global structural information, such as ring
current shifts and hydrogen bonding, is also taken into account in
SPARTA. The secondary shifts in the SPARTA database are in fact
corrected for ring current shifts. To compensate, the SPARTA
predicted shifts for the target protein are corrected by adding the
ring current shifts calculated from the target protein. For HA and
HN, the predicted secondary shifts are additionally corrected using
the hydrogen bond length and its relationship with the prediction
errors, which was derived from the cross-validation above. The
accuracy of the coordinates of the target protein is therefore
critical for obtaining reliable hydrogen bond information and ring
current shifts, and hence reliable final predicted shifts. The
calculated hydrogen bond and ring current shift information is stored
in the input summary file (pred/protein_in.tab).

It should also be noted that protein backbone chemical shifts are
extremely sensitive to the local conformation; therefore, SPARTA
results for residues in flexible regions, or for residues with very
large ring current shift contributions, may be less reliable, as was
also indicated by the test.

Components of the SPARTA Package

The SPARTA system is implemented in C++. The compiled executable
files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe
for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux)
can be invoked with "TALOS-like" command-line arguments. A complete
list of options is printed with the "-help" command-line argument, or
by running the executable or the starting script without any
command-line arguments.

Running SPARTA requires the environment variable "SPARTA_DIR" to be
defined; the starting script ("$SPARTA_DIR/sparta" in Linux) sets it
automatically:
setenv SPARTA_DIR /disk1/SPARTA

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]


Note that "$SPARTA_DIR" defaults to the current directory if it is
not specified.

Other files of the SPARTA package include:

 $SPARTA_DIR/tab/sparta.tab
The compiled database of residue triplets with their corresponding
PHI/PSI/CHI1 angles and secondary shifts.

 $SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab
The tables of random coil shifts and of the adjustment values from
neighboring residues used in the shift prediction process. (The same
tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)

 $SPARTA_DIR/tab/homology.tab
The residue type homology factors used in the prediction process;
this table is similar to the one used by TALOS.

 $SPARTA_DIR/tab/weight.tab
The weighting factors of the PHI, PSI and CHI1 angles and of the
residue type homology used in the prediction process.

 $SPARTA_DIR/tab/fitting.tab
The fitting parameters relating prediction accuracy to precision,
which are used after the prediction process to calculate the
estimated prediction error.

 $SPARTA_DIR/shifts/*.tab
The files in this directory are the chemical shift tables for the
proteins in the database. They are only used when compiling a new
database, and only shift tables ending with the ".tab" extension are
used. The tables are in the same format as the TALOS shift tables and
must be exactly consistent with the corresponding structures in the
SPARTA pdb directory.

 $SPARTA_DIR/pdb/*.pdb
The PDB coordinate files in this directory are only used, together
with the files in the SPARTA shifts directory, when compiling a new
database (e.g. adding new proteins to the database). The sequence and
residue numbering must be exactly consistent with the corresponding
assignments in the SPARTA shifts directory. Furthermore, the names of
these files must be exactly consistent with the names of the
corresponding chemical shift tables in the SPARTA shifts directory.

 $SPARTA_DIR/test/*
The "test" directory contains the input files and results for a
sample SPARTA analysis.
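
Before a run, it can be useful to confirm that the table files listed above are in place. The following is a minimal sketch of such a check; the helper names are hypothetical, and the fallback to the current directory mirrors the default described above rather than SPARTA's actual code.

```python
import os

# Table files named in the component list above.
REQUIRED_TABLES = (
    "sparta.tab", "randcoil.tab", "rcadj.tab", "rcprev.tab",
    "rcnext.tab", "homology.tab", "weight.tab", "fitting.tab",
)


def resolve_sparta_dir(environ=None):
    """Return SPARTA_DIR, falling back to the current directory."""
    env = os.environ if environ is None else environ
    value = env.get("SPARTA_DIR", "").strip()
    return os.path.abspath(value) if value else os.getcwd()


def missing_tables(sparta_dir):
    """List the required table files that are absent from <dir>/tab."""
    tab_dir = os.path.join(sparta_dir, "tab")
    return [name for name in REQUIRED_TABLES
            if not os.path.isfile(os.path.join(tab_dir, name))]


if __name__ == "__main__":
    root = resolve_sparta_dir()
    print(root, "missing:", missing_tables(root) or "none")
```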



How to Use SPARTA

Use of SPARTA to predict backbone
chemical shifts involves the following steps:  

 Create a directory for the prediction session; all subsequent commands will be executed from this directory.

 Prepare the input PDB coordinate file (for example "protein.pdb") in the format described in the section on preparing the input PDB coordinates.

 Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this simply requires a command such as:
sparta -in protein.pdb
 SPARTA first generates an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) is created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, SPARTA also creates a file "pred.tab" in the "pred" directory, which summarizes the prediction results. The database search typically takes about 25 sec for a 100-residue protein on a Linux PC with a 2.8GHz CPU.

 If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table, http://spin.niddk.nih.gov/NMRPipe/talos/), a SPARTA prediction can be performed with a command such as:
sparta -in protein.pdb -ref ref.tab
 SPARTA then compares the predicted and experimental chemical shifts before exiting, and generates a prediction summary file "pred/pred.tab" that stores the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of the prediction errors / square root of the number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
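
The referencing criterion in the last step can be written out as follows. This is a sketch under stated assumptions: the function name is hypothetical, the errors are taken as predicted minus experimental, and the sample standard deviation is used because the text does not say which estimator SPARTA applies.

```python
import math
import statistics


def referencing_check(errors, factor=3.0):
    """Return (mean error, expected error, needs_correction).

    expected error = standard deviation of the errors / sqrt(number of shifts)
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two prediction errors")
    mean_error = statistics.mean(errors)
    expected = statistics.stdev(errors) / math.sqrt(n)
    return mean_error, expected, abs(mean_error) > factor * expected


# Hypothetical prediction errors for one nucleus (ppm).
mean_error, expected, flagged = referencing_check([1.0, 1.2, 0.8, 1.1, 0.9])
print(round(mean_error, 3), round(expected, 3), flagged)
```

When the check is positive, the mean error is the natural candidate for the offset to apply, although the exact correction SPARTA writes to "pred/ref.tab" should be taken from its own output.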

 

Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully so that they
follow the proper format and naming conventions. SPARTA accepts a
standard PDB coordinate file, but uses ONLY the FIRST conformer/chain
if more than one exists. If the PDB coordinates lack hydrogen atoms,
these must be added (with DYNAMO, REDUCE, MOLMOL, or a similar
program) so that the hydrogen bonding information and ring current
shifts can be obtained. For the HA atoms of Gly, please use the atom
names "HA1/HA2".
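
A quick pre-flight check of an input file might look like the sketch below, which reports whether any hydrogen atoms are present and which HA atom names are used for Gly. It is a heuristic under stated assumptions: the function name is hypothetical, only ATOM records up to the first ENDMDL are read, the fixed PDB columns are used for the atom and residue names, and any atom whose name starts with H (after an optional leading digit) is treated as a hydrogen.

```python
def summarize_pdb_hydrogens(lines):
    """Return (has_hydrogens, sorted Gly HA atom names) for the first model."""
    has_hydrogens = False
    gly_ha_names = set()
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # only the first conformer is considered
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()   # PDB columns 13-16
        residue = line[17:20].strip()     # PDB columns 18-20
        core = atom_name.lstrip("0123456789")  # handles names such as 1HA
        if core.startswith("H"):
            has_hydrogens = True
            if residue == "GLY" and core.startswith("HA"):
                gly_ha_names.add(atom_name)
    return has_hydrogens, sorted(gly_ha_names)


# Truncated ATOM records (coordinates omitted), for illustration only.
example = [
    "ATOM      1  N   GLY A  10",
    "ATOM      2  HA1 GLY A  10",
    "ATOM      3  HA2 GLY A  10",
]
print(summarize_pdb_hydrogens(example))
```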

Examples of the required PDB coordinate format can be found in the
"$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.

Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the
database. Note well that this should be done with great care and
caution, to ensure that only reliable phi/psi/chi1 data with
consistently referenced and correct chemical shifts are included. The
following checks are suggested:

 Validate the chemical shift assignments of each candidate protein by running a SPARTA shift prediction with its PDB coordinates:
sparta -in protein.pdb -ref ref.tab
 Check the prediction summary table (pred/pred.tab) and remove the experimental shifts that deviate from the predicted shifts by more than five standard deviations. Notably, HA shifts for which the ring current shift is > 1.5 ppm and the deviation from the predicted shift exceeds three standard deviations are also better removed.
 Make sure the chemical shifts are referenced correctly. A quick check is to run the SPARTA prediction above for the protein and inspect the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA applies a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of the prediction errors / square root of the number of shifts), and stores the corrected shifts in a file "pred/ref.tab".
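
The outlier screen described above can be sketched as follows. The function name and the tuple layout are assumptions for the example, and the standard deviation is passed in by the caller because the text does not specify which standard deviation SPARTA reports for this purpose.

```python
def flag_outliers(entries, sigma, n_sigma=5.0):
    """Return residue numbers whose |predicted - experimental| exceeds n_sigma * sigma.

    entries: iterable of (residue_number, predicted_ppm, experimental_ppm)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    limit = n_sigma * sigma
    return [res for res, pred, expt in entries if abs(pred - expt) > limit]


# Hypothetical CA entries; sigma is an illustrative value in ppm.
entries = [(2, 55.0, 55.5), (3, 60.0, 66.0), (4, 58.0, 57.2)]
print(flag_outliers(entries, sigma=1.0))
```

For the stricter HA rule, the same function could be called with `n_sigma=3.0` on the subset of residues whose ring current shift exceeds 1.5 ppm.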

 

Given this, the procedure for adding new proteins to the SPARTA
database is as simple as:

 Create a chemical shift table for the new protein in the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

 Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.
 Prepare a table file, for example named "list.tab", which contains only the names of the proteins to be added to the database. This table must follow the example below:
VARS   PDB_NAME
FORMAT %24s
bpti
ubiquitin
profilin
...
 Note that each "PDB_NAME" in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.
 In the "SPARTA" directory, execute the following command to compile a new database:
sparta -compile -pdbDir ./pdb -pdbList list.tab
 A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database first, because it will be overwritten.
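
The name-consistency requirement and the list table can be handled with a small helper such as the sketch below. Both function names are hypothetical, and the one-name-per-line layout after the VARS and FORMAT lines is an assumption based on the example table.

```python
import os


def unmatched_names(pdb_files, shift_files):
    """Protein names that have a .pdb file or a .tab file, but not both."""
    pdb = {os.path.splitext(f)[0] for f in pdb_files if f.endswith(".pdb")}
    tab = {os.path.splitext(f)[0] for f in shift_files if f.endswith(".tab")}
    return sorted(pdb ^ tab)


def format_list_table(names):
    """Text of a list table with one PDB_NAME entry per line."""
    lines = ["VARS   PDB_NAME", "FORMAT %24s"] + list(names)
    return "\n".join(lines) + "\n"


# Hypothetical directory listings, for illustration only.
print(unmatched_names(["bpti.pdb", "ubiquitin.pdb"], ["bpti.tab"]))
print(format_list_table(["bpti", "ubiquitin", "profilin"]), end="")
```

In practice the two listings would come from `os.listdir` on the SPARTA pdb and shifts directories, and the table would be written only when no unmatched names remain.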
 

Compile the Source Code

SPARTA is implemented in standard C++ using the Standard Template
Library (STL). To compile the source code (in the /src directory),
your system must have a compatible C++ compiler and STL
implementation. Given this, compiling the SPARTA executable is as
simple as:

cd $SPARTA_DIR/src
make


Compilation of the SPARTA program has been tested on Windows (XP) and
Linux (Linux 9 or newer). The compiled executable files
("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe"
for Windows) are included in the distributed SPARTA package.

About the Name SPARTA



In antiquity Sparta was a Dorian
Greek military state, originally centered in Laconia. As a city-state
devoted to military training, Sparta possessed the most formidable
army in the Greek world and regarded itself as the natural protector
of Greece. 



_last updated: Apr 2007 / Webmaster_

Revision 5 - 04 Feb 2008 - Main.DavidCowburn

  SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes 

As described in the paper:

Protein backbone chemical
shifts predicted from searching a database for torsion angle and
sequence homology 

Yang Shen and Ad Bax 

LIBRARY:ShenBax08.pdf
< <	Local install –
> >	---++Local install – dl380://infotech/spartainstallPC
< <	dl380://infotech/spartainstallPC
Revision 4 - 04 Feb 2008 - Main.DavidCowburn

-<
<
+*SPARTA:
->
>
+ SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes
-<
<
+Shifts Predicted from Analogy in Residue type and Torsion Angle –
NYSBC notes*
 As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology

Yang Shen and Ad Bax 

LIBRARY:ShenBax08.pdf



Local install – dl380://infotech/spartainstallPC

cygwin session ...
Script started on Mon Feb 4 12:21:52 2008


>>Administrator@cowburn-pc
#/cygdrive/d/spartainstall/SPARTA
 ./src/sparta -in test/ubiquitin.pdb




Reading PDB Coordinates from test/ubiquitin.pdb
Reading Random Coil Shifts from .\tab\randcoil.tab
Reading RC Adjustments from .\tab\rcadj.tab
Reading Previous Residue RC Adjustments from .\tab\rcprev.tab
Reading Next Residue RC Adjustments from .\tab\rcnext.tab
Reading Weighting Factors from .\tab\weight.tab
Reading Residue Homology Table from .\tab\homology.tab
Reading Fitting Parameter Table from .\tab\fitting.tab
Reading .\tab\sparta.tab, 24166 Triplets
Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read

Predicting ...
       N       HA        C       CA       CB        H
 124.353    5.462  175.920   55.080   30.759    8.947    2    Q test/ubiquitin
 116.472    4.213  172.450   59.570   42.210    8.342    3    I test/ubiquitin
 119.243    5.693  175.320   55.210   41.480    8.871    4    F test/ubiquitin
 122.133    4.870  174.870   60.621   34.230    9.693    5    V test/ubiquitin
 128.653    5.367  177.140   54.519   35.050    9.096    6    K test/ubiquitin
 116.533    4.970  176.909   60.470   70.630    8.925    7    T test/ubiquitin
 122.463    4.310  178.800   57.580   41.970    9.037    8    L test/ubiquitin
 106.723    4.428  175.520   61.400   69.140    7.386    9    T test/ubiquitin
 110.023    3.978  174.070   45.460 9999.000    7.522   10    G test/ubiquitin
 122.734    4.361  175.940   56.280   33.200    6.915   11    K test/ubiquitin
 121.573    5.264  174.320   62.390   69.910    8.627   12    T test/ubiquitin
 128.243    4.545  175.220   59.980   40.950    9.852   13    I test/ubiquitin
 122.653    5.067  173.789   61.940   69.650    8.696   14    T test/ubiquitin
 125.933    4.779  174.670   52.830   47.070    8.760   15    L test/ubiquitin
 123.293    5.045  175.860   54.820   29.450    8.177   16    E test/ubiquitin
 118.342    4.713  174.160   58.431   36.400    9.226   17    V test/ubiquitin
 120.123    5.078  176.161   52.720   30.310    8.723   18    E test/ubiquitin
 139.146    4.141  175.310   65.470   31.950 9999.000   19    P test/ubiquitin
 104.533    4.370  174.660   57.400   63.370    7.137   20    S test/ubiquitin
 124.613    4.695  176.360   55.700   40.800    8.351   21    D test/ubiquitin
 109.934    5.147  176.750   59.690   71.200    7.948   22    T test/ubiquitin
 122.323    3.657  179.040   62.260   34.350    8.688   23    I test/ubiquitin
 121.963    3.917  178.640   60.220   28.280    9.795   24    E test/ubiquitin
 121.703    4.525  178.379   56.060   38.449    7.723   25    N test/ubiquitin
 122.843    3.397  177.950   67.660   30.840    7.978   26    V test/ubiquitin
 119.993    4.648  180.550   59.249   33.730    8.617   27    K test/ubiquitin
 124.573    4.161  180.300   55.370   17.710    7.904   28    A test/ubiquitin
 121.073    4.207  180.320   59.650   33.290    7.933   29    K test/ubiquitin
 122.213    3.507  178.310   66.150   36.800    8.326   30    I test/ubiquitin
 124.623    3.829  178.890   60.000   27.720    8.622   31    Q test/ubiquitin
 120.493    4.354  177.250   57.190   40.580    8.231   32    D test/ubiquitin
 116.263    4.337  177.870   58.050   34.170    7.521   33    K test/ubiquitin
 115.003    4.625  177.840   55.170   32.661    8.995   34    E test/ubiquitin
 109.782    4.035  173.960   46.080 9999.000    8.741   35    G test/ubiquitin
 121.013    4.446  173.590   57.750   40.580    6.297   36    I test/ubiquitin
 142.438    4.634  176.940   61.660   31.850 9999.000   37    P test/ubiquitin
 139.608    4.117  178.320   66.260   32.890 9999.000   38    P test/ubiquitin
 114.512    4.430  177.090   55.640   39.540    8.617   39    D test/ubiquitin
 117.913    4.583  175.381   55.640   30.140    7.924   40    Q test/ubiquitin
 118.853    4.244  176.300   56.470   31.650    7.307   41    Q test/ubiquitin
 123.813    4.499  174.050   55.050   31.750    8.520   42    R test/ubiquitin
 125.173    5.353  175.290   52.980   45.790    8.867   43    L test/ubiquitin
 123.053    5.216  176.060   58.980   41.420    9.487   44    I test/ubiquitin
 126.523    5.045  174.470   57.020   43.760    8.869   45    F test/ubiquitin
 133.333    3.690  177.289   52.540   16.570    8.897   46    A test/ubiquitin
 103.473    3.791  173.810   45.350 9999.000    8.087   47    G test/ubiquitin
 122.702    4.623  174.700   54.550   34.530    8.284   48    K test/ubiquitin
 123.543    4.666  175.670   55.740   29.000    8.667   49    Q test/ubiquitin
 126.653    4.090  176.659   54.240   41.570    8.872   50    L test/ubiquitin
 124.073    4.488  175.870   55.960   31.570    8.442   51    E test/ubiquitin
 121.163    4.360  177.330   56.959   40.850    8.187   52    D test/ubiquitin
 107.793    4.045  174.870   45.170 9999.000    9.567   53    G test/ubiquitin
 120.183    4.695  175.350   54.390   32.650    7.288   54    R test/ubiquitin
 109.533    5.508  176.560   59.690   72.260    8.799   55    T test/ubiquitin
 119.053    4.060  180.810   58.710   40.370    8.176   56    L test/ubiquitin
 114.463    4.370  178.310   61.080   62.530    8.585   57    S test/ubiquitin
 125.323    4.296  177.400   57.180   40.100    7.654   58    D test/ubiquitin
 116.642    4.670  174.700   58.250   40.070    7.124   59    Y test/ubiquitin
 117.033    4.355  174.341   54.120   37.410    8.329   60    N test/ubiquitin
 119.733    3.393  174.610   62.420   36.740    6.970   61    I test/ubiquitin
 125.874    4.506  175.970   53.660   31.650    7.647   62    Q test/ubiquitin
 121.433    4.001  175.810   57.791   32.649    8.487   63    K test/ubiquitin
 115.083    3.465  175.250   57.890   25.900    9.591   64    E test/ubiquitin
 115.863    4.640  172.160   60.890   64.910    7.383   65    S test/ubiquitin
 118.242    5.614  173.950   62.340   70.080    8.737   66    T test/ubiquitin
 128.243    5.060  175.770   53.900   44.260    9.801   67    L test/ubiquitin
 119.513    5.292  173.150   55.000   30.531    9.633   68    H test/ubiquitin
 125.592    5.282  175.270   53.890   44.380    8.533   69    L test/ubiquitin
 128.073    4.351  173.999   60.800   34.910    9.490   70    V test/ubiquitin
 124.262    5.361  177.830   53.940   42.851    8.067   71    L test/ubiquitin
 124.244    4.921  174.953   54.777   32.225    9.169   72    R test/ubiquitin
 128.176    4.628  176.270   54.090   42.511    8.881   73    L test/ubiquitin
 124.343    4.706  175.048   54.919   31.176    8.588   74    R test/ubiquitin
 112.599    4.156  173.001   44.721 9999.000    8.348   75    G test/ubiquitin




 Running time: 20.343 seconds

>>/cygdrive/d/spartainstall/SPARTA#
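
The prediction rows printed in this session can be post-processed with a short script. The Python sketch below is not part of SPARTA; it assumes each row has been joined onto a single line holding six shifts (N, HA, C, CA, CB, H) followed by residue number, residue type and source name, and it treats 9999.000 as a placeholder for a shift that does not exist (Gly CB, Pro H), which is inferred from the output rather than documented here.

```python
MISSING = 9999.0  # assumed sentinel, inferred from the Gly CB / Pro H columns
COLUMNS = ("N", "HA", "C", "CA", "CB", "H")


def parse_prediction_row(line):
    """Parse one joined output row into a dictionary.

    Shifts equal to the assumed sentinel are returned as None.
    """
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    shifts = {}
    for name, text in zip(COLUMNS, fields[:6]):
        value = float(text)
        shifts[name] = None if value == MISSING else value
    return {
        "resid": int(fields[6]),
        "resname": fields[7],
        "source": fields[8],
        "shifts": shifts,
    }
```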



Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...




Original text --
 

Contact: shenyang@niddk.nih.gov; bax@nih.gov
Web: http://spin.niddk.nih.gov/bax

 DOWNLOAD 

[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat Linux / Fedora Core version]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32 version]]

The downloaded Unix archive can be
unpacked with a command like the following:
   zcat sparta.linux.tar.Z | tar xvf -


The Win32 archive can be unpacked with
any standard Windows zip utility.

Users are encouraged to email the
author to be informed about updates and related software. 


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What is SPARTA?* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability of SPARTA* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components of the SPARTA Package* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How to Use SPARTA* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing the PDB Coordinates* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding New Proteins to the Database* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile the Source Code* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About the Name SPARTA* ]]

 What is SPARTA?

SPARTA is a database system for
empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB,
CO) using a combination of backbone phi, psi torsion angles and
sidechain chi1 angles from a given protein with known PDB
coordinates. The SPARTA approach is an extension of the well-known
observation that many kinds of secondary chemical shifts (i.e.
differences between chemical shifts and their corresponding random
coil values) are highly correlated with aspects of protein secondary
structure. The goal of SPARTA is to use phi, psi and chi1 torsion
angles and sequence information from a protein structure to make
quantitative predictions for the backbone chemical shifts.

SPARTA uses the phi, psi and chi1
angles of a given residue to predict secondary shifts for that
residue. SPARTA also includes the information from the next and
previous residues when making predictions for a given residue. So, in
practice, SPARTA uses data for three consecutive residues
simultaneously (i.e. 9 torsion angles and 3 residue types) to make
predictions for the central residue in a triplet. 

The idea behind SPARTA is that if
one can find a triplet of residues in a protein of known structure
with similar structure and sequence to a triplet in a target protein,
then the backbone secondary chemical shifts of that database triplet
will be useful predictors for the backbone secondary chemical shifts
in the target.

The similarity is measured with a
score based on the weighted sum of squared differences between the
torsion angles in the target protein and the database entries, so
that lower scores indicate higher similarity. In order to take
advantage of the correlations between residue type and secondary
structure, the score also includes a small, qualitative residue type
term which biases the matching towards roughly similar sequences.

In practice, SPARTA searches a
database for the 20 best matches to a given triplet in the target
protein. The weighted average of the secondary chemical shifts of the
central residues of these 20 matches (obtained by subtracting the
corresponding random coil chemical shift values and the adjustment
values arising from the effects of neighboring residues) is used as
the prediction for the secondary shift of the central residue. The
SPARTA database was constructed using the most well-defined parts of
high-resolution (2.4 Angstroms or better) X-ray crystal structures to
define the phi, psi and chi1 angles, as well as other structural
information, such as hydrogen bonding and ring current shifts, which
is used to quantitatively correct the raw predicted shifts from
database searching. This database currently includes data from 200
proteins, representing 24,166 triplets.
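
The matching and averaging described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPARTA source: the weight values, the "homology" callback, the dictionary layout and the inverse-score weighting of the average are all hypothetical; the real weights and homology factors are read from the package's table files.

```python
# Hypothetical weights; SPARTA reads its real values from tab/weight.tab.
ANGLE_WEIGHT = 1.0
RESIDUE_WEIGHT = 0.1


def angle_diff(a, b):
    """Signed difference between two angles in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def triplet_score(target, entry, homology):
    """Weighted sum of squared torsion differences plus a residue-type term.

    target and entry are dicts with "angles" (9 torsion angles, degrees)
    and "residues" (3 one-letter residue types). Lower means more similar.
    homology(a, b) is a caller-supplied dissimilarity (an assumption here).
    """
    score = sum(ANGLE_WEIGHT * angle_diff(t, e) ** 2
                for t, e in zip(target["angles"], entry["angles"]))
    score += sum(RESIDUE_WEIGHT * homology(t, e)
                 for t, e in zip(target["residues"], entry["residues"]))
    return score


def predict_secondary_shift(target, database, homology, n_best=20):
    """Weighted average secondary shift over the n_best lowest-scoring entries."""
    scored = sorted(((triplet_score(target, e, homology), e["shift"])
                     for e in database),
                    key=lambda pair: pair[0])[:n_best]
    # Assumed weighting: inverse score, so closer matches count more.
    weights = [1.0 / (score + 1e-6) for score, _ in scored]
    total = sum(w * shift for w, (_, shift) in zip(weights, scored))
    return total / sum(weights)
```

The random coil value and neighbor adjustments would then be added back to the returned secondary shift to obtain a chemical shift; that step is omitted here.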

 Reliability of SPARTA

The reliability of the SPARTA
approach was tested by a cross-validation procedure in which each
protein was temporarily removed from the database, and its backbone
chemical shifts (N, HN, HA, CA, CB and C’) were predicted using
the remaining protein data. The RMS deviations between the predicted
and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01
ppm, respectively. The same prediction accuracies are also
obtained for proteins with known structures that are not
contained in the database.
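
As an illustration of how such an RMS deviation is computed for one nucleus, a minimal Python sketch follows. The function name is hypothetical, and skipping values of 9999.000 assumes that this number marks a missing shift, as the prediction output for Gly CB and Pro H suggests.

```python
import math


def rms_deviation(predicted, experimental, missing=9999.0):
    """Root-mean-square deviation between paired shift lists, in ppm.

    Pairs in which either value equals the assumed missing-value
    sentinel are left out of the sum.
    """
    pairs = [(p, e) for p, e in zip(predicted, experimental)
             if p != missing and e != missing]
    if not pairs:
        raise ValueError("no shift pairs to compare")
    return math.sqrt(sum((p - e) ** 2 for p, e in pairs) / len(pairs))
```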

Importantly, the test also showed
that the standard deviation of the shifts from the central residues
of the 20 matches is correlated with the shift prediction error.
Checking the standard deviations in the prediction summary file
(pred/pred.tab) therefore gives an idea of the prediction reliability.

It should be noted that global
structural information, such as ring current shifts and hydrogen
bonding, is also carefully considered in SPARTA. The secondary
shifts in the SPARTA database are corrected for ring current
shifts. As “compensation”, the SPARTA
predicted shifts for the target protein are corrected by adding the
ring current shifts calculated from the target protein. For HA and
HN, the predicted secondary shifts are additionally corrected using
the hydrogen bond length and its relationship with the prediction
errors, which was derived from the cross-validation above. Therefore,
accurate coordinates for the target protein are critical for
obtaining reliable hydrogen bond information and ring current
shifts, and hence reliable final predicted shifts. The calculated
hydrogen bond and ring current shift information is stored in the
input summary file (/pred/protein_in.tab).

It should also be noted that
protein backbone chemical shifts are extremely sensitive to the local
conformation; therefore, SPARTA results for residues in flexible
regions or with very large ring current shift contributions may be
less reliable, as the test also indicated.

 Components of the SPARTA Package

The SPARTA system is implemented
in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for
Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script
("$SPARTA_DIR/sparta" for Linux) can be invoked with
"TALOS-like" command-line arguments. A complete list of
options is printed with the "-help" command-line argument, or by
running the executable or starting script without any command-line
arguments.

Running SPARTA requires
definition of the environment variable "SPARTA_DIR";
this is established automatically by the starting script
("$SPARTA_DIR/sparta" in Linux):
setenv SPARTA_DIR /disk1/SPARTA

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]
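
For scripted use outside csh, the same setup can be approximated in Python. This is a hedged sketch rather than part of the SPARTA distribution: the function names are invented for illustration, the "/disk1/SPARTA" path is the example path from the script above, and only the "-in" and "-ref" options that appear in this document are used.

```python
import os
import subprocess
from pathlib import Path


def build_sparta_command(sparta_dir, pdb_file, ref_file=None):
    """Return (argv, env) for one SPARTA run without executing anything."""
    exe_name = "SPARTA.exe" if os.name == "nt" else "SPARTA"
    exe = Path(sparta_dir) / "src" / exe_name
    argv = [str(exe), "-in", str(pdb_file)]
    if ref_file is not None:
        argv += ["-ref", str(ref_file)]
    # Mirror the csh script: export SPARTA_DIR for the child process only.
    env = dict(os.environ, SPARTA_DIR=str(sparta_dir))
    return argv, env


def run_sparta(sparta_dir, pdb_file, ref_file=None):
    """Run SPARTA and raise CalledProcessError on a non-zero exit status."""
    argv, env = build_sparta_command(sparta_dir, pdb_file, ref_file)
    return subprocess.run(argv, env=env, check=True)
```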


Note that "$SPARTA_DIR" defaults to
the current directory if not specified.

Other files of the SPARTA package
include: 
 $SPARTA_DIR/tab/sparta.tab
The compiled database of residue triplets with their corresponding
PHI/PSI/CHI1 angles and secondary shifts.

 *$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab*
The tables of random coil shifts and of adjustment values from
neighboring residues used in the shift prediction process. (The same
tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)

 $SPARTA_DIR/tab/homology.tab
The residue type homology factors used in the prediction process,
similar to the table used by TALOS.

 $SPARTA_DIR/tab/weight.tab
The weighting factors of the PHI, PSI and CHI1 angles and of the
residue type homology used in the prediction process.

 $SPARTA_DIR/tab/fitting.tab
The fitting parameters relating prediction accuracy and precision,
which are used after the prediction process to calculate the
estimated prediction error.

 $SPARTA_DIR/shifts/*.tab
The chemical shift tables for the proteins in the database, in the
same format as TALOS shift tables. They must be exactly consistent
with the corresponding structures in the SPARTA pdb directory. The
files in this directory are only used when compiling a new database,
and only shift tables ending with the ".tab" extension are used.

 $SPARTA_DIR/pdb/*.pdb
The PDB coordinate files in this directory are only used, along with
the files in the SPARTA shifts directory, when compiling a new
database (e.g. adding new proteins to the database). The sequence and
residue numbering must be exactly consistent with the corresponding
assignments in the SPARTA shifts directory. Furthermore, the names of
these files must be exactly consistent with the corresponding
chemical shift tables in the SPARTA shifts directory.

 $SPARTA_DIR/test/*
The contents of this "test" directory include the input files
and results for a sample SPARTA analysis.



 How to Use SPARTA

Use of SPARTA to predict backbone
chemical shifts involves the following steps:  

 Create a directory for the  prediction session; all subsequent commands will be executed from  this directory.
  Prepare the input PDB coordinate file (for example "protein.pdb") in the format described under "Preparing the Input PDB Coordinates".

  Run SPARTA  ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta"  in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to  perform the database searches. Most commonly, this will simply  require a command such as:
sparta -in protein.pdb
  SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" is also created in the "pred" directory, which contains a summary of the prediction results. The database search typically takes about 25 sec for a 100-residue protein on a Linux PC with a 2.8 GHz CPU.

  If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), the SPARTA prediction can be performed with a command such as:
sparta -in protein.pdb -ref ref.tab
  SPARTA will compare the predicted and experimental chemical shifts before exiting, and the prediction summary file "pred/pred.tab" will store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
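
The referencing test described in the last step can be written out as follows. This Python sketch is an illustration, not SPARTA's code: it assumes the errors are predicted minus experimental shifts for a single nucleus, uses the sample standard deviation, and compares the absolute value of the average error, none of which is specified in the text above.

```python
import math
import statistics


def needs_reference_correction(errors, factor=3.0):
    """Return (flag, mean_error) for a list of per-residue prediction errors.

    flag is True when |mean error| exceeds factor times the expected
    error, taken here as stdev(errors) / sqrt(number of shifts).
    """
    n = len(errors)
    if n < 2:
        return False, 0.0  # too few shifts to estimate a spread
    mean_error = statistics.mean(errors)
    expected = statistics.stdev(errors) / math.sqrt(n)
    return abs(mean_error) > factor * expected, mean_error
```

A uniform offset shows up as a large mean error with a small spread, which is what the 3-times criterion is designed to catch.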

 

 Preparing the Input PDB Coordinates

The input PDB coordinates should be
prepared carefully, so that they have the proper format and naming
conventions. SPARTA accepts standard PDB coordinate files, but uses
ONLY the FIRST conformer/chain if more than one exists. For PDB
coordinates without hydrogen atoms, the hydrogen atoms must be added
(using programs such as DYNAMO, REDUCE, MOLMOL, or any other
similar program) in order to obtain the hydrogen bonding information
and ring current shifts. For the HA atoms of Gly, please use the atom
names "HA1/HA2".
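
A quick pre-flight check of these requirements can be scripted. The Python sketch below is not part of SPARTA; it assumes standard fixed-column PDB ATOM records (atom name in columns 13-16, residue name in columns 18-20) and only tests two of the points above: whether any hydrogen atoms are present and whether Gly alpha protons are named HA1/HA2.

```python
def check_pdb_for_sparta(pdb_text):
    """Report hydrogen presence and Gly HA naming for the ATOM records."""
    has_hydrogens = False
    gly_ha_names = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom = line[12:16].strip()   # columns 13-16: atom name
        res = line[17:20].strip()    # columns 18-20: residue name
        # Hydrogen names start with H, optionally after a digit (e.g. 1HB).
        if atom.lstrip("0123456789").startswith("H"):
            has_hydrogens = True
        if res == "GLY" and "HA" in atom:
            gly_ha_names.add(atom)
    return {
        "has_hydrogens": has_hydrogens,
        "gly_ha_ok": gly_ha_names <= {"HA1", "HA2"},
        "gly_ha_names": sorted(gly_ha_names),
    }
```

A file that reports Gly names such as HA3 or 1HA would need renaming before it is passed to SPARTA.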

Examples of the required PDB
coordinate format can be found in the "$SPARTA_DIR/pdb" and
"$SPARTA_DIR/test" directories.

 Adding New Proteins to the Database

New protein chemical shift and
structure data can be added to the database. Note well that this
should be done with great care and caution, to ensure that only
reliable phi/psi/chi1 data with consistently referenced and correct
chemical shifts are included. The following checks are suggested:

 The chemical shift assignments for each candidate protein are best validated by running a SPARTA shift prediction using its PDB coordinates:
sparta -in protein.pdb -ref ref.tab
  Check the prediction summary table (pred/pred.tab) and remove the experimental shifts for which the predicted shifts deviate by five standard deviations. Notably, HA shifts for which the ring current shift is > 1.5 ppm and the predicted shift deviates by three standard deviations are better removed.
  Chemical shifts should be referenced correctly. A quick check can be conducted by running the above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab".

 

Given this, the procedure for adding
new proteins to the SPARTA database is as simple as:

 Create a chemical shift table  for the new protein according to the TALOS format  (http://spin.niddk.nih.gov/NMRPipe/talos/).  Copy the table to the "$SPARTA_DIR/shifts" directory; it  must have a ".tab" extension in order to be used.

  Place the corresponding PDB  structure file into the "$SPARTA_DIR/pdb" directory; it  must have a ".pdb" extension, and its file name, sequence,  and residue numbering must correspond exactly with the shift table.
  Prepare a table file, for  example with a name of "list.tab", which only contains the  names of proteins to be added into the database. This table must  follow the example below:
VARS   PDB_NAME
FORMAT %24s
bpti
ubiquitin
profilin
...
  Note that each "PDB_NAME" entry in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.
  In the "SPARTA"  directory, execute the following command to compile a new database:
sparta -compile -pdbDir ./pdb -pdbList list.tab
  A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database first, because it will be overwritten.
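
Before compiling, it can help to confirm that every listed protein has both files in place. The following Python sketch is an illustration only; the function names are hypothetical, and the one-name-per-line layout of "list.tab" is an assumption based on the example table above.

```python
from pathlib import Path


def find_missing_files(sparta_dir, names):
    """Return the pdb/<name>.pdb and shifts/<name>.tab paths that do not exist."""
    root = Path(sparta_dir)
    missing = []
    for name in names:
        for path in (root / "pdb" / f"{name}.pdb",
                     root / "shifts" / f"{name}.tab"):
            if not path.is_file():
                missing.append(str(path))
    return missing


def format_list_table(names):
    """Build the text of a list table with a PDB_NAME column (assumed layout)."""
    lines = ["VARS   PDB_NAME", "FORMAT %24s"]
    lines += list(names)
    return "\n".join(lines) + "\n"
```

Writing the returned text to "list.tab" only after find_missing_files returns an empty list avoids compiling against an incomplete set of files.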
 

 Compile the Source Code

SPARTA is implemented in standard C++ using the Standard Template
Library (STL). To compile the source code (in the /src directory),
your system must have a compatible C++ compiler and STL
implementation. Given this, compiling the SPARTA executable is as
simple as:

cd $SPARTA_DIR/src
make


Compilation of SPARTA has been tested on Windows (XP) and Linux
(Linux 9 or newer). Precompiled executables
("$SPARTA_DIR/src/SPARTA" for Linux, "$SPARTA_DIR/src/SPARTA.exe"
for Windows) are included in the distributed SPARTA package.

 About the Name SPARTA



In antiquity Sparta was a Dorian
Greek military state, originally centered in Laconia. As a city-state
devoted to military training, Sparta possessed the most formidable
army in the Greek world and regarded itself as the natural protector
of Greece. 



_last updated: Apr 2007 / Webmaster_

Revision 3 - 04 Feb 2008 - Main.DavidCowburn

-<
<
+	SPARTA Protein Backbone Chemical Shifts Prediction Program
->
>
-<
<
-<
<
+SPARTA:
Shifts Predicted from Analogy in Residue type and Torsion Angle –
NYSBC notes
-<
<
+As described in the paper: 
 

Protein backbone chemical
->
>
+*SPARTA:
Shifts Predicted from Analogy in Residue type and Torsion Angle –
NYSBC notes*
->
>
+As described in the paper:

Protein backbone chemical
 shifts predicted from searching a database for torsion angle and
-<
<
+sequence homology
->
>
+sequence homology
-<
<
+Yang Shen and Ad Bax
->
>
+Yang Shen and Ad Bax
-<
<
+LIBRARY:ShenBax08.pdf
->
>
+LIBRARY:ShenBax08.pdf
->
>
-<
<
+Local install –
dl380://infotech/spartainstallPC
->
>
+Local install –
dl380://infotech/spartainstallPC
-<
<
+cygwin session ...
Script started on Mon Feb 
4 12:21:52 2008
->
>
+cygwin session ...
Script started on Mon Feb 
4 12:21:52 2008
->
>
-<
<
+>>Administrator@cowburn-pc
#/cygdrive/d/spartainstall/SPARTA
	./src/sparta -in
test/ubiquitin.pdb
->
>
+>>Administrator@cowburn-pc
#/cygdrive/d/spartainstall/SPARTA
 ./src/sparta -in
test/ubiquitin.pdb
-<
<
->
>
->
>
-<
<
+Reading PDB Coordinates
from test/ubiquitin.pdb
Reading Random Coil Shifts
from .\tab\randcoil.tab
->
>
+Reading PDB Coordinates
from test/ubiquitin.pdb
Reading Random Coil Shifts
from .\tab\randcoil.tab
-<
<
+Reading RC Adjustments from
.\tab\rcadj.tab
Reading Previous Residue RC
Adjustments from .\tab\rcprev.tab
Reading Next Residue RC
Adjustments from .\tab\rcnext.tab
Reading Weighting Factors
from .\tab\weight.tab
Reading Residue Homology
Table from .\tab\homology.tab
Reading Fitting Parameter
Table from .\tab\fitting.tab
Reading .\tab\sparta.tab,
24166 Triplets
	Can't save file
pred\test/ubiquitin_in.tab
->
>
+Reading RC Adjustments from
.\tab\rcadj.tab
Reading Previous Residue RC
Adjustments from .\tab\rcprev.tab
Reading Next Residue RC
Adjustments from .\tab\rcnext.tab
Reading Weighting Factors
from .\tab\weight.tab
Reading Residue Homology
Table from .\tab\homology.tab
Reading Fitting Parameter
Table from .\tab\fitting.tab
Reading .\tab\sparta.tab,
24166 Triplets
 Can't save file
pred\test/ubiquitin_in.tab
-<
<
+Analyzing
test/ubiquitin.pdb 76 residues read
->
>
+Analyzing
test/ubiquitin.pdb 76 residues read
-<
<
+Predicting ...
       N       HA        C 
     CA       CB        H
->
>
+Predicting ...
       N       HA        C 
     CA       CB        H
-<
<
+ 124.353    5.462  175.920 
 55.080   30.759    8.947    2    Q test/ubiquitin
 116.472    4.213  172.450 
 59.570   42.210    8.342    3    I test/ubiquitin
 119.243    5.693  175.320 
 55.210   41.480    8.871    4    F test/ubiquitin
->
>
+ 124.353    5.462  175.920 
 55.080   30.759    8.947    2    Q test/ubiquitin
 116.472    4.213  172.450 
 59.570   42.210    8.342    3    I test/ubiquitin
 119.243    5.693  175.320 
 55.210   41.480    8.871    4    F test/ubiquitin
-<
<
+ 122.133    4.870  174.870 
 60.621   34.230    9.693    5    V test/ubiquitin
 128.653    5.367  177.140 
 54.519   35.050    9.096    6    K test/ubiquitin
 116.533    4.970  176.909 
 60.470   70.630    8.925    7    T test/ubiquitin
 122.463    4.310  178.800 
 57.580   41.970    9.037    8    L test/ubiquitin
 106.723    4.428  175.520 
 61.400   69.140    7.386    9    T test/ubiquitin
 110.023    3.978  174.070 
 45.460 9999.000    7.522   10    G test/ubiquitin
->
>
+ 122.133    4.870  174.870 
 60.621   34.230    9.693    5    V test/ubiquitin
 128.653    5.367  177.140 
 54.519   35.050    9.096    6    K test/ubiquitin
 116.533    4.970  176.909 
 60.470   70.630    8.925    7    T test/ubiquitin
 122.463    4.310  178.800 
 57.580   41.970    9.037    8    L test/ubiquitin
 106.723    4.428  175.520 
 61.400   69.140    7.386    9    T test/ubiquitin
 110.023    3.978  174.070 
 45.460 9999.000    7.522   10    G test/ubiquitin
-<
<
+ 122.734    4.361  175.940 
 56.280   33.200    6.915   11    K test/ubiquitin
 121.573    5.264  174.320 
 62.390   69.910    8.627   12    T test/ubiquitin
 128.243    4.545  175.220 
 59.980   40.950    9.852   13    I test/ubiquitin
 122.653    5.067  173.789 
 61.940   69.650    8.696   14    T test/ubiquitin
 125.933    4.779  174.670 
 52.830   47.070    8.760   15    L test/ubiquitin
 123.293    5.045  175.860 
 54.820   29.450    8.177   16    E test/ubiquitin
->
>
+ 122.734    4.361  175.940 
 56.280   33.200    6.915   11    K test/ubiquitin
 121.573    5.264  174.320 
 62.390   69.910    8.627   12    T test/ubiquitin
 128.243    4.545  175.220 
 59.980   40.950    9.852   13    I test/ubiquitin
 122.653    5.067  173.789 
 61.940   69.650    8.696   14    T test/ubiquitin
 125.933    4.779  174.670 
 52.830   47.070    8.760   15    L test/ubiquitin
 123.293    5.045  175.860 
 54.820   29.450    8.177   16    E test/ubiquitin
-<
<
+ 118.342    4.713  174.160 
 58.431   36.400    9.226   17    V test/ubiquitin
 120.123    5.078  176.161 
 52.720   30.310    8.723   18    E test/ubiquitin
 139.146    4.141  175.310 
 65.470   31.950 9999.000   19    P test/ubiquitin
 104.533    4.370  174.660 
 57.400   63.370    7.137   20    S test/ubiquitin
 124.613    4.695  176.360 
 55.700   40.800    8.351   21    D test/ubiquitin
 109.934    5.147  176.750 
 59.690   71.200    7.948   22    T test/ubiquitin
->
>
+ 118.342    4.713  174.160 
 58.431   36.400    9.226   17    V test/ubiquitin
 120.123    5.078  176.161 
 52.720   30.310    8.723   18    E test/ubiquitin
 139.146    4.141  175.310 
 65.470   31.950 9999.000   19    P test/ubiquitin
 104.533    4.370  174.660 
 57.400   63.370    7.137   20    S test/ubiquitin
 124.613    4.695  176.360 
 55.700   40.800    8.351   21    D test/ubiquitin
 109.934    5.147  176.750 
 59.690   71.200    7.948   22    T test/ubiquitin
-<
<
+ 122.323    3.657  179.040 
 62.260   34.350    8.688   23    I test/ubiquitin
 121.963    3.917  178.640 
 60.220   28.280    9.795   24    E test/ubiquitin
 121.703    4.525  178.379 
 56.060   38.449    7.723   25    N test/ubiquitin
 122.843    3.397  177.950 
 67.660   30.840    7.978   26    V test/ubiquitin
 119.993    4.648  180.550 
 59.249   33.730    8.617   27    K test/ubiquitin
 124.573    4.161  180.300 
 55.370   17.710    7.904   28    A test/ubiquitin
->
>
+ 122.323    3.657  179.040 
 62.260   34.350    8.688   23    I test/ubiquitin
 121.963    3.917  178.640 
 60.220   28.280    9.795   24    E test/ubiquitin
 121.703    4.525  178.379 
 56.060   38.449    7.723   25    N test/ubiquitin
 122.843    3.397  177.950 
 67.660   30.840    7.978   26    V test/ubiquitin
 119.993    4.648  180.550 
 59.249   33.730    8.617   27    K test/ubiquitin
 124.573    4.161  180.300 
 55.370   17.710    7.904   28    A test/ubiquitin
-<
<
+ 121.073    4.207  180.320 
 59.650   33.290    7.933   29    K test/ubiquitin
 122.213    3.507  178.310 
 66.150   36.800    8.326   30    I test/ubiquitin
 124.623    3.829  178.890 
 60.000   27.720    8.622   31    Q test/ubiquitin
 120.493    4.354  177.250 
 57.190   40.580    8.231   32    D test/ubiquitin
 116.263    4.337  177.870 
 58.050   34.170    7.521   33    K test/ubiquitin
 115.003    4.625  177.840 
 55.170   32.661    8.995   34    E test/ubiquitin
->
>
+ 121.073    4.207  180.320 
 59.650   33.290    7.933   29    K test/ubiquitin
 122.213    3.507  178.310 
 66.150   36.800    8.326   30    I test/ubiquitin
 124.623    3.829  178.890 
 60.000   27.720    8.622   31    Q test/ubiquitin
 120.493    4.354  177.250 
 57.190   40.580    8.231   32    D test/ubiquitin
 116.263    4.337  177.870 
 58.050   34.170    7.521   33    K test/ubiquitin
 115.003    4.625  177.840 
 55.170   32.661    8.995   34    E test/ubiquitin
-<
<
+ 109.782    4.035  173.960 
 46.080 9999.000    8.741   35    G test/ubiquitin
 121.013    4.446  173.590 
 57.750   40.580    6.297   36    I test/ubiquitin
 142.438    4.634  176.940 
 61.660   31.850 9999.000   37    P test/ubiquitin
 139.608    4.117  178.320 
 66.260   32.890 9999.000   38    P test/ubiquitin
 114.512    4.430  177.090 
 55.640   39.540    8.617   39    D test/ubiquitin
 117.913    4.583  175.381 
 55.640   30.140    7.924   40    Q test/ubiquitin
->
>
+ 109.782    4.035  173.960 
 46.080 9999.000    8.741   35    G test/ubiquitin
 121.013    4.446  173.590 
 57.750   40.580    6.297   36    I test/ubiquitin
 142.438    4.634  176.940 
 61.660   31.850 9999.000   37    P test/ubiquitin
 139.608    4.117  178.320 
 66.260   32.890 9999.000   38    P test/ubiquitin
 114.512    4.430  177.090 
 55.640   39.540    8.617   39    D test/ubiquitin
 117.913    4.583  175.381 
 55.640   30.140    7.924   40    Q test/ubiquitin
-<
<
+ 118.853    4.244  176.300 
 56.470   31.650    7.307   41    Q test/ubiquitin
 123.813    4.499  174.050 
 55.050   31.750    8.520   42    R test/ubiquitin
 125.173    5.353  175.290 
 52.980   45.790    8.867   43    L test/ubiquitin
 123.053    5.216  176.060 
 58.980   41.420    9.487   44    I test/ubiquitin
 126.523    5.045  174.470 
 57.020   43.760    8.869   45    F test/ubiquitin
 133.333    3.690  177.289 
 52.540   16.570    8.897   46    A test/ubiquitin
->
>
+ 118.853    4.244  176.300 
 56.470   31.650    7.307   41    Q test/ubiquitin
 123.813    4.499  174.050 
 55.050   31.750    8.520   42    R test/ubiquitin
 125.173    5.353  175.290 
 52.980   45.790    8.867   43    L test/ubiquitin
 123.053    5.216  176.060 
 58.980   41.420    9.487   44    I test/ubiquitin
 126.523    5.045  174.470 
 57.020   43.760    8.869   45    F test/ubiquitin
 133.333    3.690  177.289 
 52.540   16.570    8.897   46    A test/ubiquitin
-<
<
+103.473    3.791  173.810   45.350 9999.000    8.087   47    G test/ubiquitin
 122.702    4.623  174.700   54.550   34.530    8.284   48    K test/ubiquitin
 123.543    4.666  175.670   55.740   29.000    8.667   49    Q test/ubiquitin
 126.653    4.090  176.659   54.240   41.570    8.872   50    L test/ubiquitin
 124.073    4.488  175.870   55.960   31.570    8.442   51    E test/ubiquitin
 121.163    4.360  177.330   56.959   40.850    8.187   52    D test/ubiquitin
->
>
+103.473    3.791  173.810   45.350 9999.000    8.087   47    G test/ubiquitin
 122.702    4.623  174.700   54.550   34.530    8.284   48    K test/ubiquitin
 123.543    4.666  175.670   55.740   29.000    8.667   49    Q test/ubiquitin
 126.653    4.090  176.659   54.240   41.570    8.872   50    L test/ubiquitin
 124.073    4.488  175.870   55.960   31.570    8.442   51    E test/ubiquitin
 121.163    4.360  177.330   56.959   40.850    8.187   52    D test/ubiquitin
-<
<
+107.793    4.045  174.870   45.170 9999.000    9.567   53    G test/ubiquitin
 120.183    4.695  175.350   54.390   32.650    7.288   54    R test/ubiquitin
 109.533    5.508  176.560   59.690   72.260    8.799   55    T test/ubiquitin
 119.053    4.060  180.810   58.710   40.370    8.176   56    L test/ubiquitin
 114.463    4.370  178.310   61.080   62.530    8.585   57    S test/ubiquitin
 125.323    4.296  177.400   57.180   40.100    7.654   58    D test/ubiquitin
->
>
+107.793    4.045  174.870   45.170 9999.000    9.567   53    G test/ubiquitin
 120.183    4.695  175.350   54.390   32.650    7.288   54    R test/ubiquitin
 109.533    5.508  176.560   59.690   72.260    8.799   55    T test/ubiquitin
 119.053    4.060  180.810   58.710   40.370    8.176   56    L test/ubiquitin
 114.463    4.370  178.310   61.080   62.530    8.585   57    S test/ubiquitin
 125.323    4.296  177.400   57.180   40.100    7.654   58    D test/ubiquitin
-<
<
+.642    4.670  174.700   58.250   40.070    7.124   59    Y test/ubiquitin
 117.033    4.355  174.341   54.120   37.410    8.329   60    N test/ubiquitin
 119.733    3.393  174.610   62.420   36.740    6.970   61    I test/ubiquitin
 125.874    4.506  175.970   53.660   31.650    7.647   62    Q test/ubiquitin
 121.433    4.001  175.810   57.791   32.649    8.487   63    K test/ubiquitin
 115.083    3.465  175.250   57.890   25.900    9.591   64    E test/ubiquitin
->
>
+.642    4.670  174.700   58.250   40.070    7.124   59    Y test/ubiquitin
 117.033    4.355  174.341   54.120   37.410    8.329   60    N test/ubiquitin
 119.733    3.393  174.610   62.420   36.740    6.970   61    I test/ubiquitin
 125.874    4.506  175.970   53.660   31.650    7.647   62    Q test/ubiquitin
 121.433    4.001  175.810   57.791   32.649    8.487   63    K test/ubiquitin
 115.083    3.465  175.250   57.890   25.900    9.591   64    E test/ubiquitin
-<
<
+.863    4.640  172.160   60.890   64.910    7.383   65    S test/ubiquitin
 118.242    5.614  173.950   62.340   70.080    8.737   66    T test/ubiquitin
 128.243    5.060  175.770   53.900   44.260    9.801   67    L test/ubiquitin
 119.513    5.292  173.150   55.000   30.531    9.633   68    H test/ubiquitin
 125.592    5.282  175.270   53.890   44.380    8.533   69    L test/ubiquitin
 128.073    4.351  173.999   60.800   34.910    9.490   70    V test/ubiquitin
->
>
+.863    4.640  172.160   60.890   64.910    7.383   65    S test/ubiquitin
 118.242    5.614  173.950   62.340   70.080    8.737   66    T test/ubiquitin
 128.243    5.060  175.770   53.900   44.260    9.801   67    L test/ubiquitin
 119.513    5.292  173.150   55.000   30.531    9.633   68    H test/ubiquitin
 125.592    5.282  175.270   53.890   44.380    8.533   69    L test/ubiquitin
 128.073    4.351  173.999   60.800   34.910    9.490   70    V test/ubiquitin
-<
<
+.262    5.361  177.830   53.940   42.851    8.067   71    L test/ubiquitin
 124.244    4.921  174.953   54.777   32.225    9.169   72    R test/ubiquitin
 128.176    4.628  176.270   54.090   42.511    8.881   73    L test/ubiquitin
 124.343    4.706  175.048   54.919   31.176    8.588   74    R test/ubiquitin
 112.599    4.156  173.001   44.721 9999.000    8.348   75    G test/ubiquitin
->
>
+.262    5.361  177.830   53.940   42.851    8.067   71    L test/ubiquitin
 124.244    4.921  174.953   54.777   32.225    9.169   72    R test/ubiquitin
 128.176    4.628  176.270   54.090   42.511    8.881   73    L test/ubiquitin
 124.343    4.706  175.048   54.919   31.176    8.588   74    R test/ubiquitin
 112.599    4.156  173.001   44.721 9999.000    8.348   75    G test/ubiquitin
->
>
-<
<
+	Running time: 20.343
seconds

>>/cygdrive/d/spartainstall/SPARTA#
->
>
+ Running time: 20.343 seconds
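
Each prediction row in the session output above carries six shifts in the column order N, HA, C, CA, CB, H, followed by the residue number, the one-letter residue type and the structure name. A minimal Python sketch for reading such rows is given here. It assumes that 9999.000 marks a shift that is not available (in this output it appears for CB of Gly and H of Pro); the names `parse_prediction_row`, `MISSING` and `COLUMNS` are illustrative and not part of SPARTA.

```python
MISSING = 9999.0  # assumed placeholder for a shift that could not be predicted
COLUMNS = ("N", "HA", "C", "CA", "CB", "H")  # column order printed in the log


def parse_prediction_row(line):
    """Parse one whitespace-separated prediction row from the session log."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected 6 shifts, residue number, type and name")
    shifts = {}
    for name, text in zip(COLUMNS, fields[:6]):
        value = float(text)
        # Map the placeholder to None so it cannot be mistaken for a real shift.
        shifts[name] = None if value == MISSING else value
    return {
        "resid": int(fields[6]),
        "residue": fields[7],
        "name": fields[8],
        "shifts": shifts,
    }


row = parse_prediction_row(
    "110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin"
)
print(row["resid"], row["residue"], row["shifts"]["CB"])
```

Rows whose first value was truncated by the diff view (for example `+.642`) would need to be repaired by hand before parsing.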

>>/cygdrive/d/spartainstall/SPARTA#
->
>
-<
<
+Using a standard MS shell
with the infotech drive mounted ..
->
>
+Using a standard MS shell
with the infotech drive mounted ..
->
>
-<
<
+src\sparta -in test\ubiquitin.pdb ...
->
>
+src\sparta -in test\ubiquitin.pdb ...
-<
<
+Original
text --
->
>
+Original
text --
-<
<
+Contact:      
shenyang@niddk.nih.gov;
bax@nih.gov 
Web:      
http://spin.niddk.nih.gov/bax



 DOWNLOAD
-<
<
+RedHat
Linux /Fedora Core version 
Win32
version
The downloaded Unix archive can be
unpacked with a command like the following: 

   zcat sparta.linux.tar.Z | tar xvf -
->
>
+Contact:     
shenyang@niddk.nih.gov;
bax@nih.gov 
Web:     
http://spin.niddk.nih.gov/bax

 DOWNLOAD
-<
<
+The win32 archive can be unpacked with
standard Windows zip software. 

Users are encouraged to email the
author to be informed about updates and related software. 


What
is SPARTA? 
Reliability
of SPARTA 
Components
of the SPARTA Package 
How
to Use SPARTA 
Preparing
the PDB Coordinates
-<
<
+Adding
New Proteins to the Database 
Compile
the Source Code 
About
the Name SPARTA 


What
is SPARTA?
SPARTA is a database system for
->
>
+[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.linux.tar.Z][RedHat
Linux /Fedora Core version ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/sparta.win32.zip][Win32
version]]

The downloaded Unix archive can be
unpacked with a command like the following: 
   zcat sparta.linux.tar.Z | tar xvf -
->
>
+The win32 archive can be unpacked with
standard Windows zip software. 

Users are encouraged to email the
author to be informed about updates and related software. 


[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#what%20is%20sparta][ *What
is SPARTA?* ]] 
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#reliability][ *Reliability
of SPARTA* ]] 
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#components][ *Components
of the SPARTA Package* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#how%20to%20use][ *How
to Use SPARTA* ]] 
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#PDB%20coordinates][ *Preparing
the PDB Coordinates* ]]

[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#add%20new%20proteins][ *Adding
New Proteins to the Database* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#compile][ *Compile
the Source Code* ]]
[[http://spin.niddk.nih.gov/bax/software/SPARTA/index.html#about%20name][ *About
the Name SPARTA* ]] 

 What 
is SPARTA?

SPARTA is a database system for
 empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB,
CO) using a combination of backbone phi, psi torsion angles and
sidechain chi1 angles from a given protein with known PDB
coordinates. The SPARTA approach is an extension of the well-known
observation that many kinds of secondary chemical shifts (i.e.
differences between chemical shifts and their corresponding random
coil values) are highly correlated with aspects of protein secondary
structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles
and sequence information from a protein structure in order to make
-<
<
+quantitative predictions for the backbone chemical shifts. 

SPARTA uses the phi, psi and chi1
->
>
+quantitative predictions for the backbone chemical shifts. 

SPARTA uses the phi, psi and chi1
 angles of a given residue to predict secondary shifts for that
residue. SPARTA also includes the information from the next and
previous residues when making predictions for a given residue. So, in
practice, SPARTA uses data for three consecutive residues
simultaneously (i.e. 9 torsion angles and 3 residue types) to make
-<
<
+predictions for the central residue in a triplet. 

The idea behind SPARTA is that if
->
>
+predictions for the central residue in a triplet. 

The idea behind SPARTA is that if
 one can find some triplet of residues in a protein of known structure
with similar structure and sequence to a triplet in a target protein,
then the backbone secondary chemical shifts for this protein will be
useful predictors for the backbone secondary chemical shifts in the
-<
<
+target.
->
>
+target.
-<
<
->
>
+The similarity is measured with a
-<
<
+The similarity is measured with a
 score based on the weighted sum of squared differences between the
torsion angles in the target protein and the database entries, so
that lower scores indicate higher similarity. In order to take
advantage of the correlations between residue type and secondary
structure, the score also includes a small, qualitative residue type
-<
<
+term which biases the matching towards roughly similar sequences. 

In practice, SPARTA searches a
->
>
+term which biases the matching towards roughly similar sequences. 
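
The score described above can be sketched in Python as follows. This is only an illustration of a weighted sum of squared torsion-angle differences plus a residue-type term. The actual weights and homology factors are read from weight.tab and homology.tab; the function names, the argument layout, the use of one weight set for all three positions, and the periodic wrapping of angle differences are assumptions rather than SPARTA's implementation. Residues without a chi1 angle are not treated specially here.

```python
def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def triplet_score(target, candidate, angle_weights, homology, residue_weight):
    """Similarity score for two residue triplets; lower means more similar.

    target, candidate: three (residue_type, phi, psi, chi1) tuples each.
    angle_weights:     hypothetical weights for (phi, psi, chi1).
    homology:          dict mapping (type_a, type_b) to a dissimilarity value.
    residue_weight:    hypothetical weight of the residue-type term.
    """
    score = 0.0
    for (t_res, *t_angles), (c_res, *c_angles) in zip(target, candidate):
        for weight, t_angle, c_angle in zip(angle_weights, t_angles, c_angles):
            score += weight * angle_diff(t_angle, c_angle) ** 2
        # Unknown residue pairs get a default penalty of 1.0 (an assumption).
        score += residue_weight * homology.get((t_res, c_res), 1.0)
    return score


target = [("A", -60.0, -45.0, 180.0), ("G", -60.0, -45.0, 60.0), ("L", -120.0, 130.0, -60.0)]
candidate = [("A", -50.0, -45.0, 180.0)] + target[1:]
identical = {("A", "A"): 0.0, ("G", "G"): 0.0, ("L", "L"): 0.0}
print(triplet_score(target, candidate, (1.0, 1.0, 1.0), identical, 1.0))
```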

In practice, SPARTA searches a
 database for the 20 best matches to a given triplet in the target
protein. The weighted averages of the secondary chemical shifts (obtained by
subtracting the corresponding random coil chemical shift values
and the adjustment values arising from the effects of neighboring
residues) of the central residues of these 20 matches are used as a
prediction for the secondary shift of the central residue. The SPARTA
database was constructed using the most well-defined parts of high
resolution (2.4 Angstroms or better) X-ray crystal structures to
define the phi, psi and chi1 angles, as well as other structural
information, such as hydrogen bonding and ring current shifts, which
are used to quantitatively correct the raw predicted shifts from
database searching. This database currently includes data from 200
-<
<
+proteins, representing 24,166 triplets. 


Reliability
of SPARTA
The reliability of the SPARTA
->
>
+proteins, representing 24,166 triplets. 
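
As a rough illustration of the averaging step, the sketch below combines the secondary shifts of the best-scoring matches into one predicted shift and reports their spread. The text states only that a weighted average of the 20 best matches is used; the weighting function `1 / (1 + score)`, the function name `predict_shift`, and the omission of neighbor, hydrogen-bond and ring-current corrections are assumptions made for brevity.

```python
import math


def predict_shift(matches, random_coil, n_best=20):
    """Return (predicted_shift, spread) from (score, secondary_shift) pairs.

    Lower scores are better matches. The weights are an assumed form, not
    the weighting actually used by SPARTA.
    """
    best = sorted(matches)[:n_best]
    weights = [1.0 / (1.0 + score) for score, _ in best]
    total = sum(weights)
    mean = sum(w * shift for w, (_, shift) in zip(weights, best)) / total
    variance = sum(w * (shift - mean) ** 2 for w, (_, shift) in zip(weights, best)) / total
    # Predicted shift = random coil value + averaged secondary shift.
    return random_coil + mean, math.sqrt(variance)


shift, spread = predict_shift([(0.0, 1.0), (0.0, 3.0)], random_coil=50.0)
print(shift, spread)
```

The second return value plays the role of the standard deviation over the matches, which the reliability discussion relates to the prediction error.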

 Reliability 
of SPARTA

The reliability of the SPARTA
 approach was tested by a cross-validation procedure where each
protein was temporarily removed from the database, and its backbone
-<
<
+chemical shifts (N, HN, HA, CA, CB and C’) were predicted using
->
>
+chemical shifts (N, HN, HA, CA, CB and C’) were predicted using
 the remaining protein data. The RMS deviations between the predicted
and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01
ppm, respectively. The same shift prediction accuracies are also
obtained for the proteins with known structures which are not
-<
<
+contained in the database. 

Importantly, it is also found in the
->
>
+contained in the database. 

Importantly, it is also found in the
 test that the standard deviations of the shifts from the central residues
of the 20 matches are correlated with the shift prediction errors.
Checking the standard deviations in the prediction summary file
-<
<
+(pred/pred.tab) will provide an idea of the prediction reliability.
->
>
+(pred/pred.tab) will provide an idea of the prediction reliability.
-<
<
-<
<
+It should be noted that the global
->
>
+It should be noted that the global
 structural information, such as ring current shifts and hydrogen
bonding, was also carefully considered in SPARTA. The secondary
shifts in the SPARTA database are actually shifts corrected using the
-<
<
+ring current shifts. As “compensation”, the SPARTA
->
>
+ring current shifts. As “compensation”, the SPARTA
 predicted shifts for the target protein are also corrected by adding the
calculated ring current shifts from the target protein. For HA and HN,
the predicted secondary shifts are also corrected using the
hydrogen bond lengths and their relationship with the prediction
errors, which was derived from the above cross-validation. Therefore,
the accuracy of the coordinates of the target protein is critical for
obtaining reliable hydrogen bond information and ring current
shifts, and hence the final predicted shifts. The calculated hydrogen bond
and ring current shift information is stored in the input summary
-<
<
+file (/pred/protein_in.tab). 

It should also be noted that the
->
>
+file (/pred/protein_in.tab). 

It should also be noted that the
 protein backbone chemical shifts are extremely sensitive to the local
conformation; therefore, SPARTA results for residues in
flexible regions or with very large ring current shift
contributions may be less reliable, which was also indicated by the
-<
<
+test. 


Components
of the SPARTA Package
The SPARTA system is implemented
->
>
+test. 

 Components 
of the SPARTA Package

The SPARTA system is implemented
 using C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for
Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or starting script
-<
<
+("$SPARTA_DIR/sparta" for Linux) can be invoked with
"TALOS-like" command-line arguments. A complete list of
options can be generated with a "-help"
->
>
+("$SPARTA_DIR/sparta" for Linux) can be invoked with
"TALOS-like" command-line arguments. A complete list of
options can be generated with a "-help"
 command-line argument, or by simply running the executable files or
-<
<
+starting script without any command-line arguments. 

Running SPARTA requires
definition of the environment variable "SPARTA_DIR";
->
>
+starting script without any command-line arguments. 

Running SPARTA requires
definition of the environment variable "SPARTA_DIR";
 this will be established automatically by the starting script
-<
<
+("$SPARTA_DIR/sparta" in Linux): 

setenv SPARTA_DIR /disk1/SPARTA
->
>
+("$SPARTA_DIR/sparta" in Linux): 
setenv SPARTA_DIR /disk1/SPARTA
->
>
+$SPARTA_DIR/src/SPARTA $argv[1-$#argv]
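
For scripted use, the same invocation can be assembled from Python. The sketch below only builds the command line and environment and does not run anything. `sparta_command` is a hypothetical helper; it assumes the Linux executable at `$SPARTA_DIR/src/SPARTA`, falls back to the current directory when `SPARTA_DIR` is not set, and uses only the `-in` and `-ref` options that appear on this page.

```python
import os


def sparta_command(pdb_file, ref_table=None, sparta_dir=None):
    """Build (argv, env) for a SPARTA run without executing it."""
    # Use the explicit directory, then $SPARTA_DIR, then the current directory.
    sparta_dir = sparta_dir or os.environ.get("SPARTA_DIR", os.getcwd())
    executable = os.path.join(sparta_dir, "src", "SPARTA")
    argv = [executable, "-in", pdb_file]
    if ref_table is not None:
        argv += ["-ref", ref_table]
    # Copy the current environment and make SPARTA_DIR explicit for the child.
    env = dict(os.environ, SPARTA_DIR=sparta_dir)
    return argv, env


argv, env = sparta_command("protein.pdb", sparta_dir="/disk1/SPARTA")
print(argv)
```

The result can be passed to `subprocess.run(argv, env=env, check=True)` from the directory created for the prediction session.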
-<
<
+$SPARTA_DIR/src/SPARTA $argv[1-$#argv]
Note that the default
"$SPARTA_DIR" is the current directory if not specified. 

Other files of the SPARTA package
include:
->
>
+Note that the default
"$SPARTA_DIR" is the current directory if not specified. 

Other files of the SPARTA package
include: 
 $SPARTA_DIR/tab/sparta.tab
The
-<
<
+$SPARTA_DIR/tab/sparta.tab
The
 compiled database of residue triplets with their corresponding
-<
<
+PHI/PSI/CHI1 angles and secondary shifts.
->
>
+PHI/PSI/CHI1 angles and secondary shifts.
-<
<
+$SPARTA_DIR/tab/randcoil.tab,
rcadj.tab, rcprev.tab, rcnext.tab
The
->
>
+ *$SPARTA_DIR/tab/randcoil.tab,
rcadj.tab, rcprev.tab, rcnext.tab*
->
>
+The
 table of random coil shifts, adjustments values from neighboring
residues used in the shifts prediction process. (The same tables as
-<
<
+used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
->
>
+used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
-<
<
-<
<
+$SPARTA_DIR/tab/homology.tab
The
->
>
+ $SPARTA_DIR/tab/homology.tab
->
>
+The
 residue type homology factors used in the prediction process, which
-<
<
+is similar to the table used by TALOS.
->
>
+is similar to the table used by TALOS.
-<
<
+$SPARTA_DIR/tab/weight.tab
The
->
>
+ $SPARTA_DIR/tab/weight.tab
->
>
+The
 weighting factors of PHI, PSI and CHI1 angles, and residue type
-<
<
+homology used in the prediction process.
->
>
+homology used in the prediction process.
-<
<
+$SPARTA_DIR/tab/fitting.tab
The
->
>
+ $SPARTA_DIR/tab/fitting.tab
->
>
+The
 fitting parameters between prediction accuracy and precision, which
will be used after the prediction process to calculate the estimated
-<
<
+prediction error.
->
>
+prediction error.
-<
<
+$SPARTA_DIR/shifts/*.tab
The
->
>
+ $SPARTA_DIR/shifts/*.tab
->
>
+The
 files in this directory are only used when compiling a new database.
When compiling a new database, only shift tables ending with the
-<
<
+".tab" extension will be used. The files in this directory
->
>
+".tab" extension will be used. The files in this directory
 are the chemical shift tables for the proteins in the database, which
are in the same format as the TALOS shifts tables and must be exactly
consistent with the corresponding structures in the SPARTA pdb
-<
<
+directory.
->
>
+directory.
-<
<
+$SPARTA_DIR/pdb/*.pdb
The
->
>
+ $SPARTA_DIR/pdb/*.pdb
->
>
+The
 PDB coordinates files in this directory are only used along with the
files in the SPARTA shifts directory when compiling a new database
(e.g. adding new proteins into the database). The sequence and
residue numbering must be exactly consistent with the corresponding
assignments in the SPARTA shifts directory. Furthermore, the names of
these files must be exactly consistent with the corresponding
-<
<
+chemical shift tables in the SPARTA shifts directory.
->
>
+chemical shift tables in the SPARTA shifts directory.
-<
<
+$SPARTA_DIR/test/*
The
contents of this "test" directory include the input files
and results for a sample SPARTA analysis.
->
>
+ $SPARTA_DIR/test/*
The
contents of this "test" directory include the input files
->
>
+and results for a sample SPARTA analysis.
-<
<
+How to Use
SPARTA
->
>
+ How to Use 
SPARTA
-<
<
+Use of SPARTA to predict backbone
chemical shifts involves the following steps: 


	Create a directory for the
	prediction session; all subsequent commands will be executed from
	this directory. 
	
	
Prepare the input PDB
	coordinate file (for example "protein.pdb"), according to
	the format described in "Preparing the Input PDB Coordinates".
-<
<
+	Run SPARTA
	("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta"
	in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to
	perform the database searches. Most commonly, this will simply
	require a command such as:
->
>
+Use of SPARTA to predict backbone
chemical shifts involves the following steps:  
 Create a directory for the  prediction session; all subsequent commands will be executed from  this directory.
  Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format described in "Preparing the Input PDB Coordinates".

  Run SPARTA  ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta"  in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to  perform the database searches. Most commonly, this will simply  require a command such as:
sparta -in protein.pdb
  SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created in the "pred" directory, which includes a summary of the prediction results. The database search will typically take about 25 sec for a 100-residue protein on a Linux PC with a 2.8GHz CPU.

  If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), a SPARTA prediction can be performed with a command such as:
sparta -in protein.pdb -ref ref.tab
  SPARTA will compare the predicted and experimental chemical shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of prediction errors / square root of number of shifts), a warning is printed and a reference correction will be applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
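
The referencing check described above can be sketched as follows. The threshold test (average error compared with 3 times the standard deviation divided by the square root of the number of shifts) comes from the text. The sign convention (error = experimental minus predicted), the use of the absolute average, the sample standard deviation, and the correction by subtracting the average error are assumptions, and the function names are illustrative.

```python
import math
import statistics


def reference_offset(predicted, observed, threshold=3.0):
    """Return the offset to remove from observed shifts, or 0.0 if none is needed."""
    errors = [obs - pred for pred, obs in zip(predicted, observed)]
    mean_error = statistics.mean(errors)
    # Expected error of the mean: standard deviation / sqrt(number of shifts).
    expected = statistics.stdev(errors) / math.sqrt(len(errors))
    if abs(mean_error) > threshold * expected:
        return mean_error
    return 0.0


def apply_reference_correction(predicted, observed):
    """Shift the observed values by the detected offset (assumed correction)."""
    offset = reference_offset(predicted, observed)
    return [obs - offset for obs in observed]


predicted = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 3.1, 3.9, 5.0]  # roughly 1 ppm too high throughout
print(reference_offset(predicted, observed))
```

A systematic offset of this kind is applied per nucleus type in practice, since each nucleus is referenced separately; the sketch treats one list of shifts at a time.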
->
>
+ Preparing 
the Input PDB Coordinates
-<
<
+	sparta -in protein.pdb
->
>
+The input PDB coordinates should be
-<
<
+	SPARTA will first generate an input
	"pred/protein_in.tab" file from PDB coordinates, which
	contains the phi, psi and chi1 angles, H-bonding information and
	ring current shifts. During the database search, a series of files
	"pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be
	created. Each one of these files tallies the 20 best database
	matches for a given residue in the target protein. Before exiting, a
	file "pred.tab" will also be created in "pred"
	directory, which includes a summary of the prediction results. The
	database search will typically take about 25 sec for a 100-residue
	protein on a Linux PC with a 2.8GHz CPU. 

	
	If experimental chemical shifts
	for target protein are available (with a name "ref.tab",
	for example, and the same format as typical TALOS shift table file,
	http://spin.niddk.nih.gov/NMRPipe/talos/),
	a SPARTA prediction can be performed with a command such as: 
	
	sparta -in protein.pdb -ref ref.tab

	SPARTA will compare the predicted
	and experimental chemical shifts before exiting, and a
	prediction summary file "pred/pred.tab" will be generated
	to store the comparison between the reference and predicted shifts,
	as well as the errors. If the average prediction error is larger than 3
	times the expected error (standard deviation of prediction
	errors / square root of number of shifts), a warning is printed and
	a reference correction will be applied to the experimental chemical
	shifts. The corrected reference chemical shifts are stored in a
	new file "pred/ref.tab".

	


Preparing
the Input PDB Coordinates
The input PDB coordinates should be
 prepared carefully, so that they have the proper format and naming
conventions. SPARTA accepts a standard PDB coordinates file, but uses
-<
<
+ONLY the FIRST conformer/chain if more than one exist. For PDB
->
>
+ONLY the FIRST conformer/chain if more than one exist. For PDB
 coordinates without hydrogen atoms, the hydrogen atoms must
be added (using programs such as DYNAMO, REDUCE, MOLMOL, or any
similar program) in order to obtain the hydrogen bonding information
and ring current shifts. For HA atoms of Gly, please use atom names
-<
<
+of "HA1/HA2"
->
>
+of "HA1/HA2"
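
A quick pre-flight check of these requirements can be sketched in Python. The fixed column positions follow the standard PDB ATOM record layout; the function name `check_pdb_lines`, the warning texts, and the simple rule used to recognise hydrogen atom names are assumptions, and SPARTA itself performs no such check as far as this page states.

```python
def check_pdb_lines(lines):
    """Return a list of warnings for PDB ATOM records given as text lines."""
    chains = []
    has_hydrogen = False
    gly_ha_names = set()
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        atom = line[12:16].strip()   # atom name, PDB columns 13-16
        residue = line[17:20]        # residue name, columns 18-20
        chain = line[21:22]          # chain identifier, column 22
        if chain not in chains:
            chains.append(chain)
        # Hydrogen names start with H, optionally after leading digits (e.g. 1HB).
        core = atom.lstrip("0123456789")
        if core.startswith("H"):
            has_hydrogen = True
        if residue == "GLY" and core.startswith("HA"):
            gly_ha_names.add(atom)
    warnings = []
    if not has_hydrogen:
        warnings.append("no hydrogen atoms found; add them before running SPARTA")
    if gly_ha_names - {"HA1", "HA2"}:
        warnings.append("Gly HA atoms should be named HA1/HA2")
    if len(chains) > 1:
        warnings.append("multiple chains found; only the first one is used")
    return warnings


with_ha3 = [
    "ATOM      1 N    GLY A   1",
    "ATOM      2 HA2  GLY A   1",
    "ATOM      3 HA3  GLY A   1",
]
print(check_pdb_lines(with_ha3))
```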
-<
<
+Examples of the required PDB
coordinate format can be found in the "$SPARTA_DIR/pdb" and
-<
<
+"$SPARTA_DIR/test" directories. 


Adding
New Proteins to the Database
New protein chemical shift and
->
>
+Examples of the required PDB
coordinate format can be found in the "$SPARTA_DIR/pdb" and

"$SPARTA_DIR/test" directories. 

 Adding
->
>
+New Proteins to the Database

New protein chemical shift and
 structure data can be added to the database. Note well that this
should be done with great care and caution, to ensure that only
reliable phi/psi/chi1 data with consistently referenced and correct
-<
<
+chemical shifts are included. It is suggested that: 


	The chemical shifts assignments
->
>
+chemical shifts are included. It is suggested that:  
 The chemical shift assignments for each candidate protein are best validated by conducting a SPARTA shift prediction using its PDB coordinates.
sparta -in protein.pdb -ref ref.tab
  Check the prediction summary table (pred/pred.tab) and remove the experimental shifts for which the predicted shifts deviate by five standard deviations. Notably, HA shifts for which ring current shifts are > 1.5ppm and the predicted shifts deviate by three standard deviations are better removed.
  Chemical shifts should be referenced correctly. A quick check can be conducted by running the above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of prediction errors / square root of number of shifts), and store the corrected shifts in a file "pred/ref.tab".
-<
<
+	for each candidate protein are better validated by conducting a
	SPARTA shift prediction using its PDB coordinates.
-<
<
+	sparta -in protein.pdb -ref ref.tab
	Check the prediction summary
	table (pred/pred.tab) and remove the experimental shifts for
	which the predicted shifts deviate by five standard deviations.
	Notably, HA shifts for which ring current shifts are > 1.5ppm and the
	predicted shifts deviate by three standard deviations are better
	removed. 
	
	
Chemical shifts should be
	referenced correctly. A quick check can be conducted by running the above
->
>
+Given this, the procedure for adding
new proteins to the SPARTA database is as simple as:  
 Create a chemical shift table  for the new protein according to the TALOS format  (http://spin.niddk.nih.gov/NMRPipe/talos/).  Copy the table to the "$SPARTA_DIR/shifts" directory; it  must have a ".tab" extension in order to be used.

  Place the corresponding PDB  structure file into the "$SPARTA_DIR/pdb" directory; it  must have a ".pdb" extension, and its file name, sequence,  and residue numbering must correspond exactly with the shift table.
  Prepare a table file, for  example with a name of "list.tab", which only contains the  names of proteins to be added into the database. This table must  follow the example below:
VARS   PDB_NAME
FORMAT %24s
bpti
ubiquitin
profilin
...
  Note that the "PDB_NAME" entries in the table file must be consistent with the file names (with ".tab"
  and ".pdb" extensions) in the SPARTA pdb and shifts directories.
  In the "SPARTA"  directory, execute the following command to compile a new database:
sparta -compile -pdbDir ./pdb -pdbList list.tab
  A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database, which will be overwritten.
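
The name-matching requirement in the steps above lends itself to a small helper that writes the list table only for proteins that have both a ".pdb" and a ".tab" file. This is a sketch under assumptions: `write_protein_list` is a hypothetical function, it lists every protein with both files (which may be more than the proteins you intend to add), and it writes names one per line after the two header lines shown in the example table.

```python
from pathlib import Path


def write_protein_list(sparta_dir, list_path):
    """Write a list table naming proteins present in both pdb/ and shifts/."""
    root = Path(sparta_dir)
    pdb_names = {path.stem for path in (root / "pdb").glob("*.pdb")}
    shift_names = {path.stem for path in (root / "shifts").glob("*.tab")}
    # Only names with both a structure and a shift table can be compiled.
    names = sorted(pdb_names & shift_names)
    lines = ["VARS   PDB_NAME", "FORMAT %24s", *names]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return names
```

Calling `write_protein_list("/disk1/SPARTA", "list.tab")` before the compile command would report which names made it into the table; names found in only one of the two directories are silently left out and are worth checking by hand.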
 

 Compile the 
Source Code
-<
<
+	SPARTA prediction for this protein and inspecting the average SPARTA
	prediction errors, which are listed in the header of prediction
	summary table (pred/pred.tab). By default, SPARTA will apply a shift
	referencing correction if the average prediction error is larger
	than 3 times expected error (i.e., standard deviation of prediction
	errors / square root of number of shifts), and store the corrected
	shifts in a file "pred/ref.tab"
-<
<
->
>
+SPARTA was implemented with standard
-<
<
+Given this, the procedure for adding
new proteins to the SPARTA database is as simple as: 


	Create a chemical shift table
	for the new protein according to the TALOS format
	(http://spin.niddk.nih.gov/NMRPipe/talos/).
	Copy the table to the "$SPARTA_DIR/shifts" directory; it
	must have a ".tab" extension in order to be used. 
	

	
Place the corresponding PDB
	structure file into the "$SPARTA_DIR/pdb" directory; it
	must have a ".pdb" extension, and its file name, sequence,
	and residue numbering must correspond exactly with the shift table. 
	
	
Prepare a table file, for
	example with a name of "list.tab", which only contains the
	names of proteins to be added into the database. This table must
	follow the example below: 
	

	
VARS   PDB_NAME
FORMAT %24s
bpti
ubiquitin
profilin
...

	Note that the "PDB_NAME" in
	the table file must be consistent with the file names (with ".tab"

	and ".pdb" extension) in the SPARTA pdb and shifts
	directories. 
	
	
In the "SPARTA"
	directory, execute the following command to compile a new database: 
	

	
sparta -compile -pdbDir ./pdb -pdbList list.tab
	
A new database
	"$SPARTA_DIR/tab/sparta.tab" will be generated from the
	files in the SPARTA pdb and shifts directories. Please back up the old
	database, which will be overwritten. 
	


Compile the
Source Code
SPARTA was implemented with standard
 C++ using the Standard Template Library (STL). To compile the source
code (in the /src directory), your system must have a compatible C++
compiler and STL implementation. Given this, compiling the SPARTA
-<
<
+executable file is as simple as:
->
>
+executable file is as simple as:
-<
<
+cd $SPARTA_DIR/src
make

The compiling of the SPARTA program has
->
>
+cd $SPARTA_DIR/src
make
->
>
+The compiling of the SPARTA program has
 been tested on Windows (XP) and Linux (Linux 9 or newer). The
-<
<
+compiled executable files ("$SPARTA_DIR/src/SPARTA" for
Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are
contained in the distributed SPARTA package. 


About the
->
>
+compiled executable files ("$SPARTA_DIR/src/SPARTA" for
Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are
contained in the distributed SPARTA package. 

 About the 
Name SPARTA
-<
<
+Name SPARTA
-<
<
+In antiquity Sparta was a Dorian
->
>
+In antiquity Sparta was a Dorian
 Greek military state, originally centered in Laconia. As a city-state
devoted to military training, Sparta possessed the most formidable
army in the Greek world and regarded itself as the natural protector
-<
<
+of Greece.
->
>
+of Greece.
-<
<
-<
<
+last
updated:  Apr 2007 / Webmaster
->
>
->
>
+_last
updated:  Apr 2007 / Webmaster_

Revision 2 - 04 Feb 2008 - Main.DavidCowburn

Added:

>
>

SPARTA Protein Backbone Chemical Shifts Prediction Program

SPARTA: Shifts Predicted from Analogy in Residue type and Torsion Angle – NYSBC notes

As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology

Yang Shen and Ad Bax

LIBRARY:ShenBax08.pdf

Local install – dl380://infotech/spartainstallPC

cygwin session ...

Script started on Mon Feb 4 12:21:52 2008

>>Administrator@cowburn-pc /cygdrive/d/spartainstall/SPARTA

./src/sparta -in test/ubiquitin.pdb

Reading PDB Coordinates from test/ubiquitin.pdb

Reading Random Coil Shifts from .\tab\randcoil.tab

Reading RC Adjustments from .\tab\rcadj.tab

Reading Previous Residue RC Adjustments from .\tab\rcprev.tab

Reading Next Residue RC Adjustments from .\tab\rcnext.tab

Reading Weighting Factors from .\tab\weight.tab

Reading Residue Homology Table from .\tab\homology.tab

Reading Fitting Parameter Table from .\tab\fitting.tab

Reading .\tab\sparta.tab, 24166 Triplets

Can't save file pred\test/ubiquitin_in.tab

Analyzing test/ubiquitin.pdb 76 residues read

Predicting ...

N HA C CA CB H

124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin

116.472 4.213 172.450 59.570 42.210 8.342 3 I test/ubiquitin

119.243 5.693 175.320 55.210 41.480 8.871 4 F test/ubiquitin

122.133 4.870 174.870 60.621 34.230 9.693 5 V test/ubiquitin

128.653 5.367 177.140 54.519 35.050 9.096 6 K test/ubiquitin

116.533 4.970 176.909 60.470 70.630 8.925 7 T test/ubiquitin

122.463 4.310 178.800 57.580 41.970 9.037 8 L test/ubiquitin

106.723 4.428 175.520 61.400 69.140 7.386 9 T test/ubiquitin

110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin

122.734 4.361 175.940 56.280 33.200 6.915 11 K test/ubiquitin

121.573 5.264 174.320 62.390 69.910 8.627 12 T test/ubiquitin

128.243 4.545 175.220 59.980 40.950 9.852 13 I test/ubiquitin

122.653 5.067 173.789 61.940 69.650 8.696 14 T test/ubiquitin

125.933 4.779 174.670 52.830 47.070 8.760 15 L test/ubiquitin

123.293 5.045 175.860 54.820 29.450 8.177 16 E test/ubiquitin

118.342 4.713 174.160 58.431 36.400 9.226 17 V test/ubiquitin

120.123 5.078 176.161 52.720 30.310 8.723 18 E test/ubiquitin

139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin

104.533 4.370 174.660 57.400 63.370 7.137 20 S test/ubiquitin

124.613 4.695 176.360 55.700 40.800 8.351 21 D test/ubiquitin

109.934 5.147 176.750 59.690 71.200 7.948 22 T test/ubiquitin

122.323 3.657 179.040 62.260 34.350 8.688 23 I test/ubiquitin

121.963 3.917 178.640 60.220 28.280 9.795 24 E test/ubiquitin

121.703 4.525 178.379 56.060 38.449 7.723 25 N test/ubiquitin

122.843 3.397 177.950 67.660 30.840 7.978 26 V test/ubiquitin

119.993 4.648 180.550 59.249 33.730 8.617 27 K test/ubiquitin

124.573 4.161 180.300 55.370 17.710 7.904 28 A test/ubiquitin

121.073 4.207 180.320 59.650 33.290 7.933 29 K test/ubiquitin

122.213 3.507 178.310 66.150 36.800 8.326 30 I test/ubiquitin

124.623 3.829 178.890 60.000 27.720 8.622 31 Q test/ubiquitin

120.493 4.354 177.250 57.190 40.580 8.231 32 D test/ubiquitin

116.263 4.337 177.870 58.050 34.170 7.521 33 K test/ubiquitin

115.003 4.625 177.840 55.170 32.661 8.995 34 E test/ubiquitin

109.782 4.035 173.960 46.080 9999.000 8.741 35 G test/ubiquitin

121.013 4.446 173.590 57.750 40.580 6.297 36 I test/ubiquitin

142.438 4.634 176.940 61.660 31.850 9999.000 37 P test/ubiquitin

139.608 4.117 178.320 66.260 32.890 9999.000 38 P test/ubiquitin

114.512 4.430 177.090 55.640 39.540 8.617 39 D test/ubiquitin

117.913 4.583 175.381 55.640 30.140 7.924 40 Q test/ubiquitin

118.853 4.244 176.300 56.470 31.650 7.307 41 Q test/ubiquitin

123.813 4.499 174.050 55.050 31.750 8.520 42 R test/ubiquitin

125.173 5.353 175.290 52.980 45.790 8.867 43 L test/ubiquitin

123.053 5.216 176.060 58.980 41.420 9.487 44 I test/ubiquitin

126.523 5.045 174.470 57.020 43.760 8.869 45 F test/ubiquitin

133.333 3.690 177.289 52.540 16.570 8.897 46 A test/ubiquitin

103.473 3.791 173.810 45.350 9999.000 8.087 47 G test/ubiquitin

122.702 4.623 174.700 54.550 34.530 8.284 48 K test/ubiquitin

123.543 4.666 175.670 55.740 29.000 8.667 49 Q test/ubiquitin

126.653 4.090 176.659 54.240 41.570 8.872 50 L test/ubiquitin

124.073 4.488 175.870 55.960 31.570 8.442 51 E test/ubiquitin

121.163 4.360 177.330 56.959 40.850 8.187 52 D test/ubiquitin

107.793 4.045 174.870 45.170 9999.000 9.567 53 G test/ubiquitin

120.183 4.695 175.350 54.390 32.650 7.288 54 R test/ubiquitin

109.533 5.508 176.560 59.690 72.260 8.799 55 T test/ubiquitin

119.053 4.060 180.810 58.710 40.370 8.176 56 L test/ubiquitin

114.463 4.370 178.310 61.080 62.530 8.585 57 S test/ubiquitin

125.323 4.296 177.400 57.180 40.100 7.654 58 D test/ubiquitin

116.642 4.670 174.700 58.250 40.070 7.124 59 Y test/ubiquitin

117.033 4.355 174.341 54.120 37.410 8.329 60 N test/ubiquitin

119.733 3.393 174.610 62.420 36.740 6.970 61 I test/ubiquitin

125.874 4.506 175.970 53.660 31.650 7.647 62 Q test/ubiquitin

121.433 4.001 175.810 57.791 32.649 8.487 63 K test/ubiquitin

115.083 3.465 175.250 57.890 25.900 9.591 64 E test/ubiquitin

115.863 4.640 172.160 60.890 64.910 7.383 65 S test/ubiquitin

118.242 5.614 173.950 62.340 70.080 8.737 66 T test/ubiquitin

128.243 5.060 175.770 53.900 44.260 9.801 67 L test/ubiquitin

119.513 5.292 173.150 55.000 30.531 9.633 68 H test/ubiquitin

125.592 5.282 175.270 53.890 44.380 8.533 69 L test/ubiquitin

128.073 4.351 173.999 60.800 34.910 9.490 70 V test/ubiquitin

124.262 5.361 177.830 53.940 42.851 8.067 71 L test/ubiquitin

124.244 4.921 174.953 54.777 32.225 9.169 72 R test/ubiquitin

128.176 4.628 176.270 54.090 42.511 8.881 73 L test/ubiquitin

124.343 4.706 175.048 54.919 31.176 8.588 74 R test/ubiquitin

112.599 4.156 173.001 44.721 9999.000 8.348 75 G test/ubiquitin

Running time: 20.343 seconds

>>/cygdrive/d/spartainstall/SPARTA#
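
The console listing above can be turned into per-residue records with a few lines of Python. This is only a sketch for working with the screen output: the column order is taken from the "N HA C CA CB H" header line, and reading 9999.000 as "no value" is an inference from the run above (it appears only for Gly CB and Pro H), not something the SPARTA notes state. The names `parse_row`, `Prediction` and `MISSING` are made up for this example.

```python
from typing import NamedTuple, Optional

MISSING = 9999.0  # assumed placeholder for "no shift" (seen for Gly CB and Pro H)
ATOMS = ("N", "HA", "C", "CA", "CB", "H")  # column order from the header line


class Prediction(NamedTuple):
    resid: int
    resname: str
    shifts: dict  # atom name -> float, or None when the placeholder was printed


def parse_row(line: str) -> Optional[Prediction]:
    """Parse one 'six shifts, residue number, residue letter, name' row.

    Returns None for any other line (header, 'Reading ...' messages, timing).
    """
    fields = line.split()
    if len(fields) != 9:
        return None
    try:
        values = [float(f) for f in fields[:6]]
        resid = int(fields[6])
    except ValueError:
        return None
    shifts = {a: (None if v == MISSING else v) for a, v in zip(ATOMS, values)}
    return Prediction(resid, fields[7], shifts)


# Three rows copied from the transcript above, plus two non-data lines.
sample = """\
N HA C CA CB H
124.353 5.462 175.920 55.080 30.759 8.947 2 Q test/ubiquitin
110.023 3.978 174.070 45.460 9999.000 7.522 10 G test/ubiquitin
139.146 4.141 175.310 65.470 31.950 9999.000 19 P test/ubiquitin
Running time: 20.343 seconds
"""
rows = [p for p in map(parse_row, sample.splitlines()) if p is not None]
for p in rows:
    print(p.resid, p.resname, p.shifts["CB"], p.shifts["H"])
```

Non-data lines are skipped by the field count and the numeric conversion, so the whole captured session can be fed through `parse_row` line by line.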

Using a standard MS shell with the infotech drive mounted ..

src\sparta -in test\ubiquitin.pdb ...

Original text --

Contact: shenyang@niddk.nih.gov; bax@nih.gov
Web: http://spin.niddk.nih.gov/bax

DOWNLOAD

RedHat Linux /Fedora Core version
Win32 version

The downloaded Unix archive can be unpacked with a command like the following:

   zcat sparta.linux.tar.Z | tar xvf -

The Win32 archive can be unpacked with any standard Windows zip utility.

Users are encouraged to email the author to be informed about updates and related software.

What is SPARTA?
Reliability of SPARTA
Components of the SPARTA Package
How to Use SPARTA
Preparing the Input PDB Coordinates
Adding New Proteins to the Database
Compile the Source Code
About the Name SPARTA

What is SPARTA?

The idea behind SPARTA is that if one can find a triplet of residues in a protein of known structure that is similar in structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of the known protein at that triplet will be useful predictors for the backbone secondary chemical shifts in the target.
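
The triplet idea can be illustrated with a toy nearest-match search. This is not SPARTA's scoring function: the dissimilarity used here (squared phi/psi differences plus a flat penalty for a residue-type mismatch), the plain averaging of the best matches, and all numbers in the tiny database are invented for illustration. The real program also weights chi1 and uses a residue homology table, as listed among the package files.

```python
MISMATCH_PENALTY = 400.0  # invented value: cost of a residue-type mismatch


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def triplet_score(query, candidate):
    """Toy dissimilarity between two triplets of (residue, phi, psi)."""
    score = 0.0
    for (q_res, q_phi, q_psi), (c_res, c_phi, c_psi) in zip(query, candidate):
        score += angle_diff(q_phi, c_phi) ** 2 + angle_diff(q_psi, c_psi) ** 2
        if q_res != c_res:
            score += MISMATCH_PENALTY
    return score


def predict_secondary_shift(query, database, n_best=20):
    """Average the secondary shift of the n_best most similar database triplets."""
    ranked = sorted(database, key=lambda e: triplet_score(query, e["triplet"]))
    best = ranked[:n_best]
    return sum(e["sec_shift"] for e in best) / len(best)


# Made-up database entries: two helical triplets and one extended triplet.
database = [
    {"triplet": [("A", -60, -45), ("K", -62, -41), ("L", -65, -40)], "sec_shift": 2.9},
    {"triplet": [("V", -120, 130), ("T", -118, 128), ("I", -125, 135)], "sec_shift": -1.4},
    {"triplet": [("A", -58, -47), ("K", -60, -43), ("L", -63, -42)], "sec_shift": 3.1},
]
query = [("A", -61, -44), ("K", -61, -42), ("L", -64, -41)]
print(predict_secondary_shift(query, database, n_best=2))
```

With `n_best=2` the extended triplet is excluded and the prediction is the mean of the two helical entries.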

Reliability of SPARTA

Components of the SPARTA Package

command-line argument or simply typing in the executable file or starting script without any command-line arguments.

Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; the starting script ("$SPARTA_DIR/sparta" in Linux) sets it automatically:

setenv SPARTA_DIR /disk1/SPARTA

$SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that the default "$SPARTA_DIR" is the current directory if not specified.

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab
The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.

$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab
The tables of random coil shifts and of the adjustment values from neighboring residues used in the shift prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)

$SPARTA_DIR/tab/homology.tab
The residue type homology factors used in the prediction process, which is similar to the table used by TALOS.

$SPARTA_DIR/tab/weight.tab
The weighting factors of PHI, PSI and CHI1 angles, and residue type homology used in the prediction process.

$SPARTA_DIR/tab/fitting.tab
The fitting parameters between prediction accuracy and precision, which will be used after the prediction process to calculate the estimated prediction error.

$SPARTA_DIR/shifts/*.tab
The files in this directory are only used when compiling a new database. When compiling a new database, only shift tables ending with the ".tab" extension will be used. The files in this directory are the chemical shift tables for the proteins in the database, which are in the same format as the TALOS shifts tables and must be exactly consistent with the corresponding structures in the SPARTA pdb directory.

$SPARTA_DIR/pdb/*.pdb
The PDB coordinates files in this directory are only used along with the files in the SPARTA shifts directory when compiling a new database (e.g. adding new proteins into the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory.

$SPARTA_DIR/test/*
The contents of this "test" directory include the input files and results for a sample SPARTA analysis.
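
A quick check that an installation has the table files listed above might look like the following. The file names come from this list and from the "Reading ..." lines of the console output; falling back to the current directory mirrors the stated default for "$SPARTA_DIR". The helper names `sparta_dir` and `missing_tables` are hypothetical and not part of the SPARTA package.

```python
import os
from pathlib import Path

# Table files named in the package description above.
TABLES = [
    "sparta.tab", "randcoil.tab", "rcadj.tab", "rcprev.tab",
    "rcnext.tab", "homology.tab", "weight.tab", "fitting.tab",
]


def sparta_dir(environ=None):
    """Return SPARTA_DIR, or the current directory when it is not set."""
    environ = os.environ if environ is None else environ
    return Path(environ.get("SPARTA_DIR", "."))


def missing_tables(root):
    """List the expected table files that are absent under <root>/tab."""
    return [name for name in TABLES if not (Path(root) / "tab" / name).is_file()]


root = sparta_dir()
print("SPARTA_DIR:", root)
print("missing tables:", missing_tables(root))
```

An empty "missing tables" list means every table named above was found; it says nothing about whether the files are valid.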

How to Use SPARTA

Use of SPARTA to predict backbone chemical shifts involves the following steps:

Create a directory for the prediction session; all subsequent commands will be executed from this directory.
Prepare the input PDB coordinate file (for example "protein.pdb"), according to the format given above.
Run SPARTA ("$SPARTA_DIR/src/SPARTA" or "$SPARTA_DIR/sparta" in Linux, "$SPARTA_DIR/src/SPARTA.exe" in Windows) to perform the database searches. Most commonly, this will simply require a command such as:
```
sparta -in protein.pdb
```
SPARTA will first generate an input "pred/protein_in.tab" file from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each one of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created in the "pred" directory, which includes a summary of the prediction results. The database search will typically take about 25 sec for a 100-residue protein on a Linux PC with a 2.8GHz CPU.
If experimental chemical shifts for the target protein are available (in a file named "ref.tab", for example, in the same format as a typical TALOS shift table file, http://spin.niddk.nih.gov/NMRPipe/talos/), a SPARTA prediction can be performed by a command such as:
```
sparta -in protein.pdb -ref ref.tab
```
SPARTA will compare the predicted and experimental chemical shifts before exiting, and a prediction summary file "pred/pred.tab" will be generated to store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of the prediction errors / square root of the number of shifts), a warning is printed and a reference correction will be applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
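
The referencing test described in the last step can be written out as a short calculation. This sketch assumes the prediction errors for one nucleus are already available as a list of numbers; how SPARTA defines the sign of the error, whether it uses the sample or population standard deviation, and how it applies the correction are not stated here. The sample standard deviation and the absolute value of the mean are assumptions of this example, and `needs_reference_correction` is a hypothetical name.

```python
import math
import statistics


def needs_reference_correction(errors, factor=3.0):
    """Apply the rule: |mean error| > factor * (std of errors / sqrt(N)).

    Returns (flag, mean_error, expected_error).
    """
    n = len(errors)
    mean_err = statistics.fmean(errors)
    # Sample standard deviation is an assumption of this sketch.
    expected = statistics.stdev(errors) / math.sqrt(n)
    return abs(mean_err) > factor * expected, mean_err, expected


# Invented error lists (ppm): one with a systematic offset, one without.
offset_errors = [0.5, 0.6, 0.4, 0.55, 0.45]
scattered_errors = [0.5, -0.6, 0.4, -0.55, 0.2]

for label, errs in (("offset", offset_errors), ("scattered", scattered_errors)):
    flag, mean_err, expected = needs_reference_correction(errs)
    print(f"{label}: mean={mean_err:.3f} expected={expected:.3f} correct={flag}")
```

A systematic offset of about 0.5 ppm with little scatter trips the rule, while errors of similar size that scatter around zero do not.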

Preparing the Input PDB Coordinates

Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.

Adding New Proteins to the Database

The chemical shift assignments for each candidate protein should first be validated by running a SPARTA shift prediction using its PDB coordinates:
```
sparta -in protein.pdb -ref ref.tab
```
Check the prediction summary table (pred/pred.tab) and remove the experimental shifts for which the predicted shifts deviate by more than five standard deviations. In particular, HA shifts for which the ring current shift is > 1.5 ppm and the predicted shift deviates by more than three standard deviations are better removed.
Chemical shifts should be referenced correctly. A quick check can be conducted by running the above SPARTA prediction for this protein and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., standard deviation of the prediction errors / square root of the number of shifts), and store the corrected shifts in a file "pred/ref.tab".
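
The two removal rules above can be expressed as a small filter. This is a sketch only: it assumes the deviation between predicted and experimental shift, the relevant standard deviation, and the ring current shift have already been read from the prediction summary by some other means, since the layout of "pred/pred.tab" is not described here. Comparing magnitudes (absolute values) is an assumption, and `keep_shift` is a hypothetical name.

```python
def keep_shift(atom, deviation, sigma, ring_current=0.0):
    """Return False when an experimental shift should be dropped.

    atom         -- atom name, e.g. "HA", "CA"
    deviation    -- predicted minus experimental shift (ppm)
    sigma        -- standard deviation used to judge the deviation (ppm)
    ring_current -- ring current shift for this atom (ppm)
    """
    n_sigma = abs(deviation) / sigma
    if n_sigma > 5.0:
        return False  # general rule: more than five standard deviations
    if atom == "HA" and abs(ring_current) > 1.5 and n_sigma > 3.0:
        return False  # stricter rule for HA with a large ring current shift
    return True


# Invented examples: (atom, deviation, sigma, ring current shift)
candidates = [
    ("CA", 1.0, 1.0, 0.0),
    ("CA", 5.5, 1.0, 0.0),
    ("HA", 0.9, 0.25, 1.8),
    ("HA", 0.9, 0.25, 0.2),
]
for atom, dev, sigma, rc in candidates:
    print(atom, dev, sigma, rc, "keep" if keep_shift(atom, dev, sigma, rc) else "remove")
```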

Given this, the procedure for adding new proteins to the SPARTA database is as simple as:

Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy the table to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.
Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.
Prepare a table file, for example with a name of "list.tab", which only contains the names of proteins to be added into the database. This table must follow the example below:
```
VARS   PDB_NAME
FORMAT %24s
bpti
ubiquitin
profilin
...
```
Note that each "PDB_NAME" in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.
In the "SPARTA" directory, execute the following command to compile a new database:
```
sparta -compile -pdbDir ./pdb -pdbList list.tab
```
A new database "$SPARTA_DIR/tab/sparta.tab" will be generated from the files in the SPARTA pdb and shifts directories. Please back up the old database, which will be overwritten.
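
The bookkeeping in the steps above (matching file names, writing the list table, backing up the old database) can be scripted. The sketch below follows the "list.tab" example literally: names are written one per line without padding, which is an assumption about what the "%24s" format accepts. The helper names and the ".bak" backup name are invented for this example and are not part of SPARTA.

```python
import shutil
from pathlib import Path


def paired_names(sparta_root):
    """Names that have both pdb/<name>.pdb and shifts/<name>.tab."""
    root = Path(sparta_root)
    pdb = {p.stem for p in (root / "pdb").glob("*.pdb")}
    tab = {p.stem for p in (root / "shifts").glob("*.tab")}
    return sorted(pdb & tab)


def write_list_table(names, path):
    """Write a list table with the VARS/FORMAT header shown in the example."""
    lines = ["VARS   PDB_NAME", "FORMAT %24s"] + list(names)
    Path(path).write_text("\n".join(lines) + "\n")


def backup_database(sparta_root):
    """Copy tab/sparta.tab to tab/sparta.tab.bak before it is overwritten."""
    db = Path(sparta_root) / "tab" / "sparta.tab"
    backup = db.with_name("sparta.tab.bak")
    shutil.copy2(db, backup)
    return backup


# Typical use, run from the SPARTA directory before "sparta -compile ...":
#   write_list_table(paired_names("."), "list.tab")
#   backup_database(".")
```

Only proteins present in both directories are listed, so a structure without a shift table (or the reverse) is left out of the list rather than passed to the compile step.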

Compile the Source Code

cd $SPARTA_DIR/src
make

About the Name SPARTA


last updated: Apr 2007 / Webmaster

Revision 1, 04 Feb 2008 - Main.DavidCowburn


Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.