PSLpred
From DrugPedia: A Wikipedia for Drug discovery
A Method to predict Bacterial subcellular Localizations
Prokaryotes (Gram-negative bacteria) have five major subcellular localizations (outer membrane, inner membrane, periplasm, cytoplasm, and extracellular space), each specialized for distinct biochemical processes. PSLpred, an SVM-based method, was therefore developed to predict the subcellular localization of prokaryotic proteins using input features such as amino acid composition, dipeptide composition, and physico-chemical properties, along with similarity-search results. The server is available at www.imtech.res.in/raghava/pslpred
Strategies used to develop the PSLpred algorithm
The data set used in the present work is the same as that used by Yu et al. (2004) to develop the method CELLO. It was generated from SWISS-PROT release 40.29 and consists of 1443 proteins in total: 1302 localized to a single subcellular site and 141 resident at multiple locations. For developing PSLpred, the 141 proteins residing in more than one subcellular location were excluded, and the 1302 proteins with a single subcellular localization (248 cytoplasmic, 268 inner membrane, 244 periplasmic, 352 outer membrane, and 190 extracellular) were used for training and evaluation.
i) Amino acid composition: The SVM module based on the amino acid composition of a protein achieved its best results with the RBF kernel (g=100, c=2, j=1). Calculating the amino acid composition yields a 20-dimensional input vector for each protein sequence; these vectors were used to train five SVM models, one for each of the five subcellular localizations. The amino acid composition based SVM module achieved an overall accuracy of 86%.
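The 20-dimensional composition vector described above can be sketched as follows. This is only an illustrative sketch: the function name `amino_acid_composition`, the alphabetical residue ordering, and the use of fractions rather than percentages are assumptions, not details taken from PSLpred.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue fractions for a protein sequence."""
    seq = sequence.upper()
    # Count only standard residues; anything else (e.g. X) is ignored.
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Toy sequence, used only to show the shape of the output.
vector = amino_acid_composition("MKKLLA")
print(len(vector))
```

Each protein, whatever its length, is mapped to a vector of the same fixed size, which is what allows it to be used as SVM input.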
ii) Dipeptide composition: The dipeptide composition based SVM module captures information about amino acid composition along with the local order of amino acids. It uses a fixed-length input vector of 400 dimensions. The dipeptide composition based SVM module with the RBF kernel (g=300, C=2) achieved an overall accuracy of 86%.
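A minimal sketch of the 400-dimensional dipeptide vector is given below. The names `dipeptide_composition` and `DIPEPTIDE_INDEX`, the ordering of the 400 pairs, and the normalization by the number of counted pairs are assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs (ordering is an assumption).
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}

def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of adjacent-residue pair fractions."""
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    # Slide a window of width 2 along the sequence.
    for i in range(len(seq) - 1):
        idx = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [c / total for c in counts]

# Toy sequence: its adjacent pairs are MK, KK, KL, LL, LA.
dp_vector = dipeptide_composition("MKKLLA")
print(len(dp_vector))
```

Because each entry counts an ordered pair of neighbouring residues, the vector retains some sequence-order information that plain amino acid composition discards.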
iii) Composition of physico-chemical properties: Calculating the composition of physico-chemical properties of a protein sequence yields an input vector of 33 dimensions for each sequence. The overall accuracy of the properties based SVM module is 83%, about 3 percentage points lower than that of the amino acid composition based SVM module.
iv) Similarity-search based module: The performance of the PSI-BLAST based module was also evaluated through 5-fold cross-validation. This module performed worse than the other modules developed in the present study: the SVM module based on this approach predicted the subcellular localization of proteins with an overall accuracy of 68%.
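The 5-fold cross-validation mentioned above can be sketched as a simple partition of the data set indices. The helper `k_fold_indices`, the fixed random seed, and the plain (non-stratified) shuffling are assumptions; the source does not say how the folds were constructed.

```python
import random

def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 1302 is the number of single-location proteins in the data set.
splits = list(k_fold_indices(1302, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Every protein is used for testing exactly once and for training in the other four rounds, so the reported accuracy is averaged over predictions on data not seen during training.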
The hybrid module combined the information from the composition based and similarity-search based modules and achieved an overall accuracy of 91.2% (g=25, C=4), which is 5-8 percentage points higher than the individual composition based modules. This indicates that the hybrid module captures more information, which improves prediction accuracy. These results suggest that predicting the subcellular localization of proteins requires a wide range of information about a protein.
Subcellular localization | Accuracy (%) | MCC |
Cytoplasmic | 90.7 | 0.86 |
Extracellular | 86.8 | 0.88 |
Inner-membrane | 90.3 | 0.90 |
Outer-membrane | 95.2 | 0.95 |
Periplasmic | 90.6 | 0.89 |
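The MCC column in the table is the Matthews correlation coefficient, which for one location treated as the positive class is computed from the counts of true and false positives and negatives. The sketch below uses the standard formula; the function name `mcc` and the example counts are hypothetical and are not PSLpred results.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for a single location versus all other locations.
example = mcc(tp=90, tn=880, fp=20, fn=10)
print(round(example, 2))
```

MCC ranges from -1 to 1, where 1 is a perfect prediction and 0 is no better than random, so it is less sensitive than accuracy to the unequal class sizes in the data set.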
To assess prediction reliability, a reliability index (RI) was assigned to each prediction of the hybrid module; accuracies of 90% and 98.1% were obtained for predictions with RI=4 and RI=5, respectively. About 74% of the sequences had RI=5. Hence, for the majority of sequences the present method predicts the subcellular localization of prokaryotic proteins with high reliability.
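The text does not state how RI is computed. As a purely illustrative sketch, one convention used in SVM-based localization predictors is to pick the location whose model gives the highest score and to derive RI from the gap between the highest and second-highest scores. The function `predict_with_ri`, the `scale` parameter, the binning rule, and the scores below are all assumptions, not the PSLpred definition.

```python
def predict_with_ri(scores, max_ri=5, scale=1.0):
    """Pick the top-scoring location and a reliability index from the score gap.

    scores: dict mapping location name -> SVM decision value (one model per location).
    The binning (int(gap / scale) + 1, capped at max_ri) is a hypothetical choice.
    """
    if len(scores) < 2:
        raise ValueError("need scores for at least two locations")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_location, best_score), (_, second_score) = ranked[0], ranked[1]
    gap = best_score - second_score  # always >= 0 after sorting
    ri = min(max_ri, int(gap / scale) + 1)
    return best_location, ri

# Hypothetical decision values from five one-vs-rest SVM models.
scores = {
    "cytoplasmic": 1.5,
    "inner membrane": -0.75,
    "periplasmic": -1.0,
    "outer membrane": -2.0,
    "extracellular": -0.5,
}
location, ri = predict_with_ri(scores)
print(location, ri)
```

Under such a scheme, a larger gap between the two best-scoring models yields a higher RI, which matches the reported trend that predictions with higher RI are more accurate.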