AminoFAT

From DrugPedia: A Wikipedia for Drug discovery

Current revision (19:19, 7 October 2010)


AminoFAT: Functional Annotation of Amino Acids

This page lists software and databases that are important for predicting the functional properties of amino acids in a protein.

Important Software

DSSP: Assigns secondary structure to proteins from PDB files (see Secondary Structure Assignment)

pdb-tools: A set of tools for manipulating and doing calculations on wwPDB macromolecule structure files

HBPLUS: A hydrogen bond calculation program (see hbblus)

NACCESS: A program for calculating solvent-accessible surface area

PDBsum: Summaries of protein structures

Promotif: A program for assigning irregular secondary structure (see Promotif)

LPC: Ligand-protein contact prediction (Installed)

SuperSite: dictionary of metabolite and drug binding sites in proteins

PDBcat: Simple program to read columns (Installed)

SMSD: Small Molecule Subgraph Detector

PDBFINDER2: Combines PDB, DSSP, HSSP

JOElib: open source computational chemistry package written in Java

FPOCKET: An open source platform for ligand pocket detection

TMACC: Topological Maximum Cross Correlation descriptors

PerlMol: Perl Modules for Molecular Chemistry

ParsePDB: A Perl Parser for PDB Files

MOPAC: Molecular Orbital PACkage

PDBselect: For creating non-redundant datasets

Questions we wish to address on PDB files

Assigning secondary structure in a PDB file using dssp

Assigning turns in PDB

PDB entries with the highest/lowest composition of a particular residue type

PDB files with the highest/lowest proportion of particular residue classes (charged, polar, hydrophobic)

RNA interacting residues

DNA interacting residues

Protein/peptides interacting residues

Protein-small molecules interaction

Protein-carbohydrate interacting residues

Post-translational modification

Disordered regions in a protein

Create dataset from PDB_IDs (Sequence, Structure)

Create non-redundant dataset using CD-HIT, BlastCluster

More about your PDBid (e.g. links to PDB, PDBwiki, TOPSAN, Proteopedia)

Extract PDBids from PDB which satisfy particular criteria (R < 2.5, X-ray, ATP binder, GTP binder)

Filter PDBids supplied by user which satisfy particular condition

Database of D-amino acids
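
The question about extracting or filtering PDBids by criteria could be sketched roughly as follows. This is only an illustration under stated assumptions: the `PdbEntry` record, its field names, and the four sample entries are hypothetical placeholders, not real PDB data, and "R < 2.5" is read here as a resolution cutoff, which the page does not state explicitly.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class PdbEntry:
    """Hypothetical per-entry summary; field names are illustrative only."""
    pdb_id: str
    method: str                      # e.g. "X-ray" or "NMR"
    resolution: Optional[float]      # None when no resolution is reported
    ligands: FrozenSet[str] = frozenset()


def filter_entries(entries, max_resolution=None, method=None, ligand=None) -> List[str]:
    """Return the IDs of entries that satisfy every criterion that was given."""
    selected = []
    for entry in entries:
        if method is not None and entry.method != method:
            continue
        if max_resolution is not None:
            # Entries without a resolution cannot satisfy a resolution cutoff.
            if entry.resolution is None or entry.resolution >= max_resolution:
                continue
        if ligand is not None and ligand not in entry.ligands:
            continue
        selected.append(entry.pdb_id)
    return selected


# Made-up placeholder entries, not real PDB records.
entries = [
    PdbEntry("1abc", "X-ray", 1.8, frozenset({"ATP"})),
    PdbEntry("2def", "X-ray", 3.1, frozenset({"ATP"})),
    PdbEntry("3ghi", "NMR", None, frozenset({"GTP"})),
    PdbEntry("4jkl", "X-ray", 2.0, frozenset()),
]

print(filter_entries(entries, max_resolution=2.5, method="X-ray", ligand="ATP"))
```

The same function covers both cases in the list above: applied to all entries it extracts matching PDBids, and applied to a user-supplied subset it filters that subset.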

Plan

1. Create a comprehensive file, “PDB Detail”, that includes all available information for each PDBid (like PDB finder).

2. Create a few MySQL tables to store general information about each PDB entry:

a. General table (PDBid, total chains, chainids, organism, resolution, X-ray/NMR, sequence length, hetatms, etc.)

b. Structure table (PDBid, number of residues in helix, beta strand, DSSP states, beta-turns, gamma-turns, cho-PI interactions, hydrogen bonds, etc., total exposed/buried residues)

c. Ligand interactions table (PDBid, major ligands/metals (ATP, GTP, NAD …))

d. Table of DNA/RNA/protein interacting residues

3. File formats. We will maintain two formats:

a. PDBsfasta format. This format provides detailed information to the user and will be the format of our output files. Our main file, “PDB Detail”, will also hold the information about each PDB chain in this format. It will look like this:

>ChainID::Seq::A,R,G,T,C,L, (amino acid sequence, comma-separated)
ChainID::DSSP::H,H,H,H,C,C, (DSSP secondary structure)
ChainID::Dih_phi::120,130,180 (dihedral angles, DSSP)
ChainID::ATP_int::0,1,0,0,,1, (ATP interacting residues)
…..

b. PDBchain format. This format stores the ChainIDs of PDB entries, separated by commas, with 10 ChainIDs per row. It will look like this:

2mltA,2mltB,2pol ,3qtxA, …..
……
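
To make the two formats concrete, here is a minimal Python sketch of how they might be read and written. It is not the project's implementation. The function names are hypothetical, and several details are assumptions: the parenthesised descriptions in the example are treated as annotations rather than file content, a single trailing comma is dropped, empty fields are kept as empty strings, and chain tokens such as `2pol ` keep their trailing space. The sample values are the ones from the format description, attached to the chain ID `2mltA` purely for illustration.

```python
def parse_pdbsfasta(text):
    """Parse PDBsfasta text into {chain_id: {feature: [values, ...]}}."""
    records = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            line = line[1:]              # the first line of a chain starts with ">"
        parts = line.split("::", 2)
        if len(parts) != 3:
            raise ValueError("malformed PDBsfasta line: %r" % raw)
        chain_id, feature, values = parts
        if values.endswith(","):
            values = values[:-1]         # assumption: one trailing comma is padding
        records.setdefault(chain_id, {})[feature] = values.split(",")
    return records


def write_pdbchain(chain_ids, per_row=10):
    """Write chain IDs as comma-separated rows of `per_row` IDs each."""
    rows = [chain_ids[i:i + per_row] for i in range(0, len(chain_ids), per_row)]
    return "\n".join(",".join(row) for row in rows)


def read_pdbchain(text):
    """Read PDBchain text back into a flat list, skipping blank tokens."""
    ids = []
    for line in text.splitlines():
        ids.extend(tok for tok in line.split(",") if tok.strip())
    return ids


def pdbsfasta_to_pdbchain(text):
    """Convert PDBsfasta text to PDBchain text by listing its chain IDs."""
    return write_pdbchain(list(parse_pdbsfasta(text)))


example = """>2mltA::Seq::A,R,G,T,C,L,
2mltA::DSSP::H,H,H,H,C,C,
2mltA::ATP_int::0,1,0,0,,1,
"""

records = parse_pdbsfasta(example)
print(records["2mltA"]["DSSP"])
print(pdbsfasta_to_pdbchain(example))
```

Keeping every feature as a list of strings means the residue positions stay aligned across `Seq`, `DSSP`, and the interaction rows, even when a value is missing.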


4. Creating a dataset will generate a file in PDBchain format; combining datasets will also generate a dataset in PDBchain format.

5. Extract Sequence will take a file in PDBchain format, extract the comprehensive information for those chains from the PDB Detail file, and create a PDBsfasta file.

6. PDBsfasta to PDBchain conversion.

7. A Combination of Sets form will allow us to create a new set of chainids using intersection, union, or difference (A-B or B-A).

8. We will have the following types of forms: i) composition analysis of sequences (e.g. composition, dipeptide composition, split composition) with output in the desired format, such as SVM or graphics; ii) statistics; iii) structure analysis (composition of helices, sheets).

9. Generate SVM patterns from a PDBsfasta file.
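
The set combination described in the plan (intersection, union, A-B, B-A) maps directly onto Python's built-in set operations. The sketch below is one possible shape for it. The function name and the operation labels are hypothetical, and the output is sorted only to make the result reproducible.

```python
def combine_sets(a, b, operation):
    """Combine two collections of chain IDs and return a sorted list."""
    set_a, set_b = set(a), set(b)
    operations = {
        "intersection": set_a & set_b,
        "union": set_a | set_b,
        "A-B": set_a - set_b,
        "B-A": set_b - set_a,
    }
    if operation not in operations:
        raise ValueError("unknown operation: %r" % operation)
    return sorted(operations[operation])


# Chain IDs taken from the PDBchain example; the grouping into two sets is arbitrary.
set_a = ["2mltA", "2mltB", "3qtxA"]
set_b = ["2mltB", "3qtxA", "2pol "]

for name in ("intersection", "union", "A-B", "B-A"):
    print(name, combine_sets(set_a, set_b, name))
```

Because the result is again a plain list of chain IDs, it can be written back out in PDBchain format without further conversion.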



PDBFINDER2 + PDBSUM + DSSP + ParsePDB + pdb-tools + LIGIN + FPOCKET2 + gramm + HBX + LPC + Mypresto + promotif (turns) + surfnet
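
As an illustration of the composition analysis and SVM pattern generation mentioned in the plan, the following sketch computes amino acid and dipeptide composition from a residue list and writes one pattern line. The function names are hypothetical. The page says only "SVM" format, so the sparse `label index:value` layout used by tools such as SVM-light and LIBSVM is an assumption here, as is expressing composition in percent and ignoring non-standard residues.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_SET = set(AMINO_ACIDS)
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def composition(residues):
    """Percent composition of the 20 standard amino acids (20 features)."""
    standard = [r for r in residues if r in AA_SET]
    if not standard:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * standard.count(aa) / len(standard) for aa in AMINO_ACIDS]


def dipeptide_composition(residues):
    """Percent composition of adjacent residue pairs (400 features)."""
    pairs = [a + b for a, b in zip(residues, residues[1:])
             if a in AA_SET and b in AA_SET]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [100.0 * counts[dp] / len(pairs) for dp in DIPEPTIDES]


def svm_line(label, features):
    """Format one pattern as 'label index:value ...', omitting zero features."""
    body = " ".join("%d:%.3f" % (i, v)
                    for i, v in enumerate(features, start=1) if v)
    return ("%s %s" % (label, body)).rstrip()


# Residues from the Seq row of the PDBsfasta example; the "+1" label is arbitrary.
residues = ["A", "R", "G", "T", "C", "L"]
print(svm_line("+1", composition(residues)))
```

Feature indices follow the alphabetical one-letter order in `AMINO_ACIDS`, so the same index always refers to the same residue type across patterns.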