AminoFAT
From DrugPedia: A Wikipedia for Drug discovery
AminoFAT: Functional Annotation of Amino Acids
This page lists software and databases that are important for predicting the functional properties of amino acids in a protein.
Important Software
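DSSP (http://swift.cmbi.kun.nl/gv/dssp/): Assigns secondary structure to proteins from PDB files. See Secondary Structure Assignment.
pdb-tools (http://code.google.com/p/pdb-tools/): A set of tools for manipulating and doing calculations on wwPDB macromolecule structure files
HBPLUS (http://www.csb.yale.edu/userguides/datamanip/hbplus/hbplus_descrip.html): A hydrogen bond calculation program. See hbblus.
NACCESS (http://www.bioinf.manchester.ac.uk/naccess/): A program for calculating accessible surface area
PDBsum (http://www.ebi.ac.uk/pdbsum/): Summaries of protein structures
Promotif (http://www.msi.umn.edu/software/promotif/document_2.html): A program for assigning irregular secondary structure. See Promotif.
LPC (http://bip.weizmann.ac.il/oca-bin/lpccsu/): Ligand-protein contact prediction (installed)
SuperSite (http://bioinformatics.charite.de/supersite): Dictionary of metabolite and drug binding sites in proteins
PDBcat (http://www.ks.uiuc.edu/Development/MDTools/pdbcat/): Simple program to read columns (installed)
SMSD (http://www.ebi.ac.uk/thornton-srv/software/SMSD/): Small Molecule Subgraph Detector
PDBFINDER2 (http://swift.cmbi.kun.nl/gv/pdbfinder/): Combines PDB, DSSP, HSSP
JOElib (http://www.ra.cs.uni-tuebingen.de/software/joelib/): Open source computational chemistry package written in Java
FPOCKET (http://fpocket.sourceforge.net/): An open source platform for ligand pocket detection
TMACC (http://comp.chem.nottingham.ac.uk/download/tmacc/): Topological Maximum Cross Correlation descriptors
PerlMol (http://www.perlmol.org/): Perl Modules for Molecular Chemistry
ParsePDB (http://comp.chem.nottingham.ac.uk/parsepdb/): A Perl Parser for PDB Files
MOPAC (http://openmopac.net/): Molecular Orbital PACkage
PDBselect (http://bioinfo.tg.fh-giessen.de/pdbselect/): For creating non-redundant datasets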
Questions we wish to address on PDB files
Assigning secondary structure in a PDB file using DSSP
Assigning turns in a PDB file
PDB entries with the highest/lowest composition of a particular residue type
PDB entries with the highest/lowest content of particular residue classes (charged, polar, hydrophobic)
RNA interacting residues
DNA interacting residues
Protein/peptide interacting residues
Protein-small molecule interactions
Protein-carbohydrate interacting residues
Post-translational modifications
Disordered regions in a protein
Create a dataset from PDB_IDs (sequence, structure)
Create a non-redundant dataset using CD-HIT or BLASTClust
More about your PDBid (e.g. links to PDB, PDBwiki, TOPSAN, Proteopedia)
Extract PDBids from the PDB that satisfy particular criteria (R < 2.5, X-ray, ATP binder, GTP binder)
Filter user-supplied PDBids that satisfy a particular condition
Database of D-amino acids
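The residue-composition questions above can be illustrated with a small sketch. It is only an assumption about how such a query might work: the function names and the toy chain IDs and sequences are hypothetical, and only the 20 standard amino acids are counted.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_composition(sequence):
    """Return the percentage of each standard residue in a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def rank_chains_by_residue(chains, residue):
    """Sort chain IDs from highest to lowest percentage of one residue type."""
    return sorted(
        chains,
        key=lambda chain_id: residue_composition(chains[chain_id])[residue],
        reverse=True,
    )


# Hypothetical toy data: chain ID -> sequence (not real PDB content).
chains = {"1abcA": "AAAG", "1abcB": "AGGG", "2xyzA": "GGGG"}
print(rank_chains_by_residue(chains, "G"))
```

The first element of the ranked list is the chain richest in the chosen residue and the last element is the poorest, so both the "highest" and "lowest" queries come from the same call.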
Plan
1. Create a comprehensive file, PDB Detail, with an entry for each PDBid (like PDBFINDER) that includes all information about that PDB entry.
2. Create a few MySQL tables to store general information about each PDB entry:
   a. General table (PDBid, total chains, chainids, organism, resolution, X-ray/NMR, sequence length, hetatms, etc.)
   b. Structure table (PDBid, number of residues in helix, beta strand, DSSP states, beta-turns, gamma-turns, cho-PI interaction, hydrogen bonds, etc., total exposed/buried residues)
   c. Ligand interactions table (PDBid, major ligands/metals (ATP, GTP, NAD …))
   d. Table of DNA/RNA/protein interacting residues
3. File formats. We will maintain two formats:
   a. PDBsfasta format. This format provides detailed information to the user and will be the format of our output files. Our main file, “PDB Detail”, will also store the information about each PDB chain in this format. It will look like this:
      >ChainID::Seq::A,R,G,T,C,L, (amino acid sequence, comma-separated)
      ChainID::DSSP::H,H,H,H,C,C, (DSSP secondary structure)
      ChainID::Dih_phi::120,130,180 (dihedral angles, DSSP)
      ChainID::ATP_int::0,1,0,0,,1, (ATP interacting residues)
      …..
   b. PDBchain format. This format stores the ChainIDs of PDB entries, separated by commas, with 10 ChainIDs per row. It will look like this:
      2mltA,2mltB,2pol ,3qtxA, …..
      ……
4. Creating a dataset will generate a file in PDBchain format; combining datasets will also generate a dataset in PDBchain format.
5. Extract Sequence will take a file in PDBchain format, extract the comprehensive information for those chains from the PDB Detail file, and create a PDBsfasta file.
6. PDBsfasta to PDBchain conversion.
7. A set-combination form will allow us to create a new set of chainids using intersection, union, or difference (A-B or B-A).
8. We will have the following types of forms: i) composition analysis of sequences (e.g. composition, dipeptide composition, split composition, etc.) with output in a desired format such as SVM or graphics; ii) statistics; iii) structure analysis (composition of helices, sheets).
9. Generate SVM patterns from a PDBsfasta file.
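As a rough illustration of the PDBchain format and the set-combination step in the plan, the sketch below reads and writes comma-separated chain IDs (10 per row by default) and combines two sets. The function names and the operation keywords are hypothetical. One assumption to note: the sketch strips whitespace around each ID, so an entry written with a trailing space is read back without it.

```python
def write_pdbchain(chain_ids, per_row=10):
    """Format chain IDs as comma-separated rows with per_row IDs each."""
    rows = [chain_ids[i:i + per_row] for i in range(0, len(chain_ids), per_row)]
    return "\n".join(",".join(row) for row in rows)


def read_pdbchain(text):
    """Parse PDBchain text into a list of chain IDs, skipping empty fields."""
    chain_ids = []
    for line in text.splitlines():
        for field in line.split(","):
            field = field.strip()  # assumption: surrounding spaces are not significant
            if field:
                chain_ids.append(field)
    return chain_ids


def combine(set_a, set_b, operation):
    """Combine two chain-ID collections; the result is sorted for stable output."""
    a, b = set(set_a), set(set_b)
    results = {
        "union": a | b,
        "intersection": a & b,
        "a_minus_b": a - b,
        "b_minus_a": b - a,
    }
    return sorted(results[operation])


dataset_a = read_pdbchain("2mltA,2mltB,2pol ,3qtxA")
dataset_b = ["2mltB", "3qtxA", "1xyzC"]  # hypothetical second dataset
print(write_pdbchain(combine(dataset_a, dataset_b, "intersection")))
```

Keeping every dataset in the same PDBchain text form means the output of one combination can be fed straight back in as the input of the next.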
PDBFINDER2 + PDBSUM + DSSP + ParsePDB + pdb-tools + LIGIN + FPOCKET2 + gramm + HBX + LPC + Mypresto + promotif (turns) + surfnet
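The PDBsfasta layout and the SVM-pattern step from the plan could be prototyped along these lines. This is a sketch under stated assumptions, not the project's implementation: the parser assumes each line has the form ChainID::Field::values with an optional leading ">", the parenthetical descriptions shown in the plan are treated as explanation rather than file content, the output uses the common "label index:value" sparse text layout, and the feature vector (20 amino acid fractions) is just one possible choice. All function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_pdbsfasta(text):
    """Parse PDBsfasta text into {chain_id: {field: [values]}}."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            line = line[1:]
        parts = line.split("::", 2)
        if len(parts) != 3:
            continue  # skip blank lines and anything not ChainID::Field::values
        chain_id, field, values = parts
        items = [item.strip() for item in values.split(",")]
        if items and items[-1] == "":
            items.pop()  # drop the empty item left by a trailing comma
        records.setdefault(chain_id, {})[field] = items
    return records


def svm_pattern(label, residues):
    """Build one 'label index:value' line from amino acid fractions."""
    total = len(residues)
    features = []
    for index, aa in enumerate(AMINO_ACIDS, start=1):
        fraction = residues.count(aa) / total if total else 0.0
        features.append(f"{index}:{fraction:.3f}")
    return f"{label} " + " ".join(features)


# Hypothetical example record following the layout described in the plan.
example = """>2mltA::Seq::A,R,G,T,C,L,
2mltA::DSSP::H,H,H,H,C,C,
2mltA::ATP_int::0,1,0,0,0,1,
"""
records = parse_pdbsfasta(example)
print(svm_pattern("+1", records["2mltA"]["Seq"]))
```

Because every annotation is a per-residue list of the same length as the sequence, other fields such as DSSP states or interaction flags could be turned into features in the same way.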