Datasets in Bioinformatics

From DrugPedia: A Wikipedia for Drug discovery

Revision as of 11:21, 3 September 2008 by Ravi (Talk | contribs)

A number of datasets are created and used in the field of bioinformatics. A dataset contains the vital information on which a prediction server depends for its function. Some of the datasets created or used by the Bioinformatics Centre, Institute of Microbiology, Chandigarh, are listed below.

Datasets for evaluation of beta turn prediction method

The dataset has 426 non-homologous protein chains; no two chains share more than 25% sequence identity. The structures of these proteins were determined by X-ray crystallography at 2.0 Å resolution or better. Each chain contains at least one beta turn.

Complete Dataset

  • Amino acid sequence of 426 protein chains in fasta format
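As a minimal sketch of how a FASTA file such as this one could be loaded and sanity-checked, the snippet below parses FASTA text into (header, sequence) pairs. The function name `read_fasta` and the two example records are hypothetical and are not taken from the dataset; for the real file one would expect the record count to be 426.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record example; the real file would hold 426 chains.
example = StringIO(">chainA\nMKTAYIAK\nQRQISF\n>chainB\nGSHMLE\n")
records = list(read_fasta(example))
print(len(records), [len(seq) for _, seq in records])
```

To read an actual file, pass an open file object (for example `open(path)`) instead of the `StringIO` example.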


ProPred-I

The Promiscuous MHC Class-I Binding Peptide Prediction Server

ProPred-I is an online service for identifying MHC Class-I binding regions in antigens. It implements matrices for 47 MHC Class-I alleles, together with proteasomal and immunoproteasomal models. The main aim of the server is to help users identify promiscuous regions, that is, regions predicted to bind several MHC alleles.
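The idea of matrix-based prediction of promiscuous binders can be illustrated with a small sketch. It assumes additive scoring of 9-residue peptides against a position-specific matrix and a single score threshold for all alleles; both are assumptions for illustration, not a description of the server's actual scoring. The allele names and matrix values below are toy data, not real ProPred-I matrices.

```python
def score_peptide(peptide, matrix, default=0.0):
    """Sum the position-specific values of each residue (additive scoring, assumed)."""
    return sum(matrix[i].get(aa, default) for i, aa in enumerate(peptide))


def promiscuous_peptides(sequence, matrices, threshold, length=9, min_alleles=2):
    """Return {peptide: [alleles]} for peptides scoring >= threshold on several alleles."""
    result = {}
    for start in range(len(sequence) - length + 1):
        peptide = sequence[start:start + length]
        alleles = [name for name, matrix in matrices.items()
                   if score_peptide(peptide, matrix) >= threshold]
        if len(alleles) >= min_alleles:
            result[peptide] = alleles
    return result


# Toy matrices (hypothetical values): one dict per peptide position, 9 positions.
TOY_MATRICES = {
    "allele_X": [{}, {"L": 2.0, "M": 1.0}, {}, {}, {}, {}, {}, {}, {"V": 2.0, "L": 1.0}],
    "allele_Y": [{}, {"L": 1.0}, {}, {}, {}, {}, {}, {}, {"V": 1.0, "I": 2.0}],
}

hits = promiscuous_peptides("ALAKAAAAVGG", TOY_MATRICES, threshold=2.0)
print(hits)
```

In this toy case only the first 9-mer carries favourable residues at positions 2 and 9 for both matrices, so it is the only peptide reported as promiscuous.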

Dataset

The two datasets used in developing this server are:

HLA-A*0201

H2-kb


Matrix Optimization Technique for Predicting MHC binding Core

X-ray crystal structures of MHC class II molecules have revealed an open peptide-binding groove, so a bound peptide may extend beyond the groove at either end. Knowing which residues are actually involved in binding is therefore very useful for understanding MHC-peptide interactions. Here the Matrix Optimization Technique (MOT) is used to predict the MHC binding core. Using binders from MHCPEP and non-binder data, MOT achieved a classification accuracy of 97 to 99% for the HLA-DR1, HLA-DR2 and HLA-DR5 alleles. This is reported to be the highest accuracy achieved by any method. The prediction method used in this server is based on MOT and relies on the idea that binders have unique patterns that can be easily distinguished from non-binders.
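Once a matrix is available, locating the binding core inside a longer peptide can be sketched as a sliding-window search. The snippet below is not MOT itself (it does not optimize the matrix); it only shows how a given matrix might be used to pick the best-scoring window. The core length of 9 residues, the additive scoring, and the toy matrix values are all assumptions made for illustration.

```python
def best_core(peptide, matrix, core_len=9):
    """Return (start, core, score) for the highest-scoring window in the peptide."""
    if len(peptide) < core_len:
        raise ValueError("peptide is shorter than the core length")
    best = None
    for start in range(len(peptide) - core_len + 1):
        core = peptide[start:start + core_len]
        # Additive position-specific score; unlisted residues contribute 0.
        score = sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(core))
        if best is None or score > best[2]:
            best = (start, core, score)
    return best


# Toy matrix (hypothetical values): one dict per core position.
TOY_MATRIX = [
    {"F": 3.0, "Y": 3.0, "W": 2.0},  # position 1
    {}, {},
    {"A": 1.0},                      # position 4
    {},
    {"T": 1.0},                      # position 6
    {}, {},
    {"L": 1.0},                      # position 9
]

core = best_core("GSFKAAETALGK", TOY_MATRIX)
print(core)
```

The residues outside the selected window are the flanking residues that extend beyond the groove.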

Dataset

The "binder" datasets used in this study are:

HLA-DR1

HLA-DR2

HLA-DR5

The "non-binder" datasets used in this study are:

HLA-DR1

HLA-DR2

HLA-DR


Bcepred: Prediction of linear B-cell epitopes, using physico-chemical properties

We evaluated the performance of existing linear B-cell epitope prediction methods based on physico-chemical properties on a non-redundant dataset. The dataset consists of 1029 B-cell epitopes obtained from the Bcipep database and an equal number of non-epitopes obtained randomly from the Swiss-Prot database.

Dataset

B-cell epitopes were obtained from the B-cell epitope database BCIPEP, which contains 2479 continuous epitopes, including 654 immunodominant and 1617 immunogenic epitopes. All identical epitopes and non-immunogenic peptides were removed, leaving 1029 unique, experimentally proven continuous B-cell epitopes. The dataset covers a wide range of pathogenic groups such as viruses, bacteria, protozoa and fungi. The final dataset consists of 1029 B-cell epitopes and 1029 non-epitopes or random peptides (of equal length and same frequency, generated from SWISS-PROT).
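The construction described above (removing identical epitopes, then drawing one random peptide of matching length per epitope) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names, the fixed random seed, the rule of rejecting random peptides that coincide with a known epitope, and the example sequences are all hypothetical.

```python
import random


def unique_epitopes(epitopes):
    """Drop duplicate epitopes (case-insensitive), keeping first-seen order."""
    seen, unique = set(), []
    for epitope in epitopes:
        epitope = epitope.strip().upper()
        if epitope and epitope not in seen:
            seen.add(epitope)
            unique.append(epitope)
    return unique


def sample_non_epitopes(epitopes, proteins, seed=0, max_tries=1000):
    """For each epitope, draw one random peptide of equal length from the proteins."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    positives = set(epitopes)
    negatives = []
    for epitope in epitopes:
        n = len(epitope)
        candidates = [p for p in proteins if len(p) >= n]
        if not candidates:
            raise ValueError(f"no protein is long enough for length {n}")
        for _ in range(max_tries):
            protein = rng.choice(candidates)
            start = rng.randrange(len(protein) - n + 1)
            peptide = protein[start:start + n]
            if peptide not in positives:  # avoid reusing a known epitope
                negatives.append(peptide)
                break
        else:
            raise RuntimeError("could not sample a non-epitope peptide")
    return negatives


# Hypothetical inputs: three raw epitopes (one duplicate) and one source protein.
epitopes = unique_epitopes(["ACDEF", "acdef", "GHIKLMN"])
negatives = sample_non_epitopes(epitopes, ["MKTAYIAKQRQISFVKSHFSRQ"])
print(epitopes, [len(p) for p in negatives])
```

Matching each random peptide to the length of one epitope keeps the two classes balanced in both number and length distribution.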