PAM Scoring Matrices
From DrugPedia: A Wikipedia for Drug discovery
THE GENERAL MATHEMATICAL SETUP OF PAM SCORING MATRICES
PAM stands for Point Accepted Mutation.
PAM scoring matrices are used mainly to score protein sequence alignments. They are based on global alignments of closely related proteins.
A protein sequence is a sequence of amino acids, and how readily one amino acid can replace another has evolutionary consequences. An accepted point mutation is the substitution of one amino acid in a protein by another that is tolerated biologically and spreads, over evolutionary time, to essentially the entire species. PAM matrices are therefore tied to the study of homology between protein sequences, tracing them back to their common ancestors. A PAM1 transition probability matrix is the Markov chain [1] matrix for a time period over which we expect 1% divergence, that is, 1% of the amino acids undergoing accepted point mutations within the species of interest.
PAM matrices were derived from 71 groups of aligned, ungapped amino acid sequences. The sequences within a group are closely related, sharing at least 85% identity. We concentrate on PAM1, the basic 20 × 20 substitution (transition) matrix from which higher PAM matrices, e.g. PAM20 and PAM250, are extrapolated.
The requirements:
a) a list of accepted mutations, or a hypothetical phylogenetic tree
b) the 20 amino acids, which index the rows and the columns of the matrix
c) the probability of occurrence P(a) of each amino acid 'a', normalized so that
$$\sum_{a} P(a) = 1$$
Let f(ab) be the number of times the mutation a ↔ b was observed to occur.
Mutations are counted without direction, so f(ab) = f(ba).
Then the total number of mutations in which 'a' was involved is
$$f(a) = \sum_{b \neq a} f(ab)$$
and the total number of amino acid occurrences involved in mutations is
$$f = \sum_{a} f(a)$$
The number f is also twice the total number of mutations, because each mutation involves two amino acids.
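As a small illustration of these counts, the sketch below computes f(a) and f for a made-up three-letter alphabet. The letters, the `counts` dictionary and the helper `f_pair` are assumptions chosen for the example, not data from the original derivation.

```python
# Hypothetical undirected mutation counts f(ab) on a toy 3-letter alphabet.
alphabet = ["A", "G", "S"]
counts = {("A", "G"): 30, ("A", "S"): 10, ("G", "S"): 20}


def f_pair(a, b):
    """Return f(ab); the count is undirected, so f(ab) = f(ba)."""
    return counts.get((a, b), counts.get((b, a), 0))


# f(a): number of mutations in which amino acid a was involved.
f_a = {a: sum(f_pair(a, b) for b in alphabet if b != a) for a in alphabet}

# f: total number of amino acid occurrences involved in mutations.
f_total = sum(f_a.values())

print(f_a)
print(f_total)
```

In this toy data set 60 mutations were observed in total, and f comes out as twice that number.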
The entry M(ab) of the matrix M is the probability of amino acid 'a' changing into amino acid 'b' during the evolutionary interval. M(aa) is the probability that amino acid 'a' remains unchanged over that interval.
The relative mutability of amino acid 'a' is defined as
$$m(a) = \frac{f(a)}{100 \, f \, P(a)}$$
Mutabilities are scaled to the number of replacements per occurrence of the given amino acid per 100 residues in each alignment. With this scaling, the relative mutability is the probability that the given amino acid will change in the evolutionary period of interest. Hence the probability of 'a' remaining unchanged is the complementary probability
M(aa) = 1 − m(a)
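The factor of 100 in the mutability is what makes the matrix a 1-PAM matrix: averaged over the amino acid frequencies, the probability of a change is 1%. The sketch below checks this on toy numbers. The frequencies `P` and counts `f_a` are hypothetical values picked for illustration.

```python
alphabet = ["A", "G", "S"]
P = {"A": 0.5, "G": 0.3, "S": 0.2}   # hypothetical frequencies, summing to 1
f_a = {"A": 40, "G": 50, "S": 30}    # hypothetical f(a) values
f_total = sum(f_a.values())          # f

# Relative mutability m(a) = f(a) / (100 * f * P(a)).
m = {a: f_a[a] / (100 * f_total * P[a]) for a in alphabet}

# Diagonal entries M(aa) = 1 - m(a).
M_diag = {a: 1 - m[a] for a in alphabet}

# Expected fraction of residues that change in one step: sum of P(a) * m(a).
expected_change = sum(P[a] * m[a] for a in alphabet)
print(round(expected_change, 6))  # 0.01
```

The check works for any counts, since the sum of P(a) m(a) reduces to the sum of f(a) divided by 100 f.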
On the other hand, the probability of 'a' changing into 'b' is the product of the conditional probability that 'a' changes into 'b', given that 'a' changed, and the probability that 'a' changes:
$$M(ab) = P(a \to b) = P(a \to b \mid a \text{ changed}) \, P(a \text{ changed}) = \frac{f(ab)}{f(a)} \, m(a)$$
These equations follow from a Markov-type model of evolution, which has good mathematical properties.
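Putting the formulas for M(aa) and M(ab) together, a minimal sketch of the matrix construction could look like the following. The function name `pam1`, the toy alphabet and all numbers are assumptions made for the example. A real derivation would use all 20 amino acids and observed counts.

```python
alphabet = ["A", "G", "S"]
P = {"A": 0.5, "G": 0.3, "S": 0.2}                          # hypothetical frequencies
counts = {("A", "G"): 30, ("A", "S"): 10, ("G", "S"): 20}   # hypothetical f(ab)


def f_pair(a, b):
    """Undirected count f(ab) = f(ba)."""
    return counts.get((a, b), counts.get((b, a), 0))


def pam1(alphabet, P, f_pair):
    """Build the 1-PAM transition matrix as a dict keyed by (a, b)."""
    f_a = {a: sum(f_pair(a, b) for b in alphabet if b != a) for a in alphabet}
    f_total = sum(f_a.values())
    M = {}
    for a in alphabet:
        m_a = f_a[a] / (100 * f_total * P[a])   # relative mutability m(a)
        for b in alphabet:
            if a == b:
                M[a, b] = 1 - m_a               # probability of no change
            else:
                # P(a -> b | a changed) * P(a changed)
                M[a, b] = f_pair(a, b) / f_a[a] * m_a
    return M


M = pam1(alphabet, P, f_pair)
for a in alphabet:
    print(a, [round(M[a, b], 5) for b in alphabet])
```

The sketch assumes every amino acid appears in at least one mutation, so f(a) is never zero.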
The matrix M has the following properties:
1) Each row sums to 1:
$$\sum_{b} M(ab) = M(aa) + \sum_{b \neq a} \frac{m(a) \, f(ab)}{f(a)} = 1 - m(a) + \frac{m(a)}{f(a)} \sum_{b \neq a} f(ab) = 1 - m(a) + m(a) = 1$$
2) The n-PAM model has n mutation steps, and its transition matrix is the n-th power of the 1-PAM matrix, i.e. the product of n copies of the 1-PAM matrix.
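Both properties can be checked numerically. The sketch below raises a toy 3 × 3 matrix to the 250th power with plain repeated multiplication and prints the row sums. The matrix `M1` holds hypothetical values, and `mat_mul` and `mat_power` are helper names introduced here.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(M, n):
    """n-th power of M for n >= 1, by repeated multiplication."""
    result = M
    for _ in range(n - 1):
        result = mat_mul(result, M)
    return result


# Hypothetical toy 1-PAM matrix; each row sums to 1.
M1 = [
    [1 - 40 / 6000, 30 / 6000, 10 / 6000],
    [30 / 3600, 1 - 50 / 3600, 20 / 3600],
    [10 / 2400, 20 / 2400, 1 - 30 / 2400],
]

M250 = mat_power(M1, 250)
for row in M250:
    print([round(x, 3) for x in row], "row sum:", round(sum(row), 6))
```

Repeated multiplication keeps the example short. Exponentiation by squaring would need fewer products for large n.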
Let us continue with the 1-PAM matrix and define the scoring matrix S. Its entries are based on the ratio of two probabilities, the odds ratio M(ab)/P(b), which compares the probability of 'a' changing into 'b' with the probability of 'b' occurring by chance.
Each entry of the matrix S is calculated as the log of the odds:
$$S(ab) = \log \frac{M(ab)}{P(b)}$$
where the base of the logarithm can be chosen freely.
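A minimal sketch of the log-odds step is given below, using the same kind of toy matrix. The choice of base 10 and the scale factor 10 are assumptions for the example, since the text leaves the base open. All numbers are hypothetical.

```python
import math

alphabet = ["A", "G", "S"]
P = {"A": 0.5, "G": 0.3, "S": 0.2}   # hypothetical frequencies
# Hypothetical toy 1-PAM transition matrix M(ab), keyed by (a, b).
M = {
    ("A", "A"): 1 - 40 / 6000, ("A", "G"): 30 / 6000, ("A", "S"): 10 / 6000,
    ("G", "A"): 30 / 3600, ("G", "G"): 1 - 50 / 3600, ("G", "S"): 20 / 3600,
    ("S", "A"): 10 / 2400, ("S", "G"): 20 / 2400, ("S", "S"): 1 - 30 / 2400,
}


def log_odds(M, P, alphabet, scale=10.0, base=10.0):
    """S(ab) = scale * log_base(M(ab) / P(b)); scale and base are free choices."""
    return {(a, b): scale * math.log(M[a, b] / P[b], base)
            for a in alphabet for b in alphabet}


S = log_odds(M, P, alphabet)
for a in alphabet:
    print(a, [round(S[a, b], 2) for b in alphabet])
```

With these toy values the diagonal scores are positive and the off-diagonal scores are negative. The matrix is symmetric because P(a) M(ab) = P(b) M(ba) when the counts f(ab) are undirected.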
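The score of an alignment is then the sum of the scores of its aligned pairs:
$$S = \sum_{i} S(a_i b_i)$$
where a_i and b_i are the amino acids aligned at position i.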
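For an ungapped alignment the sum can be written in a few lines. The scores in `S` below are made-up integers on a toy alphabet, and `pair_score` and `score_alignment` are names introduced for this sketch. Gap handling is deliberately left out.

```python
# Hypothetical symmetric scores; only one orientation of each pair is stored.
S = {
    ("A", "A"): 2, ("G", "G"): 3, ("S", "S"): 4,
    ("A", "G"): -2, ("A", "S"): -1, ("G", "S"): -3,
}


def pair_score(a, b):
    """Look up S(ab), using the symmetry S(ab) = S(ba)."""
    return S[a, b] if (a, b) in S else S[b, a]


def score_alignment(seq1, seq2):
    """Sum the scores of the aligned pairs of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(score_alignment("AGSA", "AGGA"))  # 2 + 3 - 3 + 2 = 4
```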
MORE ON PAM MATRICES
PAM matrices are widely used with the BLAST search algorithm, an extremely fast, robust and popular heuristic. There is a whole family of matrices: PAM-10, ..., PAM-250, ... These matrices are extrapolated from the PAM-1 matrix by matrix multiplication. A PAM is also a relative measure of evolutionary distance, e.g.:
a) 1 PAM = 1 accepted mutation per 100 amino acids
b) 250 PAM = 2.5 accepted mutations per amino acid
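A distance of 250 PAM does not mean that every residue differs, because repeated and reverse mutations at the same site are counted too. The sketch below estimates the expected fraction of unchanged positions after n steps as the sum over 'a' of P(a) times the (a, a) entry of the n-step matrix. The toy matrix `M1`, the frequencies `P` and the function names are assumptions for illustration, so the numbers say nothing about real proteins.

```python
def step(dist, M):
    """Advance a probability distribution over amino acids by one PAM step."""
    n = len(M)
    return [sum(dist[a] * M[a][b] for a in range(n)) for b in range(n)]


def expected_identity(M, P, n_pam):
    """Expected fraction of positions showing the original residue after n_pam steps."""
    total = 0.0
    for a in range(len(M)):
        # Start from certainty that the residue is amino acid number a.
        dist = [1.0 if i == a else 0.0 for i in range(len(M))]
        for _ in range(n_pam):
            dist = step(dist, M)
        total += P[a] * dist[a]
    return total


P = [0.5, 0.3, 0.2]   # hypothetical frequencies
M1 = [                # hypothetical toy 1-PAM matrix
    [1 - 40 / 6000, 30 / 6000, 10 / 6000],
    [30 / 3600, 1 - 50 / 3600, 20 / 3600],
    [10 / 2400, 20 / 2400, 1 - 30 / 2400],
]

for n in (1, 50, 250):
    print(n, round(expected_identity(M1, P, n), 3))
```

For n = 1 the result is 0.99 by construction. For larger n the identity falls towards the chance level given by the sum of P(a) squared, rather than to zero.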
EXTRA: BLOSUM MATRICES
The other commonly used scoring matrices are the BLOSUM matrices. In contrast to the PAM matrices, which were developed from global alignments, the BLOSUM (BLOcks SUbstitution Matrix) matrices are based on local multiple alignments of more distantly related sequences. For instance, BLOSUM 62, the default matrix in protein BLAST, is calculated from alignments in which sequences sharing 62% identity or more are clustered and counted as a single sequence. Unlike PAM matrices, BLOSUM matrices are never extrapolated from other BLOSUM matrices; each one is computed directly from local multiple alignments. So the BLOSUM 80 matrix is derived with an 80% identity clustering threshold.
REFERENCES:
[1] Joao Setubal and Joao Meidanis, Introduction to Computational Molecular Biology, University of Campinas, Brazil, December 1997.
[2] Warren J. Ewens and Gregory R. Grant, Statistical Methods in Bioinformatics: An Introduction, Springer-Verlag New York, 2001.
[3] Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A, 89(22):10915-10919, 1992.
[4] Dayhoff MO, Schwartz RM. Atlas of Protein Sequence and Structure, vol. 5, suppl. 3, pages 353-358. Nat. Biomed. Res. Found., Washington D.C., 1978.
[5] Polanski and Kimmel, Bioinformatics.