Version 2.0
May 1996
$M_{1,1}$ refers to the entry at the first
row and the first column. In general,
$M_{i,j}$ refers to the
entry at the ith row and the jth column.
To use this for sequence alignment, we simply associate a numeric
value to each letter in the alphabet of the sequence. For example,
if the alphabet is {A, C, G, T}, then A = 1, C = 2, etc. Thus, one would find the score for a match between A and C at $M_{1,2}$.
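As a minimal sketch of this lookup in Python, assuming the alphabet order A, C, G, T and 1-based letter values as in the text. The names `ALPHABET`, `index_of`, and `score`, and the toy matrix values, are illustrative only.

```python
# Illustrative alphabet; the ordering A, C, G, T is an assumption.
ALPHABET = "ACGT"

def index_of(letter):
    """Return the 1-based numeric value of a letter (A = 1, C = 2, ...)."""
    return ALPHABET.index(letter) + 1

# Toy scoring matrix (hypothetical values): 1 on the diagonal, 0 elsewhere.
M = [[1 if i == j else 0 for j in range(len(ALPHABET))]
     for i in range(len(ALPHABET))]

def score(a, b):
    """Look up the entry at row index_of(a), column index_of(b)."""
    # Python lists are 0-based, so subtract 1 from the 1-based indices.
    return M[index_of(a) - 1][index_of(b) - 1]

print(index_of("A"), index_of("C"))  # numeric values of A and C
print(score("A", "C"))               # entry at row 1, column 2
```

Keeping the letter-to-index mapping in one place means the same matrix layout can be reused for any alphabet.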
Since we consider different scoring matrices in this section,
we distinguish between them by using different letters for the matrix:
one letter refers to the Replacement matrix,
another to the log odds matrix, and so on.
      A    T    C    G
A     1    0    0    0
T     0    1    0    0
C     0    0    1    0
G     0    0    0    1
For elements in row i by column j:
$M_{i,j} = 1$ if $i = j$,
$M_{i,j} = 0$ if $i \neq j$.
      A    T    C    G
A     5   -4   -4   -4
T    -4    5   -4   -4
C    -4   -4    5   -4
G    -4   -4   -4    5
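Both tables above follow one pattern: one value on the diagonal and another everywhere else. A minimal Python sketch, using the hypothetical helper names `match_mismatch_matrix` and `ungapped_score`, builds such a matrix and sums its entries over an ungapped alignment of two equal-length sequences. The example sequences are made up for illustration.

```python
ALPHABET = "ATCG"  # row/column order used in the tables above

def match_mismatch_matrix(match, mismatch, alphabet=ALPHABET):
    """Return a dict-of-dicts with `match` on the diagonal, `mismatch` elsewhere."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet}
            for a in alphabet}

def ungapped_score(seq1, seq2, matrix):
    """Sum the matrix entries over aligned positions (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(matrix[a][b] for a, b in zip(seq1, seq2))

identity = match_mismatch_matrix(1, 0)
plus5_minus4 = match_mismatch_matrix(5, -4)

# The two toy sequences share 5 identical positions and differ at 2.
print(ungapped_score("GATTACA", "GACTATA", identity))      # 5
print(ungapped_score("GATTACA", "GACTATA", plus5_minus4))  # 5*5 + 2*(-4) = 17
```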
      A    T    C    G
A     0    5    5    1
T     5    0    1    5
C     5    1    0    5
G     1    5    5    0
Nucleotide bases fall into two categories depending on the ring structure of the base. Purines (Adenine and Guanine) are two-ring bases; pyrimidines (Cytosine and Thymine) are single-ring bases. Mutations in DNA are changes in which one base is replaced by another. A mutation that conserves the ring number is called a transition (e.g., A -> G or C -> T); a mutation that changes the ring number is called a transversion (e.g., A -> C or A -> T and so on).
Although there are more ways to create a transversion, the number of transitions observed to occur in nature (i.e., when comparing related DNA sequences) is much greater. Since the likelihood of transitions is greater, it is sometimes desirable to create a weight matrix which takes this propensity into account when comparing two DNA sequences.
Use of a Transition/Transversion Matrix reduces noise in comparisons of distantly related sequences.
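The purine/pyrimidine rule can be sketched in Python as follows. The function names are hypothetical, and the weights (0 for identity, 1 for a transition, 5 for a transversion) are read off the table above, where they behave as costs: a lower value means a more likely substitution.

```python
PURINES = {"A", "G"}      # two-ring bases
PYRIMIDINES = {"C", "T"}  # single-ring bases

def substitution_type(a, b):
    """Classify a base pair as 'identity', 'transition', or 'transversion'."""
    if a == b:
        return "identity"
    # A transition keeps the ring number: both purines or both pyrimidines.
    same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"

# Weights taken from the table above.
COST = {"identity": 0, "transition": 1, "transversion": 5}

def ts_tv_matrix(alphabet="ATCG"):
    """Build the transition/transversion weight matrix as a dict-of-dicts."""
    return {a: {b: COST[substitution_type(a, b)] for b in alphabet}
            for a in alphabet}

W = ts_tv_matrix()
for a in "ATCG":
    print(a, [W[a][b] for b in "ATCG"])
```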
The elements of a log odds matrix are

$S_{ij} = \log \frac{q_{ij}}{f_i f_j}$

where $q_{ij}$ are the frequencies that residue i and j are observed to align
in sequences known to be related. They are derived from a
"transition probability matrix."
$f_i$ and $f_j$ are frequencies of occurrence of
residue i and j in the
set of sequences.
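A minimal Python sketch of a log odds score, assuming the usual form log(q_ij / (f_i f_j)) with base-2 logarithms. The two-letter alphabet and all frequencies below are made-up toy numbers, not data from any real alignment, and the names are illustrative.

```python
import math

# Hypothetical background frequencies f_i for a toy two-letter alphabet.
f = {"X": 0.55, "Y": 0.45}

# Hypothetical aligned-pair frequencies q_ij in related sequences (sum to 1).
q = {("X", "X"): 0.50, ("X", "Y"): 0.05,
     ("Y", "X"): 0.05, ("Y", "Y"): 0.40}

def log_odds(i, j):
    """log2 of the observed pair frequency over the frequency expected by chance."""
    return math.log2(q[(i, j)] / (f[i] * f[j]))

for pair in q:
    print(pair, round(log_odds(*pair), 2))
```

Pairs seen more often than chance predicts get a positive score; pairs seen less often get a negative one.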
$A_{ij}$ is the number of times amino
acid j was replaced by amino acid i in all comparisons.
$m_j$ is the relative mutability, i.e., the propensity of
a given amino acid, j, to be replaced.


Divide each element of the Mutation Data Matrix, M, by the frequency
of occurrence of each residue:

$R_{ij} = M_{ij} / f_i$

R is a Relatedness Odds Matrix,
$f_i$ is the frequency of residue i.
The Log Odds Matrix,
$S$, is calculated from the relatedness odds matrix,
$R$,
simply by taking the log of each
element: $S_{ij} = \log R_{ij}$.
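The two steps above, dividing by the residue frequency and taking the log, can be sketched in Python. The 2 x 2 matrix `M` and the frequencies `f` are made-up toy values, `M[i][j]` is assumed to hold the probability that residue j is replaced by residue i, and the base-10 logarithm is an arbitrary choice for this sketch.

```python
import math

residues = ["X", "Y"]   # hypothetical two-letter alphabet
f = [0.6, 0.4]          # toy residue frequencies f_i

# Toy mutation data matrix: M[i][j] = P(j replaced by i); each column sums to 1.
M = [[0.98, 0.03],
     [0.02, 0.97]]

def relatedness_odds(M, f):
    """Divide each element M[i][j] by the frequency of residue i."""
    n = len(f)
    return [[M[i][j] / f[i] for j in range(n)] for i in range(n)]

def log_odds(R):
    """Take the log of each element of the relatedness odds matrix."""
    return [[math.log10(r) for r in row] for row in R]

R = relatedness_odds(M, f)
S = log_odds(R)
for name, row in zip(residues, S):
    print(name, [round(s, 3) for s in row])
```

The toy values were chosen so that f[j] * M[i][j] equals f[i] * M[j][i], which makes `R` (and therefore `S`) symmetric.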
The sum of the elements, $\sum_i M_{ij}$, for any column, j, is one (trivial). Note that the probability
that an amino acid will change is on the order of 1% for each amino acid. The
probability that it will stay the same is on the order of 99% for each amino
acid.
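Matrices for greater evolutionary distances are calculated for n = 1,2,3... according
to the formula below, followed by operation on a sequence:

$M^n = M \cdot M^{n-1}$

The above equation enables the direct calculation of a matrix
for any desired PAM distance. At the limits:
for n = 0, $M^0_{ij}$ is 1 for $i = j$ and 0 for
$i \neq j$;
as $n \to \infty$, the PAM matrix elements approach the asymptotic amino acid
composition.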
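A minimal Python sketch of this behaviour, assuming the matrix for n PAMs is the 1 PAM matrix multiplied by itself n times. The 2 x 2 matrix is a made-up toy (its columns sum to one and its asymptotic composition is 0.6 and 0.4), and the helper names are hypothetical.

```python
def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matrix_power(M, n):
    """Return M multiplied by itself n times (n = 0 gives the identity)."""
    result = identity(len(M))
    for _ in range(n):
        result = matmul(M, result)
    return result

# Toy 1-step matrix: M1[i][j] = P(j replaced by i); each column sums to 1.
M1 = [[0.98, 0.03],
      [0.02, 0.97]]

print(matrix_power(M1, 0))     # identity: no change at distance 0
print(matrix_power(M1, 2))     # two steps
print(matrix_power(M1, 2000))  # every column approaches the composition (0.6, 0.4)
```

Repeated multiplication is used here for clarity; for large n, repeated squaring would need far fewer multiplications.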
$q_{ij}$ =
(number of times residues i and j are aligned)/(total number of residue pairs)
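A minimal Python sketch of turning pair counts into frequencies by dividing by the total number of residue pairs. The aligned pairs are made-up toy data, the function name is hypothetical, and treating (i, j) and (j, i) as the same pair is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical aligned residue pairs taken from related sequences.
aligned_pairs = [("A", "A"), ("A", "G"), ("G", "A"), ("C", "C"),
                 ("C", "T"), ("T", "T"), ("A", "A"), ("G", "G")]

def pair_frequencies(pairs):
    """Count unordered residue pairs and divide by the total number of pairs."""
    counts = Counter(tuple(sorted(p)) for p in pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

q = pair_frequencies(aligned_pairs)
for pair, freq in sorted(q.items()):
    print(pair, freq)
```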
David T. Jones, William R. Taylor and Janet M. Thornton
(1992). The rapid generation of mutation data matrices from
protein sequences. CABIOS 8: 275-282.
An update to the PAM matrix using the method of Dayhoff. 59,190 accepted mutations in 16,130 sequences were tallied.
Gaston H. Gonnet, Mark A. Cohen, Steven A. Benner
(1992). Exhaustive Matching of the Entire Protein Sequence
Database. Science 256: 1443-1445.
The answer to life, the universe and everything. A scoring
matrix based on alignment of the entire SWISS-PROT database. 1.7 x 10^6 matches were used from sequences differing by 6.4 to 100.0 PAM.
PAM         H (bits)    Min. significant
distance                length (30 bits)
  0         4.17          8
 10         3.43          9
 20         2.95         11
 30         2.57         12
 40         2.26         14
 50         2.00         15
 60         1.79         17
 70         1.60         19
 80         1.44         21
 90         1.30         24
100         1.18         26
110         1.08         28
120         0.98         31
130         0.90         34
140         0.82         37
150         0.76         40
160         0.70         43
170         0.65         47
180         0.60         51
190         0.55         55
200         0.51         59
210         0.48         63
220         0.45         68
230         0.42         73
240         0.39         78
250         0.36         83
260         0.34         89
270         0.32         91
280         0.30        100
290         0.28        107
300         0.27        113
310         0.25        120
320         0.24        127
330         0.22        134
340         0.21        141
350         0.20        149
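The length column appears to follow from the H column: if each aligned position contributes H bits on average, roughly 30/H positions are needed to accumulate 30 bits. A minimal Python sketch under that assumption follows. The function name is hypothetical, and because the tabulated H values are rounded, some rows of the table can differ by one from what this calculation gives.

```python
import math

def min_significant_length(h_bits, required_bits=30.0):
    """Smallest whole number of positions whose expected score reaches required_bits."""
    return math.ceil(required_bits / h_bits)

# A few (PAM distance, H) rows from the table.
for pam, h in [(0, 4.17), (50, 2.00), (100, 1.18)]:
    print(pam, h, min_significant_length(h))
```

The practical reading is that matrices for larger PAM distances carry less information per position, so longer alignments are needed before a match stands out from chance.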
Creation of a step-matrix based solely on aligned blocks of G-Protein-Coupled Receptors in which the elements of the matrix are proportional to the rarity of the substitution.
Used to excellent advantage in constructing a coherent phylogeny of widely diverged G-protein receptors.
Matrices for detecting frame shift mutations giving rise to new coding sequences (or arising from sequencing error).
Scoring matrix created from observed substitutions of residues found in similar structural environments in 3D.
David Wheeler
Thanks to Karen R. Lafollette and Dr Paula Burch for technical assistance, and to A. Guffanti for manuscript preparation.