RebeccaP turns the RebeccaP's Recorder on.
RebeccaP says, "So, how did the reading go of chapter 4?"
Bergeron says, "I have done only up to section 4.6"
RebeccaP says, "Others?"
ChristianF says, "Currently I have only a more technical question. Not directly
dealing with the content of the chapter."
AlexS says, "Up to section 4.6 also..."
RebeccaP [to ChristianF]: Ask away, and if I don't know the answer, I'll see
that it gets answered.
CheeMV finds his way in.
RebeccaP says, "Hi Chee! We're just starting a discussion of Ch 4 with the
open question of "
CheeMV waves
ChristianF [to RebeccaP]: "Ok. How many residues of, for example, an amino acid
sequence are needed to make a useful, sensible alignment?"
RebeccaP [to ChristianF]: How many sequences are you attempting to align?
ChristianF says, "Well, it depends. First of all, I have to make a comparison
to find (a) similar sequence in pdb/swissprot."
ChristianF says, "Would it be useful to make a search, followed by an alignment
with, let's say, 7 residues?"
RebeccaP says, "If you are dealing with amino acids, then I think the chance
of a random hit is considered small enough when you get to about 7-10
residues. This is the critical question -- how likely is it that I
will align well to something because of random chance?"
RebeccaP says, "However, the problem still exists, so longer sequences are
always "better" from that perspective."
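A rough way to see why about 7-10 residues is where chance hits become rare is
to count expected exact matches under a very simple model. The sketch below
assumes all 20 amino acids are equally likely and independent, looks only at
exact identity (no substitution scoring), and uses a made-up database size;
none of these assumptions come from the discussion itself.

```python
def chance_match_probability(k, alphabet_size=20):
    """Probability that two independent random k-residue words are identical,
    assuming every residue is equally likely (a simplifying assumption)."""
    return (1.0 / alphabet_size) ** k


def expected_chance_hits(k, database_residues, alphabet_size=20):
    """Expected number of exact chance matches of one k-residue query against
    a database treated as one long sequence of the given length."""
    start_positions = max(database_residues - k + 1, 0)
    return start_positions * chance_match_probability(k, alphabet_size)


# Hypothetical database of 10 million residues, for illustration only.
for k in (3, 5, 7, 10):
    print(k, expected_chance_hits(k, database_residues=10_000_000))
```

Under these assumptions the expected count drops below one somewhere between
5 and 7 residues. Real residue frequencies are uneven, so real chance rates
are somewhat higher.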
ChristianF says, "Ok. "
ChristianF says, "And how many gaps could be introduced into the sequence
during the alignment?"
ChristianF says, "to still get some statistically significant results?"
ChristianF nods
RebeccaP says, "Clarification question: do you mean how many gaps in the
optimal alignment are allowable before questioning the significance of the
similarity?"
ChristianF says, "Yes."
RebeccaP says, "Ok -- I would say this would likely first be more of a
percentage than a number."
RebeccaP says, "Second, I would also believe that this would depend on the kind
of comparison. If you are looking for conserved regions in otherwise
dissimilar sequences, you'd have to consider the gaps only in the "relevant"
regions. "
ChristianF nods
RebeccaP says, "To my knowledge, there is no formula for computing this number,
other than to say that the total dissimilarity has a confidence interval of
such-and-so that the relationship is not random. This, however, requires a
model of expected base composition."
RebeccaP says, "We use particular models of base composition, but I don't know
how widely accepted they are in the biology/whatever community."
ChristianF says, "Is there some information available concerning these models?"
RebeccaP says, "Hmm. Yes, there is, but I can't spout them off the top of my
head. Let me check a reference I have here..."
RebeccaP says, "Aha! In the book Mathematical Methods for DNA Sequences, CRC
Press, edited by Michael S. Waterman, there is a chapter entitled "Patterns in
DNA and Amino Acid Sequences and Their Statistical Significance". This paper
is at least a place to start (pub date 1989) although it is a little old. "
ChristianF [to RebeccaP]: "Ok, thank you."
RebeccaP says, "Other questions on Ch 4? Is it difficult or easy to follow?"
ChristianF says, "It's ok together with the examples at the end."
AlexS says, "Up to section 4.6 it is ok..."
RebeccaP says, "Anne? Chee?"
RebeccaP [to ChristianF]: How are you using the examples -- to verify
understanding?
CheeMV says, "It's not difficult to follow."
Bergeron is struggling with the definitions in section 4.6
RebeccaP [to Bergeron]: Any particular one?
Bergeron says, "Tij?"
RebeccaP says, "If you think of time and evolution in this tree and you find
the common ancestor for i and j in the evolutionary tree, Tij is the time that
has elapsed since i and j diverged on the tree; basically, the time from the
common ancestor to i and j."
Bergeron says, "Is there a more mathematical definition?"
RebeccaP says, "Well, there is generally assumed to be some proportional
relationship between distance and time. If one quantifies that relationship,
then it would be defined by the distance from the "computed" common ancestor
to i and j."
RebeccaP says, "However, there is no real consensus about how time relates to
distance, since evolutionary change isn't a constant."
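If one does adopt the simplest proportional relationship, a constant rate of
change on both lineages, Tij can be read off the pairwise distance directly.
The sketch below is only that special case; the function name, the rate value
and the units are hypothetical, and as noted above a constant rate is not
generally accepted.

```python
def divergence_time(distance_ij, rate_per_lineage):
    """Time since i and j split, assuming both lineages accumulate change at
    the same constant rate (a strict molecular-clock assumption)."""
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    # The i-to-j distance spans two branches of equal duration, one from the
    # common ancestor to i and one from the common ancestor to j.
    return distance_ij / (2.0 * rate_per_lineage)


# Hypothetical numbers: distance in changes per site, rate in changes per
# site per unit of time.
print(divergence_time(distance_ij=0.30, rate_per_lineage=0.01))
```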
Bergeron nods
RebeccaP says, "BTW, I found that the reading became much slower in section 4.6
and stayed quite a bit denser for the rest of the chapter."
Bergeron agrees.
RebeccaP says, "Have any of you used the SplitTree software that is described
in the chapter?"
CheeMV says, "I'm reading section 4.6."
ChristianF shakes his head
AlexS says, "no."
CheeMV says, "No."
RebeccaP says, "Is this kind of analysis anything that you have to do?"
Bergeron says, "no."
RebeccaP says, "I think until you all have read past 4.6 there is little to
really talk about. There are a fair number of formulas. I could describe the
basic idea behind split decomposition, but I don't know how useful that would
be."
RebeccaP says, "Are there remaining questions from multiple alignment? Any
more luck with the example in the final section?"
Bergeron would like to get the basic ideas behind split dec.
RebeccaP says, "Any objections from the rest?"
ChristianF is also curious
AlexS says, "Try to describe them..."
CheeMV has no objection.
RebeccaP says, "Ok. The basic idea here is that we want to find partitions
(non-overlapping sets) of different granularity. Let's start with a 4 element
set with elements A, B, C, and D."
RebeccaP says, "There are several trivial splits: A -- BCD, B -- ACD, etc. "
RebeccaP says, "The goal of a useful split is to find a partition that has the
property that the elements in the same partition are more similar to each other
than they are to elements outside the partition."
RebeccaP says, "This definition pre-supposes some metric of similarity, which
we will use, for now, as our distance metric -- dij."
RebeccaP says, "So, mathematically, we have a d-split into sets A and B if the
following conditions hold for all combinations of pairs of elements in A and B:
"
RebeccaP says, "for pair i and j in A and k and l in B, the combined distance
of i to j and k to l is smaller than either the combined distance of i to k
and j to l or"
RebeccaP says, "the combined distance of i to l and j to k."
RebeccaP says, "Mathematically, this is:"
RebeccaP says, "(dij + dkl) < (dik + djl) and (dij + dkl) < (dil + djk)."
RebeccaP says, "It is additionally required that (dik + djl) = (dil + djk)."
Bergeron says, "equality?"
RebeccaP says, "Yes. This is where things break down sometimes -- for this to
be true the metric must respect what is called the four-point condition."
RebeccaP says, "However, we can sensibly weaken this to simply require that the
(dij+dkl) term is < both individually."
RebeccaP says, "We find, though, that at times, there is so much noise in the
metric that the most we want to require is:"
RebeccaP says, "NOT ((dij + dkl) > (dik + djl) and (dij + dkl) > (dil + djk))"
RebeccaP says, "which in English means that the similarity within the group is
not worse than both of the similarities with other groups."
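The three versions of the condition can be compared side by side on a tiny
example. This is a minimal sketch, not the chapter's algorithm: the helper
names and the two distance tables are made up for illustration, i may equal j
and k may equal l, and in the weakest form a tie between the within-group sum
and the larger cross sum is treated as a failure, which is an assumption of
the sketch.

```python
from itertools import product


def make_metric(pair_distances):
    """Build a symmetric nested-dict distance table from {(a, b): value}."""
    labels = sorted({x for pair in pair_distances for x in pair})
    d = {a: {b: 0.0 for b in labels} for a in labels}
    for (a, b), value in pair_distances.items():
        d[a][b] = d[b][a] = float(value)
    return d


def is_d_split(d, A, B, form="weakest"):
    """Check the split {A, B} against one of the three forms discussed."""
    for (i, j), (k, l) in product(product(A, repeat=2), product(B, repeat=2)):
        within = d[i][j] + d[k][l]
        cross1 = d[i][k] + d[j][l]
        cross2 = d[i][l] + d[j][k]
        if form == "strict":      # below both, and the two cross sums agree
            ok = within < cross1 and within < cross2 and cross1 == cross2
        elif form == "weak":      # below both cross sums
            ok = within < cross1 and within < cross2
        elif form == "weakest":   # below at least one (ties fail: assumption)
            ok = within < max(cross1, cross2)
        else:
            raise ValueError(f"unknown form: {form}")
        if not ok:
            return False
    return True


# Illustrative distances: A,B are close, C,D are close, the two groups are far.
tree = make_metric({("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 4,
                    ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 4})
# Same table with one "noisy" entry that breaks the equality requirement.
noisy = make_metric({("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 4.5,
                     ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 4})

for form in ("strict", "weak", "weakest"):
    print(form, is_d_split(tree, ["A", "B"], ["C", "D"], form),
          is_d_split(noisy, ["A", "B"], ["C", "D"], form))
```

On the noisy table the split AB -- CD fails only the strict form, which is the
situation the weaker forms are meant to handle.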
RebeccaP says, "Does this make sense?"
ChristianF nods
AlexS nods.
CheeMV nods
Bergeron takes it for a definition. Waits for the results.
RebeccaP says, "Ok, so the basic idea is that we use this process as a way to
find a hierarchy of related groups. We can have these clusters at various
levels. We would ideally like to have clusters of various sizes in these
splits."
RebeccaP says, "Clusters of this form give us real information -- clusters
which are either very large or very small are less useful."
RebeccaP says, "Now, given a d-split, we can give it a number called its
isolation index, which characterizes the degree to which the split makes sense
relative to the data."
RebeccaP says, "This is most relevant when we have had to use one of the weaker
forms of the definition, which is often the case in "real" data sets. "
RebeccaP says, "What this isolation index measures, is the amount of noise.
Let's take apart this definition:"
RebeccaP says, "We start by finding the max of (dik + djl) and (dil + djk).
Essentially, we are finding the largest amount of dissimilarity shown by the
split."
Bergeron nods
RebeccaP says, "We subtract from this the degree of similarity within the
clusters: (dij + dkl) is subtracted from this max term."
RebeccaP says, "Then, for all the pairs between the sets, we find the smallest
of these (and multiply it by 1/2). This gives us the isolation index."
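The recipe just described can be written out in a few lines. This is a sketch
that follows the description in this session (larger cross sum, minus the
within-cluster sum, smallest value over all choices of i, j in A and k, l in
B, times 1/2); the distance table is made up for illustration, and i = j or
k = l is allowed.

```python
from itertools import product


def isolation_index(d, A, B):
    """Half of the smallest gap between the larger cross sum and the
    within-cluster sum, over all i, j in A and k, l in B."""
    smallest_gap = min(
        max(d[i][k] + d[j][l], d[i][l] + d[j][k]) - (d[i][j] + d[k][l])
        for i, j in product(A, repeat=2)
        for k, l in product(B, repeat=2)
    )
    return 0.5 * smallest_gap


# Illustrative distances: A,B are close, C,D are close, the groups are far.
d = {
    "A": {"A": 0, "B": 2, "C": 4, "D": 4},
    "B": {"A": 2, "B": 0, "C": 4, "D": 4},
    "C": {"A": 4, "B": 4, "C": 0, "D": 2},
    "D": {"A": 4, "B": 4, "C": 2, "D": 0},
}

print(isolation_index(d, ["A", "B"], ["C", "D"]))   # 2.0
print(isolation_index(d, ["A"], ["B", "C", "D"]))   # 1.0
print(isolation_index(d, ["A", "C"], ["B", "D"]))   # 0.0, not a useful split
```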
RebeccaP says, "Now, once we can compute this value for a given split, we can
analyze the "sensibility" as a whole of the data set (distance information) by
combining this isolation index across all the d-splits."
RebeccaP wonders if Chee, Alex and Christian are still with me...
CheeMV is still listening.
AlexS thinks he is with you...
RebeccaP [to AlexS]: Is there a question you'd like to ask before I continue
with the data-set wide information?
AlexS says, "No question yet."
ChristianF is still listening.
RebeccaP says, "Ok..."
RebeccaP says, "So, we have these isolation indices for each possible split (and
the chapter describes how to enumerate all the splits)."
RebeccaP says, "To characterize the data as a whole, we would then first
compute d^1 as follows:"
RebeccaP says, "first, the notation. The isolation index is a_s (shown as
alpha in the text). this is for a particular split s={A,B}."
RebeccaP says, "we can compute another relation, which I'll call d_s (called
delta in the text). d_s is a metric which assigns, for split s={A,B}, d_s(i,j)
= 1 if i in A and j in B, and d_s(i,j) = 0 otherwise."
RebeccaP says, "In English, this means that d_s is 0 if the elements in
question are in the same set for this split."
RebeccaP says, "we use this to weight the a_s metric, since we don't care if
the elements are in the same set of a split."
RebeccaP says, "this gives us d^1 = sum over all splits s of (a_s * d_s)."
RebeccaP says, "Thus, we have d^1 as the composite measure of noise. "
ChristianF nods
RebeccaP says, "Decomposition theory tells us that d^1(i,j) <= d(i,j). In
English, you could say that a pair can't contribute more noise than their
distance."
Bergeron is wondering what 'noise' is in this case.
ChristianF [to RebeccaP]: "That would mean, the higher the value of d^1, the
bigger the noise?"
RebeccaP [to Bergeron]: It is the same conceptual idea of noise -- how much
this pair contributes to the total dissimilarity amount in all the splits.
RebeccaP [to ChristianF]: Actually, I misspoke there. The d^1 is the degree of
correspondence to the data, so you want a larger d^1. Thus, the following is
more correct:
RebeccaP says, "Decomposition theory tells us that d^1(i,j) <= d(i,j). This
means that a pair i,j can't contribute more support than the distance.
Sorry -- I got lost in my notes."
RebeccaP says, "d^0 is the noise contribution, so larger d^0 is more noise."
RebeccaP says, "d^0 = d(u,v) - d^1(u,v)."
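As a small worked example of d^1 and d^0, the sketch below sums the isolation
indices of the splits that separate a pair and subtracts the result from the
original distance. The distance table and the split weights are illustrative
values chosen for a four-element tree-like case, d_s is treated as symmetric
(1 whenever the split separates the two elements), and the function names are
hypothetical.

```python
def separates(split, u, v):
    """d_s(u, v): True when u and v fall on opposite sides of the split."""
    A, _B = split
    return (u in A) != (v in A)


def d1(weighted_splits, u, v):
    """Sum of a_s over all splits s that separate u and v."""
    return sum(a_s for split, a_s in weighted_splits if separates(split, u, v))


def d0(d, weighted_splits, u, v):
    """Residue: the part of d(u, v) not accounted for by the splits."""
    return d[u][v] - d1(weighted_splits, u, v)


# Illustrative distances and split weights (a_s) for a tree-like example.
d = {
    "A": {"A": 0, "B": 2, "C": 4, "D": 4},
    "B": {"A": 2, "B": 0, "C": 4, "D": 4},
    "C": {"A": 4, "B": 4, "C": 0, "D": 2},
    "D": {"A": 4, "B": 4, "C": 2, "D": 0},
}
weighted_splits = [
    (({"A"}, {"B", "C", "D"}), 1.0),
    (({"B"}, {"A", "C", "D"}), 1.0),
    (({"C"}, {"A", "B", "D"}), 1.0),
    (({"D"}, {"A", "B", "C"}), 1.0),
    (({"A", "B"}, {"C", "D"}), 2.0),
]

print(d1(weighted_splits, "A", "C"), d0(d, weighted_splits, "A", "C"))
```

In this small example every distance is fully accounted for by the splits, so
the residue d^0 comes out as zero for each pair.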
RebeccaP says, "Is this clearer now?"
Bergeron says, "Yes"
ChristianF nods
RebeccaP looks at Alex and Chee for acknowledgement...
AlexS nods.
CheeMV nods
RebeccaP says, "Ok good. Now, the chapter gives some guidance for when a data
set is BAD."
RebeccaP says, "The final number computed is called the splittable percentage:"
RebeccaP says, "r_d = ((sum over all pairs d^1(u,v)) / (sum over all pairs
d(u,v))) * 100%."
RebeccaP says, "In the text this is rho. This is the percentage of distance
which is accounted for in the splits. Higher values here are good. "
RebeccaP says, "The chapter says that r_d < 45% implies that one shouldn't use
this method to try and find a tree for the data."
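The splittable percentage is a ratio of two sums over all pairs, which the
sketch below computes and compares with the 45% guideline. The two small
tables are made-up numbers purely to exercise the arithmetic, and the function
names are hypothetical.

```python
from itertools import combinations


def splittable_percentage(d, d1, labels):
    """100 * (sum of d^1 over all pairs) / (sum of d over all pairs)."""
    pairs = list(combinations(labels, 2))
    total = sum(d[u][v] for u, v in pairs)
    if total == 0:
        raise ValueError("all distances are zero")
    explained = sum(d1[u][v] for u, v in pairs)
    return 100.0 * explained / total


def worth_building_tree(r_d, threshold=45.0):
    """Apply the guideline quoted from the chapter: below 45% is a bad sign."""
    return r_d >= threshold


# Made-up original distances d and split-accounted distances d^1.
labels = ["X", "Y", "Z"]
d = {"X": {"Y": 4, "Z": 6}, "Y": {"Z": 10}}
d1 = {"X": {"Y": 3, "Z": 5}, "Y": {"Z": 8}}

r_d = splittable_percentage(d, d1, labels)
print(r_d, worth_building_tree(r_d))   # 80.0 True
```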
RebeccaP says, "It should also be noted that this method can find useful webs of
interactions, even when no tree exists -- the examples show this nicely."
ChristianF nods
RebeccaP says, "That is a basic survey of the method and some of its
mathematics. Hopefully this helps some in understanding the techniques. "
Bergeron says, "How do you construct the graphs?"
RebeccaP says, "Ah -- I missed a step. Sorry. Various splits correspond to
isolated sub-graphs (cutsets). "
RebeccaP says, "A split itself is an edge, connecting the sets of the split.
If you think of a set of hierarchically-related splits, you can see how a tree
is derived. Recall that there are nodes corresponding to the sets in a
split."
RebeccaP says, "The weight on each edge (or the length of the edge in the
pictures) is the corresponding isolation index."
RebeccaP [to Bergeron]: Does this make sense now?
Bergeron says, "Do you do a first best split, and then you split the two new
sets?"
RebeccaP says, "Actually, instead what happens in this method is all n-choose-2
splits are checked to see if they satisfy the d-split criterion. If they do,
they are added to the mix."
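The "check every candidate and keep the ones that qualify" idea can be tried
by brute force on a very small set. The sketch below simply enumerates every
way of cutting the elements into two non-empty groups, which is an assumption
of this sketch rather than the enumeration procedure the chapter describes,
and is only practical for a handful of elements. The distance table is made up
and the isolation index follows the description given earlier in this session.

```python
from itertools import combinations, product


def all_splits(labels):
    """Yield every split of labels into two non-empty groups, once each."""
    labels = list(labels)
    first, rest = labels[0], labels[1:]
    for size in range(len(rest)):           # leave at least one element for B
        for extra in combinations(rest, size):
            A = (first, *extra)
            B = tuple(x for x in rest if x not in extra)
            yield A, B


def isolation_index(d, A, B):
    """Half the smallest (larger cross sum minus within-cluster sum)."""
    return 0.5 * min(
        max(d[i][k] + d[j][l], d[i][l] + d[j][k]) - (d[i][j] + d[k][l])
        for i, j in product(A, repeat=2)
        for k, l in product(B, repeat=2)
    )


def d_splits(d):
    """Keep every candidate split whose isolation index is positive."""
    found = {}
    for A, B in all_splits(sorted(d)):
        index = isolation_index(d, A, B)
        if index > 0:
            found[(A, B)] = index
    return found


# Illustrative distances: A,B are close, C,D are close, the groups are far.
d = {
    "A": {"A": 0, "B": 2, "C": 4, "D": 4},
    "B": {"A": 2, "B": 0, "C": 4, "D": 4},
    "C": {"A": 4, "B": 4, "C": 0, "D": 2},
    "D": {"A": 4, "B": 4, "C": 2, "D": 0},
}

for (A, B), index in d_splits(d).items():
    print("".join(A), "--", "".join(B), index)
```

For this table the four trivial splits and AB -- CD survive, while AC -- BD
and AD -- BC are dropped, so no root or tree shape has to be chosen in advance.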
Bergeron says, "What a soup!"
Bergeron says, "Where can we find the split decomposition tool?"
RebeccaP says, "Yes, but the advantage is that there is no assumption made
that a tree underlies the data, nor is there any arbitrary selection of a
root."
RebeccaP says, "I was looking at the different URLs mentioned in the text,
but I haven't had a chance to track it down. I'd start with the URLs
mentioned, check the exercises and the references (citation 15 in the
reference section) and ask Georg, in that order. :-)"
RebeccaP says, "If I get the information sooner, I'll send it out. "
RebeccaP [to All]: Does anyone else have the instructions for accessing the
tool?
ChristianF shakes his head
RebeccaP says, "Ok -- I'll see what I can find out."
RebeccaP says, "We're just about out of time. I'd say for next week we can
either continue the discussion of the last half of chapter 4 or spend more time
talking about the multiple alignment and sequence alignment tools. I'd like to
take a vote."
Bergeron says, "Next week is my last session... I would prefer chapter 4."
RebeccaP looks to the others for their input -- Eric has a vote here also...
ChristianF agrees with Bergeron
EricG [to RebeccaP]: "I prefer chapter 4."
CheeMV agrees with Ch 4
AlexS says, "ok then, chapter 4. I have to read it again then..."
ChristianF says, "Just to shortly mention a result of last week's VSNS-BCD
meeting.."
RebeccaP says, "Ok -- seems like consensus to me. We'll talk about Ch4 and
I'll try to have the information on accessing the tools by then. I am hoping
this week to provide feedback on the exercises people have sent me. "
ChristianF says, "It was decided to have one member of each study group to
report to the vsns-bcd-students list, about"
ChristianF says, "the last meeting. "
ChristianF says, "The reports should shortly mention the or main
themes of each session"
ChristianF says, "and should include the URL for the meeting transcript."
ChristianF says, "I'm going to report to the list for our group."
ChristianF says, "BTW, at the end of the course, a meeting without"
ChristianF says, "instructors and with delegates of each group will (probably)
take place"
ChristianF says, "to give some feedback on the experiences of each group."
ChristianF says, "And to combine those experiences as a ."
ChristianF says, "Ok, just as an information."
ChristianF wonders about the low Moo lag
RebeccaP wonders also, but is NOT going to complain. :-)
ChristianF has nothing else
RebeccaP says, "I assume part of what this means is that if you have feedback
about your instructor, make sure Christian has it so that he can pass it on."
RebeccaP says, "Any other requested items for next week?"
Bergeron says, "assignments?"
RebeccaP says, "Finish reading Ch4. If I find the URL soon enough, I may have
you work with some sequences. Check out the self-assessment exercises for
Ch4. Also, bring any other questions you have on the other chapters."
AlexS nods.
Bergeron says, "Where are the exercises for chap 4?"
ChristianF [to RebeccaP]: "Oops, thanks, I forgot to ask for the support for
the feedback. Thank you."
RebeccaP says, "Certainly."
RebeccaP [to Bergeron]: From the homepage, you should be able to find what is
referred to as the self-assessment exercises. If you don't find them, let me
know quickly.
Bergeron says, "Which home page?"
ChristianF [to RebeccaP]: "I can't find the exercises either."
RebeccaP [to Bergeron]: Hmm. These may not have been released yet. Let me
check with Georg. I know I have proofread some, but they may not yet be
ready for students. I'll try to put together an exercise for people.
RebeccaP says, "Nope -- they're not ready yet. I'll put one together and mail
it, hopefully tomorrow."
RebeccaP says, "Anything else?"
AlexS shakes his head.
Bergeron says, "Ok thanks, see you next Monday."
ChristianF shakes his head
RebeccaP says, "Bye Anne!"
AlexS waves to everybody...
ChristianF [to Bergeron]: "Bye"
Bergeron waves goodbye to everybody
RebeccaP waves to AlexS
(Bergeron has disconnected.)
(EricG has disconnected.)
CheeMV waves bye.
AlexS finds his way out.
RebeccaP waves to all!
RebeccaP turns the RebeccaP's Recorder off.