4.10 Some Applications of Split Decomposition




 

Finally, here are some illustrative examples of split-decomposition analysis.

Split decomposition has been applied successfully to numerous data sets, mostly from biology and psychology. For example, it has been applied to the evolution of the foot-and-mouth disease virus [9], to genetic relationships in human populations [1], and to the task of distinguishing fish populations [2]. Here, we give three brief examples, two from biology and one from psychology, to illustrate the application of SPLITSTREE. For further and more detailed examples, see also [22][21][15][10][9][4][5][2][1].

Figure 4.5:

The first example, depicted in Fig. 4.5, is the splits-graph obtained from the 23S ribosomal RNA sequences of 6 archaebacteria, 6 eubacteria (including 2 chloroplasts), and 4 eukaryotes, studied by H. Leffers et al. [17]. Biological data sets typically give rise to slightly more splits than can be fitted into a tree. This example illustrates that a large portion of these splits nevertheless fit together on a tree. In addition, the split-prime residue is rather small (the splittability index in this case is 87.9%). In contrast, randomly generated distance data sets tend to have a rather large residue (in practice, the splittability index of randomly generated sequence families consisting of 10 or more sequences is considerably smaller than 50%). They also tend to produce mostly trivial splits, which separate one taxon from all of the others, and almost none which separate more than two taxa from the rest, so that the resulting graph has an almost bush-like structure.
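To make the distinction between trivial and non-trivial splits concrete, the following Python sketch evaluates the isolation index of a few splits on a small hand-made distance matrix. It assumes the standard Bandelt-Dress formula for the isolation index (presumably the one given in Section 4.9). The function names `isolation_index` and `is_trivial` and the four-taxon matrix `D` are purely illustrative and are not part of SPLITSTREE.

```python
from itertools import product


def isolation_index(d, side_a, side_b):
    """Isolation index of the split side_a | side_b under distance matrix d.

    Assumed formula (Bandelt and Dress): half the minimum, over all
    a, a' in A and b, b' in B, of
        max(d(a,a') + d(b,b'), d(a,b) + d(a',b'), d(a,b') + d(a',b))
        - d(a,a') - d(b,b').
    """
    values = []
    for a1, a2 in product(side_a, repeat=2):
        for b1, b2 in product(side_b, repeat=2):
            within = d[a1][a2] + d[b1][b2]
            across = max(d[a1][b1] + d[a2][b2], d[a1][b2] + d[a2][b1])
            values.append((max(within, across) - within) / 2)
    return min(values)


def is_trivial(side_a, side_b):
    """A split is trivial if it separates a single taxon from all others."""
    return min(len(side_a), len(side_b)) == 1


# Hypothetical tree metric on taxa 0..3: tree ((0,1),(2,3)) with
# pendant edges of length 1 and an internal edge of length 2.
D = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

for side_a, side_b in [([0], [1, 2, 3]), ([0, 1], [2, 3]), ([0, 2], [1, 3])]:
    kind = "trivial" if is_trivial(side_a, side_b) else "non-trivial"
    print(side_a, side_b, kind, isolation_index(D, side_a, side_b))
```

On this toy tree metric, the two splits that correspond to edges of the tree receive a positive isolation index equal to the edge length, while the split {0,2} | {1,3}, which does not fit the tree, receives index 0 and would therefore not appear in the splits-graph.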

Figure 4.6:

The second example is an application to a data set arising from the AIDS virus (for more details see [10]). The splits-graph in Fig. 4.6 gives a clear picture of the evolutionary history of the virus. While it seemingly co-evolved with the immune systems of apes and monkeys, adapting to the evolutionary pressures that it experienced there, the diagram suggests that there must have been two independent events by which humans were infected with these viruses, giving rise to the HIV-1 and HIV-2 families. This example is particularly interesting since it shows how the splits-graph can be used to identify "explosive" evolutionary events. Note also that the data set is again quite tree-like, which is reflected in the shape of the splits-graph.

Figure 4.7:

The final example comes from a data set obtained in cognitive psychology by C. E. Helm [14], see also [22]. In Helm's experiment, 10 people with normal eyesight and 4 color-blind people were each asked to rank the similarity of 10 colors. The experiment went as follows: for any three colors, the test subject was first asked to decide which two were least similar. The subject then had to estimate the distance of the third color to the other two, using colored counters on a board. From this set of data (120 triplets per test subject, one for each of the 120 ways of choosing 3 colors out of 10), Helm computed a distance measure on the set of 10 colors. Fig. 4.7 shows the splits-graph corresponding to the distance measure obtained from the 10 persons with normal eyesight, whereas the distance measure produced by one of the 4 color-blind persons is depicted in Fig. 4.8.

Figure 4.8:

Note that the splittability index in Fig. 4.7 is 97%, hence the graph very closely represents the given distance measure. We see that the split with the largest isolation index separates the two "warmest" colors, yellow and red, from the others. Moreover, the colors purple, purple-reddish and red-purple all lie close to each other, as do green, (green-yellow) greenish and (green-yellow) yellowish. The graph also clearly approximates the well-known color-circle, and the distances between pairs of diametrically opposite colors are very similar to each other.

In contrast, we find that the splittability index is only 60.6% in Fig. 4.8. This low value indicates that the data set is either quite noisy or, more probably, that the distance measure does not fit well into the framework defined by split decomposition theory. Indeed, Helm stated that these distances can only be visualized in a higher-dimensional space.

Finally, this example illustrates the fact that the splits-graph can be an effective data analysis tool even when the graph is rather grid-like (i.e., has no definite tree-like structure).
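The percentages quoted for Figs. 4.5, 4.7 and 4.8 can be read as the share of the total distance that the splits account for. The following Python sketch illustrates this reading. It assumes that the splittability index is the sum, over all pairs of taxa, of the split-decomposable part of the distance, divided by the sum of the original distances (the definition we take Section 4.9 to give). The function `splittability_index`, the input format, and both example matrices are hypothetical and not taken from SPLITSTREE.

```python
from itertools import combinations


def splittability_index(d, weighted_splits):
    """Percentage of the total pairwise distance explained by the given splits.

    d               -- symmetric distance matrix (list of lists)
    weighted_splits -- list of (side_a, isolation_index) pairs, where side_a
                       is one side of the split and the other side is its
                       complement (assumed input format)
    """
    taxa = range(len(d))
    total = sum(d[x][y] for x, y in combinations(taxa, 2))
    explained = 0.0
    for side_a, weight in weighted_splits:
        side_a = set(side_a)
        # A split contributes its weight to every pair of taxa it separates.
        separated = sum(
            1 for x, y in combinations(taxa, 2) if (x in side_a) != (y in side_a)
        )
        explained += weight * separated
    return 100.0 * explained / total


# Hypothetical tree metric: tree ((0,1),(2,3)), pendant edges 1, internal edge 2.
TREE = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
TREE_SPLITS = [({0}, 1), ({1}, 1), ({2}, 1), ({3}, 1), ({0, 1}, 2)]

# Shortest-path metric of the complete bipartite graph K_{2,3} (taxa 0,1
# on one side; 2,3,4 on the other). No split of this metric has a positive
# isolation index, so the list of splits is empty.
K23 = [
    [0, 2, 1, 1, 1],
    [2, 0, 1, 1, 1],
    [1, 1, 0, 2, 2],
    [1, 1, 2, 0, 2],
    [1, 1, 2, 2, 0],
]

print(splittability_index(TREE, TREE_SPLITS))
print(splittability_index(K23, []))
```

Under these assumptions the tree metric is fully explained by its splits, whereas the K_{2,3} metric consists entirely of split-prime residue. Real data sets such as those in Figs. 4.5, 4.7 and 4.8 fall between these two extremes.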







Sat Aug 12 18:25:18 MET DST 1995