Statistics and Data Science Seminar
Chenglong Yu
University of Illinois at Chicago
Graphical Representation of Biological Sequences and Its Applications
Abstract: Among existing alignment-free methods for comparing biological
sequences, graphical representation offers a simple way to view, sort, and
compare gene structures. Its aim is to display DNA or protein sequences
graphically so that their similarities and differences are easy to see. Visual
comparison alone, however, is not enough for follow-up research, which needs
more accurate comparison. This leads us to develop applications of graphical
representation for biological sequences. I will talk about two contributions in
this direction. (1) We construct a protein map using our newly proposed
graphical representation of protein sequences. Each
protein sequence is represented as a point in this map, and proteins can be
compared through cluster analysis of these points. This
protein map can be used to mathematically specify the similarity of two
proteins and predict properties of an unknown protein based on its amino acid
sequence. (2) We construct a novel genome space with biological geometry, which
is a subspace of R^N. In this space, each point corresponds to a genome. The
natural distance between two points in the genome space reflects the biological
distance between these two genomes. The genome space will provide a powerful new
tool for analyzing the classification of genomes and their phylogenetic
relationships.
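
As a rough illustration of the idea that each sequence becomes a point in R^N
and that the distance between points measures dissimilarity, here is a minimal
Python sketch. The abstract does not specify the graphical representation used
in the talk, so the sketch substitutes a simple nucleotide-composition vector
(the fractions of A, C, G, and T) as a stand-in. The function names
composition_vector and euclidean_distance are hypothetical and not taken from
the speaker's work.

```python
import math
from collections import Counter

# Assumption: a 4-letter DNA alphabet, so each sequence maps to a point in R^4.
ALPHABET = "ACGT"


def composition_vector(seq):
    """Map a DNA sequence to the fractions of A, C, G, T it contains.

    This is a stand-in embedding for illustration only, not the
    representation described in the talk.
    """
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in ALPHABET)
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T characters")
    return [counts[base] / total for base in ALPHABET]


def euclidean_distance(u, v):
    """Ordinary Euclidean distance between two points of equal dimension."""
    if len(u) != len(v):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

With this stand-in, two sequences with the same base composition land on the
same point, which shows why a richer representation is needed in practice to
separate sequences that differ only in the order of their bases.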
Wednesday, February 15, 2012, at 4:00 PM in SEO 636