Systems Seminar - CSE
Supporting Suffix trees within Database Systems
Add to Google Calendar
Providing efficient and practical solutions for computationally
intensive large-scale bioinformatics is one of the big challenges facing
the database research community. A frequently performed task in this
domain is that of sequence similarity searching. Suffix tree indexes are
among the most popular solutions for this task, due to their flexibility
in handling a wide variety of similarity metrics when used in conjunction
with dynamic programming methods.
In this talk, I will first outline the difficulties involved in supporting
suffix trees as first class indexing solutions within database kernels. I
will then present our recent research which is aimed at overcoming these
issues. In particular, I will describe our novel TOP-Q buffering strategy
and show that by the right choice of implementation the effect of the
infamous "memory bottleneck" of suffix trees can be reduced substantially.
An inherent advantage of our approach is that the large number of
bio-informatics applications that already utilize in-memory suffix trees,
can be transparently supported.