Declarative Querying for Biomedical Applications
Modern biomedical research often needs to analyze and manage large volumes of complex biological data. Unfortunately, the data management methods used in such research typically rely on awkward procedural querying, and often use query evaluation algorithms that do not scale as the data size increases. For example, data is often stored in flat files, and queries are expressed and evaluated by writing programs in languages such as Java, Perl, or Python. The perils of such procedural querying are well known to a database audience: (a) it severely limits the ability to rapidly express complex queries, and (b) it often results in very inefficient query plans, because sophisticated query optimization and evaluation methods are not employed. The problem is likely to get worse in the future, as many biomedical datasets are growing at a rate faster than Moore's Law and the queries that scientists want to pose are also increasing in complexity. The focus of my research is on building systems that allow efficient and declarative querying for biomedical applications. Building such systems requires developing high-level querying methods that allow a scientist to rapidly compose queries, and also requires designing scalable methods for evaluating those queries. In this talk, I will describe the ongoing work in the Periscope project, in which we are building a declarative and efficient query processing system for managing genomic and proteomic datasets.