Indexing, Mining and Modeling Spatio-Temporal Patterns of Gene Expressions

Principal Investigator:
Eric P. Xing                  Phone: (412)268-2559
School of Computer Science    Fax: (412)268-3431
Carnegie Mellon University    Email: epxing@cs.cmu.edu
Pittsburgh, PA 15213          WWW page: http://www.cs.cmu.edu/~epxing

Co-Principal Investigator:
Christos Faloutsos            Phone: (412)268-1457
School of Computer Science    Fax: (412)268-5576
Carnegie Mellon University    Email: christos@cs.cmu.edu
Pittsburgh, PA 15213          WWW page: http://www.cs.cmu.edu/~christos

This material is based upon work supported by the National Science Foundation under Grant No. DBI-0640543. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

1. GENERAL INFORMATION

1.1. Abstract

Link to NSF abstract

Recent progress in image-based, genome-scale profiling of whole-body mRNA patterns via in situ hybridization (ISH) calls for accurate, automatic image analysis systems that support efficient mining of complex spatio-temporal mRNA patterns. Such systems will be essential for functional genomics and regulatory network inference in higher organisms. This project addresses questions such as: (1) What are the differences between two embryo ISH images, and how can the difference be measured quantitatively and objectively? (2) Given a collection of ISH images with gene and time stamps as well as functional attributes, how can we find the images most similar to a given keyword and/or image query? (3) How can we group co-expressed genes and uncover the underlying "themes" or biological processes, and what temporal patterns can we spot? The approach consists of three efforts: (1) developing an online interface to analyze, visualize, and mine the annotated images; (2) designing scalable algorithms and models for ISH image feature extraction and co-expressed gene grouping; and (3) discovering temporal expression patterns for studying gene regulatory interactions. The resulting tools will be broadly applicable. They will be available online to answer cross-modal queries and to reveal spatio-temporal patterns that help genetic studies. The project Web site (http://www.db.cs.cmu.edu/db-site/Projects/cdem) will be used for demonstration and dissemination of results.

1.2. Keywords

Data mining, gene expression, spatio-temporal patterns.

1.3. Funding agency

National Science Foundation, Grant No. DBI-0640543.

2. PEOPLE INVOLVED

In addition to the PI, the following graduate students worked on the project.

3. RESEARCH

3.1. Current Results

We developed C-DEM, an online system for Drosophila (fruit fly) Embryo image Mining. It is built on more than 10,000 ISH images from the Berkeley Drosophila Genome Project and supports queries from any of three modalities to any of the three: (a) genes, (b) images of gene expression, and (c) annotation keywords of the images. Thus, it can find images that are similar to a given image, related to the desired annotation keywords, and/or related to specific genes. C-DEM treats the whole database as a tripartite graph, with one node type per modality, and measures proximity between nodes with random walk with restart (RWR), which is both fast and flexible.
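To make the proximity idea concrete, here is a minimal sketch of random walk with restart on a toy tripartite graph. It is not the C-DEM implementation: the node names, the edge list, the choice to link images to genes and to keywords, the restart probability of 0.15, and the helper names rwr_scores and top_matches are all assumptions made for illustration.

```python
def rwr_scores(edges, query, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected, unweighted graph.

    Returns a dict mapping each node to its steady-state visiting
    probability for a walker that restarts at `query`.
    """
    # Build adjacency sets from the undirected edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if query not in neighbors:
        raise KeyError(f"query node {query!r} is not in the graph")

    # Start with all probability mass on the query node.
    scores = {node: 0.0 for node in neighbors}
    scores[query] = 1.0

    for _ in range(max_iter):
        nxt = {node: 0.0 for node in neighbors}
        # With probability (1 - restart), step to a uniformly chosen neighbor.
        for u, nbrs in neighbors.items():
            share = (1.0 - restart) * scores[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        # With probability `restart`, jump back to the query node.
        nxt[query] += restart
        delta = sum(abs(nxt[n] - scores[n]) for n in neighbors)
        scores = nxt
        if delta < tol:
            break
    return scores


def top_matches(scores, prefix, exclude, k=3):
    """Rank nodes whose name starts with `prefix`, highest score first."""
    candidates = [n for n in scores if n.startswith(prefix) and n != exclude]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:k]


# Hypothetical toy data: each image is linked to its gene and its keywords.
EDGES = [
    ("img:1", "gene:A"), ("img:2", "gene:A"), ("img:3", "gene:B"),
    ("img:1", "kw:x"), ("img:2", "kw:x"),
    ("img:2", "kw:y"), ("img:3", "kw:y"),
]

scores = rwr_scores(EDGES, "img:1")
similar_images = top_matches(scores, "img:", exclude="img:1")
related_keywords = top_matches(scores, "kw:", exclude="img:1")
print(similar_images, related_keywords)
```

Because every modality lives in the same graph, one score vector answers several kinds of query at once: filtering by the "img:", "gene:", or "kw:" prefix turns the same result into an image-to-image, image-to-gene, or image-to-keyword ranking.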

Live System
Submission to VLDB'08


Last updated: May 15, 2008, by Fan Guo