Genomic datasets, spanning many organisms and data types, are rapidly
being produced, creating new opportunities for understanding the
molecular mechanisms underlying human disease, and for studying complex
biological processes on a global scale. Transforming these immense
amounts of data into biological information is a challenging task.
We address this challenge by presenting a statistical modeling language,
based on Bayesian networks, for representing heterogeneous biological
entities and modeling the mechanisms by which they interact. We use
statistical learning to infer the details of these models (both
structure and parameters) automatically from raw genomic data.
Biological insights are then derived directly from the learned
model.

In this talk, I will describe three applications of this framework to
the study of gene regulation:
* Understanding how DNA patterns (motifs) in the control regions of
genes help control the activity of those genes.
Using only DNA sequence and gene expression data as input, these models
recovered many of the known motifs in yeast and several known motif
combinations in human.
* Finding regulatory modules and their actual regulator genes directly
from gene expression data. Some of the predictions from this analysis
were tested successfully in the wet lab, suggesting regulatory roles for
three previously uncharacterized proteins.
* Combining gene expression profiles from several organisms for a more
robust prediction of gene function and regulatory pathways, and for
studying the degree to which regulatory relationships have been
conserved across evolution.