A semi-parametric Bayesian model for unsupervised differential co-expression analysis
identifies novel molecular subtype
Johannes Freudenberg, Siva Sivaganesan, Michael Wagner, Mario Medvedovic
Laboratory for Statistical Genomics and Systems Biology
Department of Environmental Health,
University of Cincinnati College of Medicine,
3223 Eden Av. ML 56, Cincinnati OH 45267-0056
Freudenberg JM, Sivaganesan S, Wagner M, Medvedovic M: A semi-parametric Bayesian model for unsupervised
differential co-expression analysis. BMC Bioinformatics 11:234. 2010.
Background: Differential co-expression analysis is an emerging strategy for characterizing
disease-related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples,
such analysis aims to identify genes that are co-expressed in one set of samples but not in the other.
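The supervised setting described in the Background can be illustrated with a minimal sketch. It assumes Pearson correlation as the co-expression measure and uses made-up expression values; the function names `pearson` and `differential_coexpression` are hypothetical and are not taken from the paper or its software.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def differential_coexpression(gene_a, gene_b, set1, set2):
    """Correlation of a gene pair within each pre-defined sample set,
    plus the absolute difference between the two correlations."""
    r1 = pearson([gene_a[i] for i in set1], [gene_b[i] for i in set1])
    r2 = pearson([gene_a[i] for i in set2], [gene_b[i] for i in set2])
    return r1, r2, abs(r1 - r2)

# Illustrative (invented) expression values for two genes across 8 samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 3.0, 1.0, 4.0, 2.0]

set1 = [0, 1, 2, 3]  # samples in the first pre-defined set
set2 = [4, 5, 6, 7]  # samples in the second pre-defined set

r1, r2, diff = differential_coexpression(gene_a, gene_b, set1, set2)
print(r1, r2, diff)
```

In this toy example the two genes are perfectly correlated in the first set and uncorrelated in the second, so the pair would count as differentially co-expressed under this simple criterion.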
Results: We developed a novel probabilistic framework for jointly uncovering contexts (i.e., groups of samples)
with specific co-expression patterns, and groups of genes with different co-expression patterns across such contexts.
In contrast to current clustering and bi-clustering procedures, the implicit similarity measure that this model uses
to group biological samples is based on the clustering structure of genes within each sample rather than on traditional
measures of similarity between gene expression levels. Within this framework, biological samples with widely discordant
expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant
within these samples. To the best of our knowledge, this is the first method to date for unsupervised differential
co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer,
our method identified reproducible patterns of differential co-expression across several independent expression datasets.
Sample groupings induced by these patterns were highly informative of disease outcome. Expression patterns of
differentially co-expressed genes provided new insights into the complex nature of the ERα regulatory network.
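The idea that samples can be grouped by the co-clustering structure of genes rather than by expression levels can be illustrated with a small sketch. It is not the semi-parametric Bayesian model described here; it substitutes a simple pairwise co-membership agreement score (the Rand index) and uses invented gene cluster labels. The function name `comembership_agreement` and the label values are assumptions made for illustration.

```python
from itertools import combinations

def comembership_agreement(labels_a, labels_b):
    """Fraction of gene pairs on which two clusterings agree, i.e. both
    place the pair in the same cluster or both separate it (Rand index)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Invented cluster labels for four genes in three hypothetical samples.
# Samples 1 and 2 have opposite expression levels but group the genes identically.
sample_1 = ["high", "high", "low", "low"]
sample_2 = ["low", "low", "high", "high"]
# Sample 3 groups the genes differently.
sample_3 = ["high", "low", "high", "low"]

same_structure = comembership_agreement(sample_1, sample_2)
different_structure = comembership_agreement(sample_1, sample_3)
print(same_structure, different_structure)
```

Samples 1 and 2 receive the maximum agreement score despite discordant expression levels, while sample 3 scores lower because its genes co-cluster differently. This mirrors, in a much simplified form, why samples with discordant expression patterns can still share a context.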
Conclusions: We demonstrated that the use of the co-clustering structure as the similarity measure in the
unsupervised analysis of sample gene expression profiles provides valuable information about expression
Supplemental Materials for the
FTreeView display of the top 200 differentially co-expressed genes in the Schmidt et al. (GSE11121) dataset as determined by DCS, and all genes for comparison - Figure 3.
DCS top 500 genes in different primary breast cancer datasets (GSE11121, GSE7390) - Figure 6.
additional results can be accessed through our