A semi-parametric Bayesian model for unsupervised differential co-expression analysis identifies novel molecular subtypes

Johannes Freudenberg, Siva Sivaganesan, Michael Wagner, Mario Medvedovic

Laboratory for Statistical Genomics and Systems Biology

Department of Environmental Health,

University of Cincinnati College of Medicine,

3223 Eden Av. ML 56, Cincinnati OH 45267-0056

Reference

Freudenberg JM, Sivaganesan S, Wagner M, Medvedovic M: A semi-parametric Bayesian model for unsupervised differential co-expression analysis. BMC Bioinformatics 2010, 11:234.

Abstract

Background: Differential co-expression analysis is an emerging strategy for characterizing disease-related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples, such analysis aims to identify genes that are co-expressed in one set of samples but not in the other.
Results: We developed a novel probabilistic framework for jointly uncovering contexts (i.e., groups of samples) with specific co-expression patterns, and groups of genes with different co-expression patterns across such contexts. In contrast to current clustering and bi-clustering procedures, the implicit similarity measure this model uses to group biological samples is based on the clustering structure of genes within each sample, not on traditional measures of similarity between gene expression levels. Within this framework, biological samples with widely discordant expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant within these samples. To the best of our knowledge, this is the first method for unsupervised differential co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer, our method identified reproducible patterns of differential co-expression across several independent expression datasets. Sample groupings induced by these patterns were highly informative of disease outcome. Expression patterns of differentially co-expressed genes provided new insights into the complex nature of the ERα regulatory network.
Conclusions: We demonstrated that the use of the co-clustering structure as the similarity measure in the unsupervised analysis of sample gene expression profiles provides valuable information about expression regulatory networks.
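
For readers new to the idea, the pre-defined-groups setting described under Background can be illustrated with a minimal sketch. This is not the semi-parametric Bayesian model of the paper, and it is not the DCS computation; it is a simple correlation-difference heuristic on simulated data, and the function name, the simulated matrices, and the effect size are all assumptions made for illustration.

```python
import numpy as np


def differential_coexpression_scores(expr_a, expr_b):
    """Hypothetical per-gene score: mean absolute change in gene-gene
    Pearson correlation between two pre-defined sample sets.

    expr_a, expr_b: arrays of shape (n_genes, n_samples), one per sample set.
    """
    corr_a = np.corrcoef(expr_a)  # rows are genes, so this is gene-by-gene
    corr_b = np.corrcoef(expr_b)
    diff = np.abs(corr_a - corr_b)
    np.fill_diagonal(diff, 0.0)  # ignore each gene's correlation with itself
    return diff.sum(axis=1) / (diff.shape[0] - 1)


# Simulated data (an assumption, not from the paper): 20 genes, 60 samples per set.
rng = np.random.default_rng(0)
n_genes, n_samples = 20, 60
expr_a = rng.normal(size=(n_genes, n_samples))
expr_b = rng.normal(size=(n_genes, n_samples))

# Genes 0..4 share a latent factor in set A only, so they are
# co-expressed in A but not in B.
factor = rng.normal(size=n_samples)
expr_a[:5] += 3.0 * factor

scores = differential_coexpression_scores(expr_a, expr_b)
top = np.argsort(scores)[::-1][:5]  # indices of the five highest-scoring genes
print(sorted(top.tolist()))
```

Note that this sketch requires the two sample sets to be given in advance. The framework summarized in the abstract differs precisely in that the sample groups (contexts) are inferred jointly with the gene groupings rather than supplied.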

Supplemental Materials for the paper

FTreeView display of the top 200 differentially co-expressed genes in the Schmidt et al. (GSE11121) dataset as determined by DCS and all genes for comparison - Figure 3.
DCS top 500 genes in different primary breast cancer datasets (GSE11121, GSE3494, GSE7390) - Figure 6.
DCS top 500 genes in joint breast cancer datasets - Figure 7.
Many additional results can be accessed through our Genomics Portals.

Software

Contact

mario.medvedovic@uc.edu or johannes.freudenberg@uc.edu