Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarrays

Sartor, M.A.1,2, Tomlinson, C.R.4, Wesselkamper, S.C.1, Leikauf, G.D.1,2, Medvedovic, M.1,2,3*

1Department of Environmental Health, University of Cincinnati College of Medicine, 3223 Eden Av. ML 56, Cincinnati, OH 45267-0056; 2Center for Environmental Genetics; 3Division of Biomedical Informatics, Cincinnati Children’s Hospital Research Foundation, 3333 Burnet Avenue, Cincinnati, OH 45229-3039; 4Dartmouth College, Department of Medicine, Dartmouth Hitchcock Medical Center, One Medical Center Drive, Lebanon, NH 03756

* To whom correspondence should be addressed

Abstract

Background:  DNA microarrays are a powerful technology that can simultaneously measure the relative expression levels of thousands of genes. Often, small sample sizes make it difficult to accurately estimate the noise level for each gene. Accurate estimates of variability are important because they can greatly improve experimental conclusions. Because the ordinary T-statistic performs suboptimally when each gene is tested individually, several methods have been proposed to improve results by incorporating knowledge from other genes. Many of these methods involve Bayesian non-parametric tests or Bayesian adjustments to T-statistics. We have developed a method to further improve the estimation of variability and significance statistics, resulting in clearer interpretation of microarray results.

Results:  We present a novel Bayesian moderated-T, which we show to perform favourably in simulations, with two real dual-channel microarray experiments, and in a controlled single-channel experiment. In simulations, we show that our method outperforms a simple fold change cut-off, the regular T-statistic, and Smyth’s Empirical Bayesian moderated-T. Our method has the greatest advantage when there is a strong dependency of noise level on measured fluorescence level. With real microarray datasets, we show that our method performs favourably compared with the above-mentioned methods for identifying biological categories, and that for a time course experiment, our method is the most consistent in identifying these categories. For a publicly available, controlled, high-density single-channel experiment, we also show that our method most accurately estimates the true false positive rate, and that this rate is lowest for our method compared with other state-of-the-art analysis methods.

Conclusions:  We use a Bayesian hierarchical normal model to define a novel moderated T-statistic. We use the well-documented dependency of gene variance on average spot fluorescence levels to incorporate more information into our prior parameters in a completely data-dependent way. Therefore, our method, which we refer to as Intensity-Based Moderated-T (IBMT), incorporates information from the high dimensionality of microarrays from two sources: the variances and the expression levels of all genes. This method is most beneficial when the overall gene variance is high compared with the gene variance conditional on spot fluorescence level.
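To make the idea of a moderated T-statistic with an intensity-dependent prior concrete, the following is a minimal Python sketch, not the IBMT R code itself. It assumes the usual moderated-T form, in which each gene's sample variance is shrunk toward a prior variance weighted by prior degrees of freedom. The function names, the argument names, and the crude binning of genes by intensity are all hypothetical choices made for illustration. The binning stands in for a smooth fit of variance against intensity, and the sketch omits the estimation of the prior degrees of freedom and any bias correction on the log scale.

```python
import math


def moderated_t(effect, sample_var, df, prior_var, prior_df, unscaled_var):
    """Moderated t for one gene (illustrative form, assumed for this sketch).

    The gene's sample variance is shrunk toward prior_var, with prior_df
    acting as the weight given to the prior. With prior_df = 0 this
    reduces to the ordinary t-statistic.
    """
    posterior_var = (prior_df * prior_var + df * sample_var) / (prior_df + df)
    return effect / math.sqrt(posterior_var * unscaled_var)


def intensity_prior_var(intensities, sample_vars, n_bins=3):
    """Hypothetical stand-in for an intensity-dependent prior variance.

    Genes are sorted by average intensity and split into n_bins groups.
    Each gene's prior variance is the geometric mean of the sample
    variances in its group. A smooth regression of log variance on
    intensity would replace this step in a real analysis.
    """
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    size = math.ceil(len(order) / n_bins)
    prior = [0.0] * len(intensities)
    for start in range(0, len(order), size):
        members = order[start:start + size]
        mean_log = sum(math.log(sample_vars[i]) for i in members) / len(members)
        for i in members:
            prior[i] = math.exp(mean_log)
    return prior


if __name__ == "__main__":
    # Toy data (made up for illustration): variance rises with intensity.
    intensities = [6.0, 7.0, 11.0, 12.0]
    sample_vars = [1.0, 4.0, 9.0, 16.0]
    priors = intensity_prior_var(intensities, sample_vars, n_bins=2)
    for a, s2, p in zip(intensities, sample_vars, priors):
        t = moderated_t(effect=1.0, sample_var=s2, df=4,
                        prior_var=p, prior_df=4, unscaled_var=0.5)
        print(f"intensity={a:5.1f}  s2={s2:5.1f}  prior={p:6.3f}  t={t:6.3f}")
```

In this sketch a gene is shrunk toward the typical variance of genes with similar intensity rather than toward a single global value. This is one way to read the statement that information enters the prior from both the variances and the expression levels of all genes.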

Availability: The open-source R code for IBMT is available at http://eh3.uc.edu/ibmt.

Contact: Mario.Medvedovic@uc.edu

Supplementary Materials

The Web Supplement for this article contains... You can download the Web Supplement HERE.

The open source IBMT R function can be downloaded HERE.

 

Web Supplement

Software