Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Statistics and Probability

Computational Biology/Bioinformatics

Articles 1 - 12 of 12

Full-Text Articles in Life Sciences

James-Stein Estimation And The Benjamini-Hochberg Procedure, Debashis Ghosh Jan 2012

Debashis Ghosh

For the problem of multiple testing, the Benjamini-Hochberg (B-H) procedure has become a very popular method in applications. Based on a spacings theory representation of the B-H procedure, we are able to motivate the use of shrinkage estimators for modifying the B-H procedure. Several generalizations are discussed in the paper, and the methodology is applied to real and simulated datasets.
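For readers unfamiliar with the procedure named in this abstract, the following is a minimal sketch of the standard B-H step-up rule only. It does not implement the spacings representation or the shrinkage modification developed in the paper. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard B-H step-up rule (illustrative sketch, not the paper's method).

    Returns a list of booleans, one per input p-value, marking rejections.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank
    # Reject the k smallest p-values, even those above their own threshold.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values chosen to show the step-up behaviour.
example = benjamini_hochberg([0.04, 0.03, 0.02, 0.045], alpha=0.05)
```

In this made-up example only the largest p-value falls below its own threshold, yet all four hypotheses are rejected, which is what distinguishes a step-up rule from testing each ordered p-value separately.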


Shrinkage In Adaptive Procedures For False Discovery Rate Estimation In Multiple Testing: Structure And Synthesis, Debashis Ghosh Jan 2012

Debashis Ghosh

There has been much interest in the study of adaptive estimation procedures for controlling the false discovery rate (FDR). In this article, we take the direct approach of Storey (2002) to estimation of the FDR and show how it can be reexpressed as a particular type of shrinkage estimator. This representation leads to natural conditions on finite-sample FDR control for a general class of shrinkage estimators. In addition, many previous proposals from the literature can be unified under this framework, for which finite-sample FDR results can be developed. Some asymptotic results are also provided.
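As background to the direct approach mentioned in this abstract, the sketch below shows the commonly cited form of the Storey-type estimator: the null proportion is estimated from the p-values above a tuning value, and that estimate scales the expected number of false rejections at a threshold. This is an assumed textbook form, not the shrinkage representation derived in the article. The names `storey_pi0` and `storey_fdr`, the default `lam=0.5`, and the cap at 1 are illustrative choices.

```python
def storey_pi0(p_values, lam=0.5):
    """Estimate the proportion of true nulls from p-values above lam."""
    m = len(p_values)
    return sum(p > lam for p in p_values) / (m * (1.0 - lam))


def storey_fdr(p_values, t, lam=0.5):
    """Estimated FDR when rejecting every p-value <= t (illustrative sketch)."""
    m = len(p_values)
    # Number of rejections, floored at 1 to avoid division by zero.
    r = max(sum(p <= t for p in p_values), 1)
    # Cap at 1 for convenience; this cap is an assumption of the sketch.
    return min(storey_pi0(p_values, lam) * m * t / r, 1.0)


# Hypothetical p-values for demonstration only.
p = [0.01, 0.02, 0.03, 0.6, 0.8]
pi0_hat = storey_pi0(p)          # 2 of 5 p-values exceed 0.5
fdr_hat = storey_fdr(p, t=0.05)  # estimated FDR at threshold 0.05
```

Setting the null proportion estimate to 1 instead of `storey_pi0` recovers the quantity that the B-H procedure compares against its target level, which is one way to see why such estimators are called adaptive.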


Generalized Benjamini-Hochberg Procedures Using Spacings, Debashis Ghosh Jan 2011

Debashis Ghosh

For the problem of multiple testing, the Benjamini-Hochberg (B-H) procedure has become a very popular method in applications. We show how the B-H procedure can be interpreted as a test based on the spacings corresponding to the p-value distributions. Using this equivalence, we develop a class of generalized B-H procedures that maintain control of the false discovery rate in finite samples. We also consider the effect of correlation on the procedure; simulation studies are used to illustrate the methodology.


Software For Assumption Weighting For Meta-Analysis Of Genomic Data, Debashis Ghosh, Yihan Li Jan 2011

Debashis Ghosh

This is the software that accompanies Li and Ghosh, "Assumption weighting for incorporating heterogeneity into meta-analysis of genomic data."


Clustering With Exclusion Zones: Genomic Applications, Mark Segal, Yuanyuan Xiao, Fred Huffer Dec 2010

Mark R Segal

Methods for formally evaluating the clustering of events in space or time, notably the scan statistic, have been richly developed and widely applied. In order to utilize the scan statistic and related approaches, it is necessary to know the extent of the spatial or temporal domains wherein the events arise. Implicit in their usage is that these domains have no “holes”—hereafter “exclusion zones”—regions in which events a priori cannot occur. However, in many contexts, this requirement is not met. When the exclusion zones are known, it is straightforward to correct the scan statistic for their occurrence by simply ...


Discrete Nonparametric Algorithms For Outlier Detection With Genomic Data, Debashis Ghosh Jan 2010

Debashis Ghosh

In high-throughput studies involving genetic data such as from gene expression microarrays, differential expression analysis between two or more experimental conditions has been a very common analytical task. Much of the resulting literature on multiple comparisons has paid relatively little attention to the choice of test statistic. In this article, we focus on the issue of choice of test statistic based on a special pattern of differential expression. The approach here is based on recasting multiple comparisons procedures for assessing outlying expression values. A major complication is that the resulting p-values are discrete; some theoretical properties of sequential testing ...


Detecting Outlier Genes From High-Dimensional Data: A Fuzzy Approach, Debashis Ghosh Jan 2010

Debashis Ghosh

A recent finding in cancer research has been the characterization of previously undiscovered chromosomal abnormalities in several types of solid tumors. This was found based on analyses of high-throughput data from gene expression microarrays and motivated the development of so-called 'outlier' tests for differential expression. One statistical issue was the potential discreteness of the test statistics. Using ideas from fuzzy set theory, we develop fuzzy outlier detection algorithms that have links to ideas in multiple comparisons. Two- and K-sample extensions are considered. The methodology is illustrated by application to two microarray studies.


Uniqueprimer - A Web Utility For Design Of Specific PCR Primers And Probes, Torstein Tengs Jan 2009

Dr. Torstein Tengs

We have developed a web-based tool for design of specific PCR primers and probes. The program allows you to enter primer sequence information as well as an optional probe, and sequence similarity searches (MegaBLAST) will be performed to see if the sequences match the same sequence entry in the specified database. If primers (and probe) match, this will be reported. The program can handle overlapping amplicons, amplification from a single primer, ambiguous bases and other problematic cases.


Hierarchical Hidden Markov Model With Application To Joint Analysis Of ChIP-Chip And ChIP-Seq Data, Hyungwon Choi, Debashis Ghosh, Zhaohui S. Qin Jan 2009

Debashis Ghosh

Motivation: Identification of transcription factor binding sites (TFBS) is a fundamental problem in understanding the mechanism of gene regulation. The ChIP-chip technology has accelerated this effort by providing a simultaneous genome-wide map of TFBS in a high-throughput fashion. Recently, a sequencing-based ChIP-seq has appeared as a promising alternative that can identify targets with an improved sensitivity/specificity in high resolution. However, studies have suggested that distinct experimental platforms can be complementary in TFBS identification. The availability of data obtained from multiple platforms motivates a meta-analysis for improved identification of candidate motifs.

Results: In this work, we propose a hierarchical hidden ...


A Double-Layered Mixture Model For The Joint Analysis Of DNA Copy Number And Gene Expression Data, Debashis Ghosh Jan 2009

Debashis Ghosh

Copy number aberration is a common form of genomic instability in cancer. Gene expression is closely tied to cytogenetic events by the central dogma of molecular biology, and serves as a mediator of copy number changes in disease phenotypes. Accordingly, it is of interest to develop proper statistical methods for jointly analyzing copy number and gene expression data. This work describes a novel Bayesian inferential approach for a double-layered mixture model (DLMM) which directly models the stochastic nature of copy number data and identifies abnormally expressed genes due to aberrant copy number. Simulation studies were conducted to illustrate the robustness ...


Discrete Nonparametric Algorithms For Outlier Detection With Genomic Data, Debashis Ghosh Jan 2009

Debashis Ghosh

In high-throughput studies involving genetic data such as from gene expression microarrays, differential expression analysis between two or more experimental conditions has been a very common analytical task. Much of the resulting literature on multiple comparisons has paid relatively little attention to the choice of test statistic. In this article, we focus on the issue of choice of test statistic based on a special pattern of differential expression. The approach here is based on recasting multiple comparisons procedures for assessing outlying expression values. A major complication is that the resulting p-values are discrete; some theoretical properties of sequential testing procedures ...


Identification Of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests, Yuanyuan Xiao, Mark Segal Dec 2008

Mark R Segal

The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression ...