Articles 1 - 22 of 22
Full-Text Articles in Life Sciences
Multiple Testing Procedures: R Multtest Package And Applications To Genomics, Katherine S. Pollard, Sandrine Dudoit, Mark J. van der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates, in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics. The current version of multtest provides MTPs for tests concerning means, differences in means, and regression parameters in linear and Cox proportional hazards models. Procedures are provided to control Type I error rates defined as tail probabilities for arbitrary functions of the numbers of false positives and rejected hypotheses. These error rates include tail probabilities ...
A Bayesian Method For Finding Interactions In Genomic Studies, Wei Chen, Debashis Ghosh, Trivellore E. Raghunathan, Sharon Kardia
The University of Michigan Department of Biostatistics Working Paper Series
An important step in building a multiple regression model is the selection of predictors. In genomic and epidemiologic studies, datasets with a small sample size and a large number of predictors are common. In such settings, most standard methods for identifying a good subset of predictors are unstable. Furthermore, there is an increasing emphasis towards identification of interactions, which has not been studied much in the statistical literature. We propose a method, called BSI (Bayesian Selection of Interactions), for selecting predictors in a regression setting when the number of predictors is considerably larger than the sample size with a focus ...
Finding Cancer Subtypes In Microarray Data Using Random Projections, Debashis Ghosh
The University of Michigan Department of Biostatistics Working Paper Series
One of the benefits of profiling of cancer samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Such subgroups have typically been found in microarray data using hierarchical clustering. A major problem in interpretation of the output is determining the number of clusters. We approach the problem of determining disease subtypes using mixture models. A novel estimation procedure of the parameters in the mixture model is developed based on a combination of random projections and the expectation-maximization algorithm. Because the approach is probabilistic, our approach provides a measure for the number of true clusters ...
Effect Of Misreported Family History On Mendelian Mutation Prediction Models, Hormuzd A. Katki
Johns Hopkins University, Dept. of Biostatistics Working Papers
People with familial history of disease often consult with genetic counselors about their chance of carrying mutations that increase disease risk. To aid them, genetic counselors use Mendelian models that predict whether the person carries deleterious mutations based on their reported family history. Such models rely on accurate reporting of each member's diagnosis and age of diagnosis, but this information may be inaccurate. Commonly encountered errors in family history can significantly distort predictions, and thus can alter the clinical management of people undergoing counseling, screening, or genetic testing. We derive general results about the distortion in the carrier probability ...
Significance Analysis Of Time Course Microarray Experiments, John D. Storey, Wenzhong Xiao, Jeffrey T. Leek, Ronald G. Tompkins, Ron W. Davis
UW Biostatistics Working Paper Series
Characterizing the genome-wide dynamic regulation of gene expression is important and will be of much interest in the future. However, there is currently no established method for identifying differentially expressed genes in a time course study. Here we propose a significance method for analyzing time course microarray studies that can be applied to the typical types of comparisons and sampling schemes. This method is applied to two studies on humans. In one study, genes are identified that show differential expression over time in response to in vivo endotoxin administration. Using our method 7409 genes are called significant at a 1 ...
Semiparametric Quantitative-Trait-Locus Mapping: I. On Functional Growth Curves, Ying Qing Chen, Rongling Wu
U.C. Berkeley Division of Biostatistics Working Paper Series
The genetic study of certain quantitative traits in growth curves as a function of time has recently been of major scientific interest to explore the developmental evolution processes of biological subjects. Various parametric approaches in the statistical literature have been proposed to study the quantitative-trait-loci (QTL) mapping of the growth curves as multivariate outcomes. In this article, we view the growth curves as functional quantitative traits and propose some semiparametric models to relax the strong parametric assumptions, which may not always be practical in reality. Appropriate inference procedures are developed to estimate the parameters of interest which characterise the possible ...
Semiparametric Quantitative-Trait-Locus Mapping: II. On Censored Age-At-Onset, Ying Qing Chen, Chengcheng Hu, Rongling Wu
U.C. Berkeley Division of Biostatistics Working Paper Series
In genetic studies, the variation in genotypes may not only affect different inheritance patterns in qualitative traits, but may also affect the age-at-onset as a quantitative trait. In this article, we use standard cross designs, such as backcross or F2, to propose some hazard regression models, namely, the additive hazards model in quantitative trait loci mapping for age-at-onset, although the developed method can be extended to more complex designs. With additive invariance of the additive hazards models in mixture probabilities, we develop flexible semiparametric methodologies in interval regression mapping without heavy computing burden. A recently developed multiple comparison procedure is adapted ...
Quantification And Visualization Of LD Patterns And Identification Of Haplotype Blocks, Yan Wang, Sandrine Dudoit
U.C. Berkeley Division of Biostatistics Working Paper Series
Classical measures of linkage disequilibrium (LD) between two loci, based only on the joint distribution of alleles at these loci, present noisy patterns. In this paper, we propose a new distance-based LD measure, R, which takes into account multilocus haplotypes around the two loci in order to exploit information from neighboring loci. The LD measure R yields a matrix of pairwise distances between markers, based on the correlation between the lengths of shared haplotypes among chromosomes around these markers. Data analysis demonstrates that visualization of LD patterns through the R matrix reveals more deterministic patterns, with much less noise, than ...
Accuracy Of MSI Testing In Predicting Germline Mutations Of MSH2 And MLH1: A Case Study In Bayesian Meta-Analysis Of Diagnostic Tests Without A Gold Standard, Sining Chen, Patrice Watson, Giovanni Parmigiani
Johns Hopkins University, Dept. of Biostatistics Working Papers
Microsatellite instability (MSI) testing is a common screening procedure used to identify families that may harbor mutations of a mismatch repair gene and therefore may be at high risk for hereditary colorectal cancer. A reliable estimate of sensitivity and specificity of MSI for detecting germline mutations of mismatch repair genes is critical in genetic counseling and colorectal cancer prevention. Several studies published results of both MSI and mutation analysis on the same subjects. In this article we perform a meta-analysis of these studies and obtain estimates that can be directly used in counseling and screening. In particular we estimate the ...
Differential Expression With The Bioconductor Project, Anja von Heydebreck, Wolfgang Huber, Robert Gentleman
Bioconductor Project Working Papers
A basic, yet challenging task in the analysis of microarray gene expression data is the identification of changes in gene expression that are associated with particular biological conditions. We discuss different approaches to this task and illustrate how they can be applied using software from the Bioconductor Project. A central problem is the high dimensionality of gene expression space, which prohibits a comprehensive statistical analysis without focusing on particular aspects of the joint distribution of the genes' expression levels. Possible strategies are to do univariate gene-by-gene analysis, and to perform data-driven nonspecific filtering of genes before the actual statistical analysis ...
Nonparametric Methods For Analyzing Replication Origins In Genome-Wide Data, Debashis Ghosh
The University of Michigan Department of Biostatistics Working Paper Series
Due to the advent of high-throughput genomic technology, it has become possible to globally monitor cellular activities on a genome-wide basis. With these new methods, scientists can begin to address important biological questions. One such question involves the identification of replication origins, which are regions in chromosomes where DNA replication is initiated. In addition, one hypothesis regarding replication origins is that their locations are nonrandom throughout the genome. In this article, we develop methods for identification of and cluster inference regarding replication origins involving genome-wide expression data. We compare several nonparametric regression methods for the identification of replication origin locations ...
Semiparametric Methods For Identification Of Tumor Progression Genes From Microarray Data, Debashis Ghosh, Arul Chinnaiyan
The University of Michigan Department of Biostatistics Working Paper Series
The use of microarray data has become quite commonplace in medical and scientific experiments. We focus here on microarray data generated from cancer studies. It is potentially important for the discovery of biomarkers to identify genes whose expression levels correlate with tumor progression. In this article, we develop statistical procedures for the identification of such genes, which we term tumor progression genes. Two methods are considered in this paper. The first is use of a proportional odds procedure, combined with false discovery rate estimation techniques to adjust for the multiple testing problem. The second method is based on order-restricted estimation ...
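As a rough illustration of how a false discovery rate adjustment for many gene-level tests can work, the sketch below implements the standard Benjamini-Hochberg step-up procedure. This is an assumption made for illustration only: the abstract does not say which FDR technique the authors use, and the function name `benjamini_hochberg`, the level `alpha`, and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, one per input p-value, that is True
    where the corresponding hypothesis is rejected at FDR level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, one per gene.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, alpha=0.05))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ordering meets its threshold.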
The False Discovery Rate: A Variable Selection Perspective, Debashis Ghosh, Wei Chen, Trivellore E. Raghunathan
The University of Michigan Department of Biostatistics Working Paper Series
In many scientific and medical settings, large-scale experiments are generating large quantities of data that lead to inferential problems involving multiple hypotheses. This has led to recent tremendous interest in statistical methods regarding the false discovery rate (FDR). Several authors have studied the properties involving FDR in a univariate mixture model setting. In this article, we turn the problem on its side: we show that FDR is a byproduct of Bayesian analysis of the variable selection problem for a hierarchical linear regression model. This equivalence gives many Bayesian insights as to why FDR is a natural quantity to ...
A Graph Theoretic Approach To Testing Associations Between Disparate Sources Of Functional Genomic Data, Raji Balasubramanian, Thomas Laframboise, Denise Scholtens, Robert Gentleman
Bioconductor Project Working Papers
The last few years have seen the advent of high-throughput technologies to analyze various properties of the transcriptome and proteome of several organisms. The congruency of these different data sources, or lack thereof, can shed light on the mechanisms that govern cellular function. A central challenge for bioinformatics research is to develop a unified framework for combining the multiple sources of functional genomics information and testing associations between them, thus obtaining a robust and integrated view of the underlying biology.
We present a graph theoretic approach to test the significance of the association between multiple disparate sources of functional genomics ...
Statistical Analyses And Reproducible Research, Robert Gentleman, Duncan Temple Lang
Bioconductor Project Working Papers
For various reasons, it is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, etc. with the documents that describe and rely on them. This integration allows readers to both verify and adapt the statements in the documents. Authors can easily reproduce them in the future, and they can present the document's contents in a different medium, e.g. with interactive controls. This paper describes a software framework for authoring and distributing these integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations. The ...
A Model Based Background Adjustment For Oligonucleotide Expression Arrays, Zhijin Wu, Rafael A. Irizarry, Robert Gentleman, Francisco Martinez Murillo, Forrest Spencer
Johns Hopkins University, Dept. of Biostatistics Working Papers
High density oligonucleotide expression arrays are widely used in many areas of biomedical research. Affymetrix GeneChip arrays are the most popular. In the Affymetrix system, a fair amount of further preprocessing and data reduction occurs following the image processing step. Statistical procedures developed by academic groups have been successful at improving the default algorithms provided by the Affymetrix system. In this paper we present a solution to one of the preprocessing steps, background adjustment, based on a formal statistical framework. Our solution greatly improves the performance of the technology in various practical applications.
Affymetrix GeneChip arrays use short oligonucleotides to ...
Reproducible Research: A Bioinformatics Case Study, Robert Gentleman
Bioconductor Project Working Papers
While scientific research and the methodologies involved have gone through substantial technological evolution, the technology involved in the publication of the results of these endeavors has remained relatively stagnant. Publication is largely done in the same manner today as it was fifty years ago. Many journals have adopted electronic formats; however, their orientation and style are little different from a printed document. The documents tend to be static and take little advantage of computational resources that might be available. Recent work, Gentleman and Temple Lang (2004), suggests a methodology and basic infrastructure that can be used to publish documents in ...
Classification Using Generalized Partial Least Squares, Beiying Ding, Robert Gentleman
Bioconductor Project Working Papers
The advances in computational biology have made simultaneous monitoring of thousands of features possible. The high throughput technologies not only bring about a much richer information context in which to study various aspects of gene functions but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in ...
Mixture Models For Assessing Differential Expression In Complex Tissues Using Microarray Data, Debashis Ghosh
The University of Michigan Department of Biostatistics Working Paper Series
The use of DNA microarrays has become quite popular in many scientific and medical disciplines, such as in cancer research. One common goal of these studies is to determine which genes are differentially expressed between cancer and healthy tissue, or more generally, between two experimental conditions. A major complication in the molecular profiling of tumors using gene expression data is that the data represent a combination of tumor and normal cells. Much of the methodology developed for assessing differential expression with microarray data has assumed that tissue samples are homogeneous. In this article, we outline a general framework for determining ...
Optimal Sample Size For Multiple Testing: The Case Of Gene Expression Microarrays, Peter Muller, Giovanni Parmigiani, Christian Robert, Judith Rousseau
Johns Hopkins University, Dept. of Biostatistics Working Papers
We consider the choice of an optimal sample size for multiple comparison problems. The motivating application is the choice of the number of microarray experiments to be carried out when learning about differential gene expression. However, the approach is valid in any application that involves multiple comparisons in a large number of hypothesis tests. We discuss two decision problems in the context of this setup: the sample size selection and the decision about the multiple comparisons. We adopt a decision theoretic approach, using loss functions that combine the competing goals of discovering as many differentially expressed genes as possible, while ...
Calibrating Observed Differential Gene Expression For The Multiplicity Of Genes On The Array, Yingye Zheng, Margaret S. Pepe
UW Biostatistics Working Paper Series
In a gene expression array study, the expression levels of thousands of genes are monitored simultaneously across various biological conditions on a small set of subjects. One goal of such studies is to explore a large pool of genes in order to select a subset of genes that appear to be differently expressed for further investigation. Of particular interest here is how to select the top k genes once genes are ranked based on their evidence for differential expression in two tissue types. We consider statistical methods that provide a more rigorous and intuitively appealing selection process for k. We ...
Bioconductor: Open Software Development For Computational Biology And Bioinformatics, Robert C. Gentleman, Vincent J. Carey, Douglas J. Bates, Benjamin M. Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Yongchao Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J. Rossini, Guenther Sawitzki, Colin Smith, Gordon K. Smyth, Luke Tierney, Yee Hwa Yang, Jianhua Zhang
Bioconductor Project Working Papers
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. We detail some of the design decisions, software paradigms and operational strategies that have allowed a small number of researchers to provide a wide variety of innovative, extensible software solutions in a relatively short time. The use of an object oriented programming paradigm, the adoption and development of a software package system, designing by contract, distributed development and collaboration with other projects are elements of this project's success. Individually, each of these concepts is useful and important, but when combined they ...