Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

SelectedWorks

Statistics and Probability

Genomics

Articles 1 - 15 of 15

Full-Text Articles in Life Sciences

Functional CAR Models For Spatially Correlated Functional Datasets, Lin Zhang, Veerabhadran Baladandayuthapani, Hongxiao Zhu, Keith A. Baggerly, Tadeusz Majewski, Bogdan Czerniak, Jeffrey S. Morris Jan 2016

Jeffrey S. Morris

We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on ...


Ordinal Probit Wavelet-Based Functional Models For eQTL Analysis, Mark J. Meyer, Jeffrey S. Morris, Craig P. Hersh, Jarret D. Morrow, Christoph Lange, Brent A. Coull Jan 2015

Jeffrey S. Morris

Current methods for conducting expression Quantitative Trait Loci (eQTL) analysis are limited in scope to pairwise association testing between a single nucleotide polymorphism (SNP) and an expression probe set in a region around a gene of interest, thus ignoring the inherent between-SNP correlation. To determine association, p-values are then typically adjusted using Plug-in False Discovery Rate. As many SNPs are interrogated in the region and multiple probe sets taken, the current approach requires the fitting of a large number of models. We propose to remedy this by introducing a flexible function-on-scalar regression that models the genome as a functional outcome. The ...


Bayesian Joint Selection Of Genes And Pathways: Applications In Multiple Myeloma Genomics, Lin Zhang, Jeffrey S. Morris, Jiexin Zhang, Robert Orlowski, Veerabhadran Baladandayuthapani Jan 2014

Jeffrey S. Morris

It is well established that the development of a disease, especially cancer, is a complex process that results from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchical structure of genes related to a given pathway as a group of interacting genes to conduct selection of both pathways and genes. We posit the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct ...


Integrative Bayesian Analysis Of High-Dimensional Multi-Platform Genomics Data, Wenting Wang, Veerabhadran Baladandayuthapani, Jeffrey S. Morris, Bradley M. Broom, Ganiraju C. Manyam, Kim-Anh Do Jan 2012

Jeffrey S. Morris

Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches that treat the data are limited in that they do not consider the fundamental biological relationships that exist among the data from platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple ...


Members’ Discoveries: Fatal Flaws In Cancer Research, Jeffrey S. Morris Jan 2010

Jeffrey S. Morris

A recent article published in The Annals of Applied Statistics (AOAS) by two MD Anderson researchers, Keith Baggerly and Kevin Coombes, dissects results from a highly influential series of medical papers involving genomics-driven personalized cancer therapy, and outlines a series of simple yet fatal flaws that raise serious questions about the veracity of the original results. Having immediate and strong impact, this paper, along with related work, is providing the impetus for new standards of reproducibility in scientific research.


Statistical Contributions To Proteomic Research, Jeffrey S. Morris, Keith A. Baggerly, Howard B. Gutstein, Kevin R. Coombes Jan 2010

Jeffrey S. Morris

Proteomic profiling has the potential to impact the diagnosis, prognosis, and treatment of various diseases. A number of different proteomic technologies are available that allow us to look at many proteins at once, and all of them yield complex data that raise significant quantitative challenges. Inadequate attention to these quantitative issues can prevent these studies from achieving their desired goals, and can even lead to invalid results. In this chapter, we describe various ways the involvement of statisticians or other quantitative scientists in the study team can contribute to the success of proteomic research, and we outline some of the ...


Bayesian Random Segmentation Models To Identify Shared Copy Number Aberrations For Array CGH Data, Veerabhadran Baladandayuthapani, Yuan Ji, Rajesh Talluri, Luis E. Nieto-Barajas, Jeffrey S. Morris Jan 2010

Jeffrey S. Morris

Array-based comparative genomic hybridization (aCGH) is a high-resolution, high-throughput technique for studying the genetic basis of cancer. The resulting data consist of log fluorescence ratios as a function of the genomic DNA location and provide a cytogenetic representation of the relative DNA copy number variation. Analysis of such data typically involves estimation of the underlying copy number state at each location and segmenting regions of DNA with similar copy number states. Most current methods proceed by modeling a single sample/array at a time, and thus fail to borrow strength across multiple samples to infer shared regions of copy number ...


Detecting Outlier Genes From High-Dimensional Data: A Fuzzy Approach, Debashis Ghosh Jan 2010

Debashis Ghosh

A recent finding in cancer research has been the characterization of previously undiscovered chromosomal abnormalities in several types of solid tumors. This was found based on analyses of high-throughput data from gene expression microarrays and motivated the development of so-called "outlier" tests for differential expression. One statistical issue was the potential discreteness of the test statistics. Using ideas from fuzzy set theory, we develop fuzzy outlier detection algorithms that have links to ideas in multiple comparisons. Two- and K-sample extensions are considered. The methodology is illustrated by application to two microarray studies.


Alternative Probeset Definitions For Combining Microarray Data Across Studies Using Different Versions Of Affymetrix Oligonucleotide Arrays, Jeffrey S. Morris, Chunlei Wu, Kevin R. Coombes, Keith A. Baggerly, Jing Wang, Li Zhang Dec 2006

Jeffrey S. Morris

Many published microarray studies have small to moderate sample sizes, and thus have low statistical power to detect significant relationships between gene expression levels and outcomes of interest. By pooling data across multiple studies, however, we can gain power, enabling us to detect new relationships. This type of pooling is complicated by the fact that gene expression measurements from different microarray platforms are not directly comparable. In this chapter, we discuss two methods for combining information across different versions of Affymetrix oligonucleotide arrays. Each involves a new approach for combining probes on the array into probesets. The first approach involves ...


Some Statistical Issues In Microarray Gene Expression Data, Matthew S. Mayo, Byron J. Gajewski, Jeffrey S. Morris Jun 2006

Jeffrey S. Morris

In this paper we discuss some of the statistical issues that should be considered when conducting experiments involving microarray gene expression data. We discuss statistical issues related to preprocessing the data as well as the analysis of the data. Analysis of the data is discussed in three contexts: class comparison, class prediction and class discovery. We also review the methods used in two studies that are using microarray gene expression to assess the effect of exposure to radiofrequency (RF) fields on gene expression. Our intent is to provide a guide for radiation researchers when conducting studies involving microarray gene expression ...


Shrinkage Estimation For SAGE Data Using A Mixture Dirichlet Prior, Jeffrey S. Morris, Keith A. Baggerly, Kevin R. Coombes Mar 2006

Jeffrey S. Morris

Serial Analysis of Gene Expression (SAGE) is a technique for estimating the gene expression profile of a biological sample. Any efficient inference in SAGE must be based upon efficient estimates of these gene expression profiles, which consist of the estimated relative abundances for each mRNA species present in the sample. The data from SAGE experiments are counts for each observed mRNA species, and can be modeled using a multinomial distribution with two characteristics: skewness in the distribution of relative abundances and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample will fail ...


An Introduction To High-Throughput Bioinformatics Data, Keith A. Baggerly, Kevin R. Coombes, Jeffrey S. Morris Mar 2006

Jeffrey S. Morris

High-throughput biological assays supply thousands of measurements per sample, and the sheer amount of related data increases the need for better models to enhance inference. Such models, however, are more effective if they take into account the idiosyncrasies associated with the specific methods of measurement: where the numbers come from. We illustrate this point by describing three different measurement platforms: microarrays, serial analysis of gene expression (SAGE), and proteomic mass spectrometry.


Bayesian Mixture Models For Gene Expression And Protein Profiles, Michele Guindani, Kim-Anh Do, Peter Mueller, Jeffrey S. Morris Mar 2006

Jeffrey S. Morris

We review the use of semi-parametric mixture models for Bayesian inference in high throughput genomic data. We discuss three specific approaches for microarray data, for protein mass spectrometry experiments, and for SAGE data. For the microarray data and the protein mass spectrometry we assume group comparison experiments, i.e., experiments that seek to identify genes and proteins that are differentially expressed across two biologic conditions of interest. For the SAGE data example we consider inference for a single biologic sample.


Pooling Information Across Different Studies And Oligonucleotide Microarray Chip Types To Identify Prognostic Genes For Lung Cancer, Jeffrey S. Morris, Guosheng Yin, Keith A. Baggerly, Chunlei Wu, Li Zhang Dec 2005

Jeffrey S. Morris

Our goal in this work is to pool information across microarray studies conducted at different institutions using two different versions of Affymetrix chips to identify genes whose expression levels offer information on lung cancer patients’ survival above and beyond the information provided by readily available clinical covariates. We combine information across chip types by identifying “matching probes” present on both chips, and then assembling them into new probesets based on Unigene clusters. This method yields comparable expression level quantifications across chips without sacrificing much precision or significantly altering the relative ordering of the samples. We fit a series of multivariable ...


Bayesian Shrinkage Estimation Of The Relative Abundance Of mRNA Transcripts Using SAGE, Jeffrey S. Morris, Keith A. Baggerly, Kevin R. Coombes Mar 2003

Jeffrey S. Morris

Serial analysis of gene expression (SAGE) is a technology for quantifying gene expression in biological tissue that yields count data that can be modeled by a multinomial distribution with two characteristics: skewness in the relative frequencies and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample may fail to capture a large number of expressed mRNA species present in the tissue. Empirical estimators of mRNA species’ relative abundance effectively ignore these missing species, and as a result tend to overestimate the abundance of the scarce observed species comprising a vast majority of ...
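
The problem this abstract describes can be illustrated with a small simulation. The sketch below is not the authors' estimator. It only shows why empirical relative abundances overstate the observed species when the sample is small relative to the number of species. The geometric "true" profile, the species count (200), the tag count (100), and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def skewed_profile(n_species, decay=0.97):
    """Hypothetical 'true' abundances: geometric decay, normalised to sum to 1."""
    weights = [decay ** k for k in range(n_species)]
    total = sum(weights)
    return [w / total for w in weights]


def draw_tags(profile, n_tags, seed=0):
    """Draw n_tags multinomial observations and count tags per species index."""
    rng = random.Random(seed)
    species = list(range(len(profile)))
    return Counter(rng.choices(species, weights=profile, k=n_tags))


def empirical_abundance(counts):
    """Empirical estimator: observed count divided by total tag count."""
    n_tags = sum(counts.values())
    return {s: c / n_tags for s, c in counts.items()}


# Assumed setting: more species than tags, so some species must go unobserved.
profile = skewed_profile(n_species=200)
counts = draw_tags(profile, n_tags=100)
estimates = empirical_abundance(counts)

# The empirical estimates assign all probability mass to observed species.
estimated_mass = sum(estimates.values())
# The true mass of those same species is strictly smaller.
true_mass = sum(profile[s] for s in counts)
missing_mass = 1.0 - true_mass

print(f"species observed: {len(counts)} of {len(profile)}")
print(f"estimated mass of observed species: {estimated_mass:.3f}")
print(f"true mass of observed species:      {true_mass:.3f}")
print(f"true mass of unobserved species:    {missing_mass:.3f}")
```

Because the empirical estimates always sum to one over the observed species, whatever mass truly belongs to unobserved species is redistributed onto the observed ones. A shrinkage estimator, such as the one the abstract goes on to motivate, aims to reserve some of that mass for the species that were missed.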