Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Articles 1 - 13 of 13

Full-Text Articles in Life Sciences

Estimating Autoantibody Signatures To Detect Autoimmune Disease Patient Subsets, Zhenke Wu, Livia Casciola-Rosen, Ami A. Shah, Antony Rosen, Scott L. Zeger Apr 2017

Johns Hopkins University, Dept. of Biostatistics Working Papers

Autoimmune diseases are characterized by highly specific immune responses against molecules in self-tissues. Different autoimmune diseases are characterized by distinct immune responses, making autoantibodies useful for diagnosis and prediction. In many diseases, the targets of autoantibodies are incompletely defined. Although the technologies for autoantibody discovery have advanced dramatically over the past decade, each of these techniques generates hundreds of possibilities, which are onerous and expensive to validate. We set out to establish a method to greatly simplify autoantibody discovery, using a pre-filtering step to define subgroups with similar specificities based on migration of labeled, immunoprecipitated proteins on sodium dodecyl sulfate ...


Statistical Methods For Estimation, Testing, And Clustering With Gene Expression Data, Andrew Lithio Jan 2017

Graduate Theses and Dissertations

This thesis comprises a collection of papers on the analysis of gene expression data, namely high-throughput RNA-sequencing (RNA-seq) data, with some methods generalizable to other scientific data. We first introduce a method for identifying differentially expressed genes using an empirical-Bayes-type analysis of RNA-seq data that employs efficient computational algorithms. A generalizable method for reparameterization is discussed, and simulation is used to demonstrate its importance in test performance. Next, exact tests for a monotone mean expression pattern are developed and incorporated into an existing pipeline for analysis of RNA-seq data. The advantages of computing exact $p$-values and of ...


A Framework For The Statistical Analysis Of Mass Spectrometry Imaging Experiments, Kyle Bemis Dec 2016

Open Access Dissertations

Mass spectrometry (MS) imaging is a powerful investigation technique for a wide range of biological applications, such as molecular histology of tissue, whole body sections, and bacterial films, and biomedical applications such as cancer diagnosis. MS imaging visualizes the spatial distribution of molecular ions in a sample by repeatedly collecting mass spectra across its surface, resulting in complex, high-dimensional imaging datasets. Two of the primary goals of statistical analysis of MS imaging experiments are classification (for supervised experiments), i.e. assigning pixels to pre-defined classes based on their spectral profiles, and segmentation (for unsupervised experiments), i.e. assigning pixels to ...


EnsCat: Clustering Of Categorical Data Via Ensembling, Bertrand S. Clarke, Saeid Amiri, Jennifer L. Clarke Jan 2016

Faculty Publications, Department of Statistics

Background: Clustering is a widely used collection of unsupervised learning techniques for identifying natural classes within a data set. It is often used in bioinformatics to infer population substructure. Genomic data are often categorical and high dimensional, e.g., long sequences of nucleotides. This makes inference challenging: The distance metric is often not well-defined on categorical data; running time for computations using high dimensional data can be considerable; and the Curse of Dimensionality often impedes the interpretation of the results. Up to the present, however, the literature and software addressing clustering for categorical data has not yet led to a ...


Methods For Identifying Regions Of Brain Activation Using fMRI Meta-Data, Meredith A. Ray Dec 2014

Theses and Dissertations

Functional neuroimaging is a relatively young discipline within the neurosciences that has led to significant advances in our understanding of the human brain and progress in neuroscientific research related to public health. Accurately identifying activated regions in the brain showing a strong association with an outcome of interest is crucial in terms of disease prediction and prevention. Functional magnetic resonance imaging (fMRI) is the most widely used method for this type of study as it has the ability to measure and identify the location of changes in tissue perfusion, blood oxygenation, and blood volume. In practice, the three-dimensional brain locations ...


The Nuances Of Statistically Analyzing Next-Generation Sequencing Data, Sanvesh Srivastava, R. W. Doerge Apr 2012

Conference on Applied Statistics in Agriculture

High-throughput sequencing technologies, in particular next-generation sequencing (NGS) technologies, have emerged as the preferred approach for exploring both gene function and pathway organization. Data from NGS technologies pose new computational and statistical challenges because of their massive size, limited replicate information, large number of genes (high-dimensionality), and discrete form. They are more complex than data from previous high-throughput technologies such as microarrays. In this work we focus on the statistical issues in analyzing and modeling NGS data for selecting genes suitable for further exploration and present a brief review of the relevant statistical methods. We discuss visualization methods to assess ...


Generalized Benjamini-Hochberg Procedures Using Spacings, Debashis Ghosh Jan 2011

Debashis Ghosh

For the problem of multiple testing, the Benjamini-Hochberg (B-H) procedure has become a very popular method in applications. We show how the B-H procedure can be interpreted as a test based on the spacings corresponding to the p-value distributions. Using this equivalence, we develop a class of generalized B-H procedures that maintain control of the false discovery rate in finite samples. We also consider the effect of correlation on the procedure; simulation studies are used to illustrate the methodology.
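
For readers unfamiliar with the baseline method this abstract builds on, the following is a minimal sketch of the classical B-H step-up procedure only. It is not the generalized, spacings-based procedure the paper develops, and the function name `benjamini_hochberg` and its parameters are illustrative assumptions, not taken from the paper.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Classical B-H step-up procedure (illustrative sketch).

    Returns a list of booleans, one per input p-value, marking which
    hypotheses are rejected at target false discovery rate q.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject the hypotheses with the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to exercise the function.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, q=0.05))
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, as long as a larger-ranked p-value meets its threshold.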


Class Discovery And Prediction Of Tumor With Microarray Data, Bo Liu Jan 2011

All Theses, Dissertations, and Other Capstone Projects

Current microarray technology is able to take a single tissue sample and construct an Affymetrix oligonucleotide array containing (estimated) expression levels of thousands of different genes for that tissue. The objective is to develop a more systematic approach to cancer classification based on Affymetrix oligonucleotide microarrays. For this purpose, I studied published colon cancer microarray data. Colon cancer, with 655,000 deaths worldwide per year, has become the fourth most common form of cancer in the United States and the third leading cause of cancer-related death in the Western world. This research has focused on two areas: class discovery ...


Informatics And Statistics For Analyzing 2-D Gel Electrophoresis Images, Andrew W. Dowsey, Jeffrey S. Morris, Howard G. Gutstein, Guang Z. Yang Jan 2010

Jeffrey S. Morris

Whilst recent progress in ‘shotgun’ peptide separation by integrated liquid chromatography and mass spectrometry (LC/MS) has enabled its use as a sensitive analytical technique, proteome coverage and reproducibility are still limited, and obtaining enough replicate runs for biomarker discovery is a challenge. For these reasons, recent research demonstrates the continuing need for protein separation by two-dimensional gel electrophoresis (2-DE). However, with traditional 2-DE informatics, the digitized images are reduced to symbolic data through spot detection and quantification before proteins are compared for differential expression by spot matching. Recently, a more robust and automated paradigm has emerged where gels are ...


Dynamic Clustering Of Cell-Cycle Microarray Data, Lingling An, R. W. Doerge Apr 2008

Conference on Applied Statistics in Agriculture

The cell cycle is a crucial series of events that are repeated over time, allowing the cell to grow, duplicate, and split. Cell-cycle systems play an important role in cancer and other biological processes. Using gene expression data gained from microarray technology it is possible to group or cluster genes that are involved in the cell-cycle for the purpose of exploring their functional co-regulation. Typically, the goal of clustering methods as applied to gene expression data is to place genes with similar expression patterns or profiles into the same group or cluster for the purpose of inferring the function of ...


Clustering A Series Of Replicated Polyploid Gene Expression Experiments In Maize, Lingling An, Nicole C. Riddle, James A. Birchler, R. W. Doerge Apr 2006

Conference on Applied Statistics in Agriculture

Ploidy level is defined as the number of individual sets of chromosomes contained in a single cell. Many important crop plants, such as potato, soybean and wheat, are polyploid. Although it is widely known that polyploidy is a frequent evolutionary event, it is not fully understood why polyploids have been so successful. In this work cluster analysis is employed to study gene expression changes in a maize inbred line (B73) across a range of polyploidy levels. The B73 ploidy series includes monoploid, diploid, triploid and tetraploid plants and consists of biological and technical replicates as measured by microarray technology. An ...


Cluster Analysis Of Genomic Data With Applications In R, Katherine S. Pollard, Mark J. Van Der Laan Jan 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

In this paper, we provide an overview of existing partitioning and hierarchical clustering algorithms in R. We discuss statistical issues and methods in choosing the number of clusters, the choice of clustering algorithm, and the choice of dissimilarity matrix. In particular, we illustrate how the bootstrap can be employed as a statistical method in cluster analysis to establish the reproducibility of the clusters and the overall variability of the followed procedure. We also show how to visualize a clustering result by plotting ordered dissimilarity matrices in R. We present a new R package, hopach, which implements the hybrid clustering method ...


Statistical Inference For Simultaneous Clustering Of Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan Jul 2001

U.C. Berkeley Division of Biostatistics Working Paper Series

Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as ...