Dec 2013

#### Comparison Of Different Differential Expression Analysis Tools For RNA-Seq Data, Junfei Zhu

In molecular biology research, RNA-seq is a relatively new method for transcriptome profiling. It uses next-generation sequencing technology to provide a large amount of information about the variety and abundance of RNA present in an organism of interest in a specific state and at a given time. One of the most important tasks of RNA-seq analysis is finding genes that are expressed differently in different subject groups. Many differential expression analysis tools for RNA-seq have been developed, but there is no gold standard in this field. In this research, four commonly used tools (DESeq, edgeR, limma, and cuffdiff) are ...

Dec 2013

#### Statistical And Comparative Phylogeography Of Mexican Freshwater Taxa In Extreme Aquatic Environments, Lyndon M. Coghill

Phylogeography aims to understand the processes that underlie the distribution of genetic variation within and among closely related species. Although the means by which this goal might be achieved differ considerably from those that spawned the field some thirty years ago, the foundation and conceptual breakthroughs made by Avise are nonetheless the same and are as relevant today as they were two decades ago. Namely, patterns of neutral genetic variation among individuals carry the signature of a species’ demographic past, and the spatial and temporal environmental heterogeneity across a species’ geographic range can influence patterns of evolutionary change. Aquatic systems ...

Dec 2013

#### Computational Molecular Coevolution, Russell J. Dickson

A major goal in computational biochemistry is to obtain three-dimensional structure information from protein sequence. Coevolution represents a biological mechanism through which structural information can be obtained from a family of protein sequences. Evolutionary relationships within a family of protein sequences are revealed through sequence alignment. Statistical analyses of these sequence alignments reveal positions in the protein family that covary, and thus appear to be dependent on one another throughout the evolution of the protein family. These covarying positions are inferred to be coevolving via one of two biological mechanisms, both of which imply that coevolution is facilitated by inter-residue ...
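To make the covariation idea concrete, the sketch below scores a pair of alignment columns with mutual information (MI), one common covariation statistic. The abstract does not say which statistics the thesis uses, so MI here is an illustrative stand-in. The toy alignment and the function name `column_mi` are hypothetical, gaps are assumed absent, and real analyses usually add corrections (for example the average product correction) for phylogenetic and entropic background.

```python
import math
from collections import Counter


def column_mi(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `alignment` is a list of equal-length, gapless sequences (a simplifying
    assumption). High MI means the residues at i and j tend to change together.
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a, p_b = count_i[a] / n, count_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy family: columns 0 and 2 swap K/E together (a compensating
# charge swap), column 1 is conserved, column 3 varies independently.
family = ["KAEL", "KAEV", "EAKL", "EAKV"]
print(round(column_mi(family, 0, 2), 3))  # 0.693 (ln 2, perfect covariation)
print(round(column_mi(family, 0, 3), 3))  # 0.0 (independent columns)
```

A conserved column (column 1) also scores zero, which is why covariation methods look at co-varying pairs rather than at conservation alone.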

Dec 2013

#### Characterizing The Human Vaginal Microbiome Using High-Throughput Sequencing, Jean Megan E. Macklaim

The human vaginal microbiome undoubtedly plays a significant role in reproductive health and in protection from infectious organisms. Recent efforts to characterize the bacterial species of the vagina using molecular techniques have uncovered an unexpected diversity. Using high-throughput sequencing, I sought to describe the structure and function of the vaginal microbiome under different physiological states, including health, bacterial vaginosis (BV), post-menopausal vaginal atrophy, and acute vulvovaginal candidiasis (VVC).

Partial 16S rRNA gene sequencing revealed that healthy, asymptomatic women most often have vaginal biotas dominated by Lactobacillus iners or L. crispatus. In contrast, BV is a heterogeneous, highly diversified condition with ...

#### Discovering Driver Somatic Mutations, Copy Number Alterations And Methylation Changes Using Markov Chain Monte Carlo, Bokhari Yahya

Nowadays we have a tremendous amount of genetic data that needs to be interpreted. Somatic mutations, copy number variations, and methylation changes are examples of the genetic data we are dealing with. Discovering driver mutations from these combined data types is challenging: mutations are unpredictable and highly heterogeneous, which makes this goal hard to accomplish. Many methods have been proposed to solve the mystery of cancer genetics. In this project we process the above-mentioned data types and use a modified version of an existing method based on Markov Chain Monte Carlo (MCMC). The method introduced two properties, coverage and exclusivity ...
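The abstract does not name the existing MCMC method, but coverage and exclusivity are the two properties scored by the Dendrix method. The sketch below assumes a Dendrix-style weight W(M) = 2|Γ(M)| - Σ_g |Γ(g)|, where Γ(M) is the set of samples with a mutation in at least one gene of the set M; it rewards covering many samples while penalizing samples hit by more than one gene. A Metropolis chain then swaps one gene at a time. The thesis's own modifications are not shown, and all names and data are hypothetical.

```python
import math
import random


def weight(gene_set, mutations):
    """Dendrix-style weight (an assumption): 2*coverage minus total mutations.

    Equals coverage minus the number of 'extra' hits, so a perfectly mutually
    exclusive set scores exactly its coverage.
    """
    covered = set().union(*(mutations[g] for g in gene_set))
    return 2 * len(covered) - sum(len(mutations[g]) for g in gene_set)


def mcmc_gene_set(mutations, k, iters=5000, c=1.0, seed=0):
    """Metropolis sampling of k-gene sets with target proportional to exp(c * W)."""
    rng = random.Random(seed)
    genes = sorted(mutations)
    current = set(rng.sample(genes, k))
    w_cur = weight(current, mutations)
    best, w_best = set(current), w_cur
    for _ in range(iters):
        # Symmetric proposal: swap one gene in the set for one outside it
        out_gene = rng.choice(sorted(current))
        in_gene = rng.choice([g for g in genes if g not in current])
        proposal = (current - {out_gene}) | {in_gene}
        w_new = weight(proposal, mutations)
        if w_new >= w_cur or rng.random() < math.exp(c * (w_new - w_cur)):
            current, w_cur = proposal, w_new
            if w_cur > w_best:
                best, w_best = set(current), w_cur
    return best, w_best


# Hypothetical toy data: gene -> set of samples carrying a mutation in it.
mutations = {
    "G0": {0, 1, 2, 3}, "G1": {4, 5, 6, 7}, "G2": {8, 9, 10, 11},
    "G3": {0, 4, 8}, "G4": {1, 2, 5}, "G5": {3, 6, 9, 10},
}
best, w = mcmc_gene_set(mutations, k=3)
print(sorted(best), w)
```

On this toy data the planted set {G0, G1, G2} covers all 12 samples exclusively and has the maximal weight of 12. The set {G3, G4, G5} is also exclusive but covers only 10 samples, a local optimum that the occasional downhill acceptance lets the chain escape.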

#### Attributing Meaning To Online Social Network Analysis For Tailored Socio-Behavioral Support Systems, Sahiti Myneni

Ubiquitous online social networks provide us with a unique opportunity to deliver scalable interventions for the support of lifestyle modifications in order to change behaviors that predispose toward cancer and other diseases. At the same time these networks act as rich data sources to inform our understanding of end-user needs. Traditionally, social network analysis is based on communication frequency among members. In this work, I introduce communication content as a complementary frame for studying these networks.

QuitNet, an online social network developed to provide smoking cessation support, is considered for analysis. Qualitative coding, automated content analysis, and network analysis were ...

Dec 2013

#### Introducing A Novel Method For Genetic Analysis Of Autism Spectrum Disorder, Sepideh Nouri

Autism is a spectrum of neurological disorders characterized by repetitive and stereotyped behaviors, a lack of social skills in verbal and non-verbal communication, and intellectual disability. Recent statistics show that 1 out of every 88 children in the US is affected by autism.

In this thesis, I first review previous studies on genetic association analyses of autism spectrum disorder. A large number of these studies fall into two categories: Genome Wide Association Studies (GWAS) and sequencing studies. Although GWAS are able to identify multiple common risk variants associated with different diseases, these common variants explain only a small portion ...

Dec 2013

#### Applying Human Computation Methods To Information Science, Christopher Glenn Harris

Human Computation methods such as crowdsourcing and games with a purpose (GWAP) have each recently drawn considerable attention for their ability to synergize the strengths of people and technology to accomplish tasks that are challenging for either to do well alone. Despite this increased attention, much of this transformation has been focused on a few selected areas of information science.

This thesis contributes to the field of human computation as it applies to areas of information science, particularly information retrieval (IR). We begin by discussing the merits and limitations of applying crowdsourcing and game-based approaches to information science. We then ...

#### Demonstration Of A Targeted Proteome Characterization Approach For Examining Specific Metabolic Pathways In Complex Bacterial Systems, Adam Justin Martin

Multiple Reaction Monitoring (MRM) is a powerful tandem mass spectrometry (MS/MS) technique frequently used in proteomic studies for targeted analysis of proteins and peptides. MRM is so selective that it makes the triple quadrupole mass spectrometers (QQQ) on which it is commonly run relevant to proteomic studies despite their relatively low resolution. This selectivity is also sufficient to replace complicated fractionation techniques, additional dimensions of chromatography, and 24-hour-long MS/MS experiments for simple biological samples. But there is a deficiency of evidence to ...

Dec 2013

#### Filter-Based Multiscale Entropy Analysis Of Complex Physiological Time Series, Liang Zhao

Multiscale entropy (MSE) has been widely and successfully used to analyze the complexity of physiological time series. In this thesis, we reinterpret the averaging process in MSE as filtering a time series with a piecewise-constant filter. From this viewpoint, we introduce *filter-based multiscale entropy* (FME), which filters a time series with a set of filters to generate its multiple frequency components and then computes the *blockwise* entropy of the resulting components. By choosing filters adapted to the features of a given time series, FME is able to better capture its multiscale information and to provide ...
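The re-interpretation above can be checked numerically. The minimal sketch below assumes the standard MSE recipe (non-overlapping averaging, then sample entropy with tolerance r fixed as a fraction of the original series' standard deviation; 0.15 is a commonly used value). It shows that coarse-graining at scale τ equals convolving with a piecewise-constant (box) filter of height 1/τ and keeping every τ-th output. FME's other filters and its blockwise entropy are not reproduced, and all function names are hypothetical.

```python
import math
import statistics


def coarse_grain(x, tau):
    """Classic MSE coarse-graining: means of non-overlapping windows of length tau."""
    n = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n)]


def box_filter_then_downsample(x, tau):
    """The same operation viewed as filtering: a 'valid' convolution with a
    piecewise-constant (box) filter of height 1/tau, then every tau-th output."""
    h = [1.0 / tau] * tau
    filtered = [sum(h[i] * x[k + i] for i in range(tau)) for k in range(len(x) - tau + 1)]
    return filtered[::tau]


def sample_entropy(x, m, r):
    """Sample entropy -ln(A/B): B counts pairs of length-m templates within
    Chebyshev distance r, A the same for length m+1 (self-matches excluded)."""
    n = len(x)

    def count_pairs(length):
        templates = [x[i:i + length] for i in range(n - m)]
        pairs = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    pairs += 1
        return pairs

    b, a = count_pairs(m), count_pairs(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")


def multiscale_entropy(x, scales, m=2, r_frac=0.15):
    """Sample entropy of the coarse-grained series at each scale, with the
    tolerance r fixed from the standard deviation of the original series."""
    r = r_frac * statistics.pstdev(x)
    return [sample_entropy(coarse_grain(x, tau), m, r) for tau in scales]


# Hypothetical deterministic test signal.
signal = [math.sin(0.7 * i * i) for i in range(300)]
print(multiscale_entropy(signal, scales=[1, 2, 3]))
```

Because `coarse_grain` and `box_filter_then_downsample` agree (up to floating-point rounding), replacing the box filter `h` with another filter is the natural point at which a filter-based variant departs from classic MSE.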

Nov 2013

#### Development And Evaluation Of An Ontology-Based Quality Metrics Extraction System, Sina Madani

The Institute of Medicine reports a growing demand in recent years for quality improvement within the healthcare industry. In response, numerous organizations have been involved in the development and reporting of quality measurement metrics. However, disparate data models from such organizations shift the burden of accurate and reliable metrics extraction and reporting to healthcare providers. Furthermore, manual abstraction of quality metrics and diverse implementations of Electronic Health Record (EHR) systems deepen the complexity of consistent, valid, explicit, and comparable quality measurement reporting within healthcare provider organizations.

The main objective of this research is to evaluate an ontology-based information extraction framework ...

Oct 2013

#### Development Of Tyrosine Kinase Peptide Biosensors And Methods For Detection, Andrew Michael Lipchik

New methods to monitor tyrosine kinase activity are critical for studying kinases in cell biology, drug discovery, and the clinic. Peptide-based biosensors for detecting kinase activity use a kinase-specific artificial peptide substrate, which can report intracellular kinase activity through the incorporation of phosphate.

An artificial Syk substrate peptide was developed and combined with other functional modules to produce a Syk biosensor. These modules included a biotin tag for affinity capture, a photo-cleavable amino acid to allow release of the substrate from the delivery module, and the cell-penetrating peptide TAT. A live-cell kinase assay utilizing this biosensor was ...

#### Estimation Of Variation For High-Throughput Molecular Biological Experiments With Small Sample Size, Danni Yu

Motivation: In the quantification of molecular components, large variation can affect, and potentially even mislead, biological conclusions. Meanwhile, high-throughput experiments often involve a small number of samples due to limitations of cost and time. In such cases, stochastic variation may dominate the outcome of an experiment because there may not be enough samples to reveal the true biological signal. It is challenging to distinguish changes in phenotype from stochastic variation.

Methods: Since the biological molecules have been quantified with different technologies, different statistical methods are required. Focusing on three types of important high-throughput ...

#### Statistical Models For Gene And Transcripts Quantification And Identification Using RNA-Seq Technology, Han Wu

RNA-Seq has emerged as a powerful technique for transcriptome studies. Along with improved sensitivity and coverage, RNA-Seq also brings challenges for data analysis. The massive volume of sequence reads, excessive variability, uncertainty, and biases and noise stemming from multiple sources all make the analysis of RNA-Seq data difficult. Despite much progress, RNA-Seq data analysis still has much room for improvement, especially in quantifying gene and transcript expression levels. Quantifying gene expression level is a direct inference problem, whereas quantifying transcript expression level is an indirect problem, because the label of ...
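The direct/indirect distinction can be illustrated with a toy expectation-maximization (EM) estimator of the kind used by tools such as RSEM: the gene-level count is simply the number of reads, but a read shared by two isoforms carries no transcript label, so its count is split in proportion to the current abundance estimates, which are then updated. This is a generic sketch, not the thesis's statistical model; it assumes equal transcript lengths and no sequencing bias, and all names and data are hypothetical.

```python
def em_abundances(read_compat, transcripts, iters=200):
    """Estimate relative transcript abundances from ambiguous reads by EM.

    read_compat: one set per read, naming the transcripts the read aligns to.
    Assumes equal transcript lengths (real tools normalize by effective length).
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    n_reads = len(read_compat)
    for _ in range(iters):
        # E-step: split each read across its compatible transcripts
        expected = {t: 0.0 for t in transcripts}
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: new abundance = expected share of all reads
        theta = {t: expected[t] / n_reads for t in transcripts}
    return theta


# Hypothetical gene with isoforms T1 and T2: 6 reads unique to T1,
# 2 unique to T2, and 4 that align equally well to both.
reads = [{"T1"}] * 6 + [{"T2"}] * 2 + [{"T1", "T2"}] * 4
est = em_abundances(reads, ["T1", "T2"])
print({t: round(v, 3) for t, v in est.items()})  # {'T1': 0.75, 'T2': 0.25}
```

Splitting the 4 shared reads evenly would give T1 a share of 8/12 ≈ 0.667; EM instead assigns them 3:1, following the evidence from the unique reads, so T1 ends up with 9 of 12 reads.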

Oct 2013

#### Identifying Chromosome Rearrangements In The Allopolyploid Brassica napus Using Pyrosequencing, Alexandra R. Barbella

Allopolyploids form through the hybridization of two or more diploid genomes. A challenge to reproduction in allopolyploids is that pairing can occur between homologous chromosomes or between homeologous chromosomes (i.e., chromosomes from different subgenomes). Crossover between homeologous chromosomes can result in chromosome rearrangements that lower fertility and overall fitness. Rearrangements can alter the dosage of either entire chromosomes or just parts of chromosomes. Understanding the frequency and extent of rearrangements will help to explain the evolution and genome stabilization of agriculturally important allopolyploid species. Pyrosequencing is a useful tool in the study of dosage changes in allopolyploids because it allows quantification of the ...

Sep 2013

#### Quantifying Mutational Impacts On Intrinsic DNA Flexibility In Prokaryotic Genomes, Mohammed Alawad

The existence of synonymous codon biases across all taxonomic groups is a long-standing problem in biology. While codon bias seems to be adequately explained by the maintenance of translation efficiency and accuracy in some organisms, there is still no adequate explanation of why codon biases universally track intergenic GC content, as these regions of the genome would not be under selection pressures affecting translation. One part of the story may come from the triplet nature of codons, in which each third position defines the minor groove width and thus affects the basic structure of the DNA by altering ...
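The comparison this abstract builds on, codon bias versus GC content, is commonly quantified by GC3 (the GC fraction at third codon positions) set against GC content elsewhere, such as in intergenic regions. A minimal sketch of both measures follows. It assumes an in-frame coding sequence starting at its first base; the function names and sequences are hypothetical, and the thesis's DNA-flexibility (minor groove width) modeling is not reproduced.

```python
def gc_content(seq):
    """Fraction of G or C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def gc3_content(cds):
    """GC fraction at the third position of each complete codon.

    Assumes `cds` is in frame from its first base; a trailing partial codon
    is ignored.
    """
    cds = cds.upper()
    n_full = len(cds) - len(cds) % 3
    thirds = [cds[i + 2] for i in range(0, n_full, 3)]
    return sum(base in "GC" for base in thirds) / len(thirds) if thirds else 0.0


# Hypothetical coding sequence and intergenic fragment.
cds = "ATGGCCAAGCTGTAA"
intergenic = "ATATTTAGCA"
print(gc3_content(cds), gc_content(intergenic))  # 0.8 0.2
```

Computed genome-wide, the two values can be compared across organisms to see whether third-position composition tracks intergenic composition.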

Aug 2013

#### Role Of Branched-Chain Amino Acid Transporters In Staphylococcus aureus Virulence, Sameha Omer

Branched-chain amino acids (BCAAs) act as effector molecules that signal a global transcriptional regulator, CodY, to regulate virulence factors in nutrient depleted environments. Staphylococcus aureus contains three putative BCAA transporters (BrnQ1, BrnQ2, BrnQ3) whose role in BCAA uptake is unknown. We hypothesize that BrnQ transporters are involved in BCAA uptake and contribute to virulence in S. aureus by modulating CodY activity. Results from radioactive uptake assays indicate that BrnQ1 is the predominant BrnQ transporter of isoleucine, valine and leucine. Meanwhile, BrnQ2 is more specific for isoleucine. Furthermore, only the lack of BrnQ1 hinders growth of S. aureus in chemically-defined media ...

#### Modeling Leafhopper Populations And Their Role In Transmitting Plant Diseases, Ji Ruan

This M.Sc. thesis focuses on the interactions between crops and leafhoppers.

Firstly, a general delay differential equation system based on infection-age structure is proposed to investigate disease dynamics when disease latencies are considered. To further understand the subject, a specific model is then introduced. The basic reproduction numbers $\mathcal{R}_0$ and $\mathcal{R}_1$ are identified, and their threshold properties are discussed. When $\mathcal{R}_0 < 1$, the insect-free equilibrium is globally asymptotically stable. When $\mathcal{R}_0 > 1$ and $\mathcal{R}_1 < 1$, the disease-free equilibrium exists and is locally asymptotically stable. When $\mathcal{R}_1 > 1$, the disease will persist.

Secondly, we derive another general delay differential equation system to examine how different life stages of leafhoppers affect crops. The basic reproduction number $\mathcal{R}_0$ is determined ...

#### Development And Integration Of Informatic Tools For Qualitative And Quantitative Characterization Of Proteomic Datasets Generated By Tandem Mass Spectrometry, Rachel Michelle Adams

Shotgun proteomic experiments provide qualitative and quantitative analytical information from biological samples ranging in complexity from simple bacterial isolates to higher eukaryotes such as plants and humans, and even to communities of microbial organisms. Improvements in instrument performance, sample preparation, and informatic tools are increasing the scope and volume of data that can be analyzed by mass spectrometry (MS). To accommodate these advances, it is becoming increasingly essential to choose and/or create tools that not only scale well but also make more informed decisions using additional features within the data. Incorporating novel and existing tools ...

Aug 2013

#### Alcohol Biomarkers As Predictive Factors Of Rearrest In High Risk Repeat Offense Drunk Drivers, Brian Charles Kay

Alcohol biomarkers, naturally occurring molecules that arise in response to alcohol consumption, are proving to be a valuable tool for objectively monitoring an individual's alcohol consumption. Coupling this assessment tool with advances in computing power makes new and powerful predictions ever more possible. In this retrospective study, data were first collected from a sample of 249 drivers who had been convicted of driving under the influence and who were monitored over the course of a year with biomarker blood tests. These data were then analyzed using machine learning methods. TwoStep cluster analysis showed distinct drinking groups within the ...

Aug 2013

#### Chromatin Insulators: Master Regulators Of The Eukaryotic Genome, Todd Andrew Schoborg

Proper organization of the chromatin fiber within the three-dimensional space of the eukaryotic nucleus relies on a number of DNA elements and their interacting proteins, whose structural and functional consequences exert significant influence on genome behavior. Chromatin insulators are one such example: these elements are thought to assist in the formation of higher-order chromatin loop structures by mediating long-range contacts between distant sites scattered throughout the genome. Such looping serves a dual role, helping to satisfy both the physical constraints needed to package the linear DNA polymer within the small volume of the nucleus while simultaneously ...

#### RNA-Sequencing Applications: Gene Expression Quantification And Methylator Phenotype Identification, Guoshuai Cai

My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis. This aspect is addressed in three sections. The second aspect is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression level and DNA methylation status.

Section 1: The cost of DNA sequencing has decreased dramatically in the past decade. Consequently, genomic research increasingly depends on sequencing technology. However, it remains unclear how sequencing capacity influences the accuracy of mRNA expression measurement. We ...

Aug 2013

#### Single-Nucleotide Polymorphisms Associated With Performance Traits In Beef Cattle Grazing Endophyte-Infected Tall Fescue, Bryan Christopher Bastin

Tall fescue (Lolium arundinaceum Schreb.) is the most prevalent forage in the Midsouth United States due in part to the presence of the endophytic fungus Neotyphodium coenophialum. The fungus, while conferring hardiness to tall fescue, contributes to decreased production efficiency in cow-calf operations. A previous genome-wide association study was performed using the Illumina 50k bovine SNP chip. Twenty-four SNPs were found to be associated (P < 0.05) with adjusted birth weight and adjusted 205-day weights of calves from 48 beef cows at Ames Plantation. The first objective was to validate each SNP by testing associations with several additional phenotypes. Custom TaqMan genotyping assays (Applied Biosystems, Foster City, CA) were subsequently designed to genotype each SNP in beef cattle located at Tennessee Tech University (n = 654) to validate associations in a large, independent herd. The results yielded 15 significant associations (P < 0.05) with 6 phenotypes linked to those affected by fescue toxicosis. The second objective investigated the link between fescue toxicosis and the XK, Kell blood group complex subunit-related, member 4 (XKR4) gene. Serum ...

#### Array-Based Genomic Diversity Measures Portray Mus musculus Phylogenetic And Genealogical Relationships, And Detect Genetic Variation Among C57BL/6J Mice And Between Tissues Of The Same Mouse, Susan T. Eitutis

Mouse models lack affordable genomic technologies, slowing the identification of candidate variants contributing to complex phenotypes. The Mouse Diversity Genotyping Array (MDGA) is a low-cost, high-resolution platform permitting genomic diversity assessment. Using a validated list of >500,000 single nucleotide polymorphisms (SNPs), we applied the first comprehensive analysis of SNP differences to detect genetic distance across 362 Mus musculus samples. Genetic distance measured between distantly and closely related mice correlates with known phylogeny and genealogy. Variation detected between C57BL/6J mice is consistent with previous reports of variants within this strain. Putative genetic variation detected between and within tissues ...

Jul 2013

#### Identification Of Cyclophilin Gene Family In Soybean And Characterization Of GmCYP1, Hemanta Raj Mainali

I identified members of the cyclophilin (CYP) gene family in soybean (Glycine max) and characterized GmCYP1, one member of the soybean CYP family. CYPs belong to the immunophilin superfamily and have peptidyl-prolyl cis-trans isomerase (PPIase) activity. PPIase catalyzes the interconversion of the cis- and trans-rotamers of the peptidyl-prolyl amide bond of peptides. After extensive data mining, I identified 62 different CYP genes in soybean (GmCYP1 to GmCYP62), of which 8 encode multi-domain proteins and 54 encode single-domain proteins. At least 25% of the GmCYP genes are expressed in soybean. GmCYP1 localizes to the nucleus and the cytoplasm and ...

Jul 2013

#### A Mathematical Model And Numerical Method For Thermoelectric DNA Sequencing, Liwei Shi

DNA sequencing is the process of determining the precise order of the nucleotide bases (adenine, guanine, cytosine, and thymine) within a DNA molecule. It includes any method or technology that is used to determine the order of the four bases in a strand of DNA. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery. Thermoelectric DNA sequencing is a novel method to sequence DNA by measuring the heat that is released when DNA polymerase inserts a deoxyribonucleoside triphosphate into a growing DNA strand. The thermoelectric device for this project is composed of four parts ...

Jun 2013

#### On Identifying Signatures Of Positive Selection In Human Populations: A Dissertation, Jessica L. Crisci

As sequencing technology continues to produce better-quality genomes at decreasing cost, there has been a recent surge in the variety of data that we are now able to analyze. This is particularly true with regard to our understanding of the human genome, where the last decade has seen advances in primate epigenomics, ancient hominid genomics, and a proliferation of human polymorphism data from multiple populations. To use such data, however, it has become critical to develop increasingly sophisticated tools spanning both bioinformatics and statistical inference. In population genetics particularly, new statistical approaches for analyzing population data ...

Jun 2013

#### Genetic Approaches To Studying Complex Human Disease, Joseph B. Dube

Common, complex diseases such as cardiovascular disease (CVD) arise from an intricate interaction between environmental and genetic factors and are now the leading causes of mortality in Western society. By investigating the genetic component of complex disease etiology, we have gained a better understanding of the biological pathways underlying complex disease and the heterogeneity of complex disease risk. However, the development of high-throughput genomic technologies and large, well-phenotyped multi-ethnic cohorts has opened the door to more in-depth and trans-disciplinary approaches to studying the genetics of complex disease pathogenesis. Accordingly, we sought to investigate select complex traits and diseases using ...

May 2013

#### RNA-Sequence Analysis Of Human Melanoma Cells, Jharna Miya

RNA-sequencing refers to the use of high-throughput sequencing technologies to sequence cDNA in order to obtain comprehensive information about a sample's RNA content. The objective of this study is to analyze these data from different perspectives and to characterize gene expression. Beyond this characterization, the data were also used to investigate the effect of sequencing depth on gene expression measurements.

This research focuses on quantitative measurement of expression levels of genes and their transcripts. In this study, complementary DNA fragments of cultured human melanoma cells are sequenced and a total of 139,501,106 ...

May 2013

#### Performance Comparison Of Five RNA-Seq Alignment Tools, Yuanpeng Lu

Aligning millions of short reads to a reference genome is a critical task in high-throughput sequencing. In recent years, a large number of mapping algorithms have been developed, all of which align a vast number of reads to genomic or transcriptomic sequences. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics, RNA-Seq data can be simulated with sufficient accuracy to enable meaningful benchmarking of alignment algorithms. To provide guidance in choosing among alignment algorithms, five different alignment tools for RNA-Seq data are evaluated. In order to compare the ...