Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Faculty Publications

Bioinformatics

Bioinformatics, Computational Biology

Articles 1 - 6 of 6

Full-Text Articles in Life Sciences

Minimalist Ensemble Algorithms For Genome-Wide Protein Localization Prediction, J.-R. Lin, A. M. Mondal, R. Liu, Jianjun Hu Jan 2012

Minimalist Ensemble Algorithms For Genome-Wide Protein Localization Prediction, J.-R. Lin, A. M. Mondal, R. Liu, Jianjun Hu

Faculty Publications

Background

Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms.

Results

This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature ...


Hemebind: A Novel Method For Heme Binding Residue Prediction By Combining Structural And Sequence Information, R. Liu, Jianjun Hu Jan 2011

Hemebind: A Novel Method For Heme Binding Residue Prediction By Combining Structural And Sequence Information, R. Liu, Jianjun Hu

Faculty Publications

Background

Accurate prediction of binding residues involved in the interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Although much effort has been devoted to the development of various generic algorithms for ligand binding site prediction over the last decade, no algorithm has been specifically designed to complement experimental techniques for identification of heme binding residues. Consequently, an urgent need is to develop a computational method for recognizing these important residues.

Results

Here ...


Bayesmotif: De Novo Protein Sorting Motif Discovery From Impure Datasets, Jianjun Hu, F. Zhang Jan 2010

Bayesmotif: De Novo Protein Sorting Motif Discovery From Impure Datasets, Jianjun Hu, F. Zhang

Faculty Publications

Background

Protein sorting is the process that newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals is needed to improve the understanding of protein sorting mechanisms.

Methods

We formulated the protein sorting motif discovery problem as a classification problem ...


Integrative Disease Classification Based On Cross-Platform Microarray Data, C.-C. Liu, Jianjun Hu, M. Kalakrishnan, H. Huang, X. J. Zhou Jan 2009

Integrative Disease Classification Based On Cross-Platform Microarray Data, C.-C. Liu, Jianjun Hu, M. Kalakrishnan, H. Huang, X. J. Zhou

Faculty Publications

Background

Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, since microarray data generated by different laboratories or with different platforms can not be compared directly due to systematic variations. This issue has severely limited the practical use of microarray-based disease classification.

Results

In this study, we tested the feasibility of disease classification by integrating the large amount of heterogeneous microarray datasets from the public microarray repositories. Cross-platform data compatibility is created by deriving expression log-rank ratios within datasets. One may then compare vectors of log-rank ...


Emd: An Ensemble Algorithm For Discovering Regulatory Motifs In Dna Sequences, Jianjun Hu, Y. D. Yang, D. Kihara Jan 2006

Emd: An Ensemble Algorithm For Discovering Regulatory Motifs In Dna Sequences, Jianjun Hu, Y. D. Yang, D. Kihara

Faculty Publications

Background

Understanding gene regulatory networks has become one of the central research problems in bioinformatics. More than thirty algorithms have been proposed to identify DNA regulatory sites during the past thirty years. However, the prediction accuracy of these algorithms is still quite low. Ensemble algorithms have emerged as an effective strategy in bioinformatics for improving the prediction accuracy by exploiting the synergetic prediction capability of multiple algorithms.

Results

We proposed a novel clustering-based ensemble algorithm named EMD for de novo motif discovery by combining multiple predictions from multiple runs of one or more base component algorithms. The ensemble approach is ...


Integrative Missing Value Estimation For Microarray Data, Jianjun Hu, H. Li, M. S. Waterman, X. J. Zhou Jan 2006

Integrative Missing Value Estimation For Microarray Data, Jianjun Hu, H. Li, M. S. Waterman, X. J. Zhou

Faculty Publications

Background

Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in Stanford Microarray Database contain less than eight samples.

Results

We present the integrative Missing Value Estimation method (iMISS) by incorporating information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking reference data sets ...