Doctoral Thesis
Approaches to the Analysis of Proteomics and Transcriptomics Data based on Statistical Methodology
(2014)
Recent developments in genomics and molecular biology have led to the generation of an enormous amount of complex data of different origins. This is illustrated by the number of microarray experiment results published in the Gene Expression Omnibus (GEO), which has grown at an exponential pace over the last decade. The challenge of interpreting such vast amounts of data from different technologies has driven the development of new methods in computational biology and bioinformatics. Researchers often want to represent biological phenomena in the most detailed and comprehensive way. However, this is not always possible because of technological limitations and other factors such as limited resources. On the one hand, more detailed and comprehensive research generates highly complex data that are often difficult to approach analytically, but that give bioinformatics the chance to draw deeper and more precise conclusions. On the other hand, for low-complexity tasks the data distribution is known, so a mathematical model can be fitted and inference from that model can rely on well-known, standard methodologies. The price of using standard methodologies is that the biological questions being answered may not reveal the full complexity of the underlying biology. Nowadays, a biological study typically generates large amounts of data that must be analyzed by statistical inference. Sometimes the data pose a low-complexity task that can be addressed with standard, popular methodologies, as in "Proteomic analysis of mouse oocytes reveals 28 candidate factors of the 'reprogrammome'". There, we established a protocol for proteomics data that involves preprocessing the raw data and performing Gene Ontology overrepresentation analysis based on the hypergeometric distribution.
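The hypergeometric overrepresentation step can be illustrated as follows. This is a minimal sketch, not the published protocol: it assumes gene identifiers as strings, a background set of annotated genes and a dictionary mapping each GO term to its annotated genes. The function names are hypothetical, and the raw-data preprocessing and any multiple-testing correction are omitted. For each term it computes the one-sided tail probability P(X ≥ k) of finding at least k annotated genes in a study set of n genes drawn from N background genes, K of which carry the term.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of genes in the background (universe)
    K: number of background genes annotated with the GO term
    n: number of genes in the study set (e.g. detected proteins)
    k: number of study-set genes annotated with the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the exact probabilities of k, k+1, ..., min(K, n) annotated hits.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def go_overrepresentation(study: set, background: set, annotations: dict) -> dict:
    """Return {term: one-sided p-value} (hypothetical helper).

    study: gene IDs in the study set (assumed to be a subset of background)
    background: gene IDs of the reference universe
    annotations: dict mapping GO term -> set of annotated gene IDs
    """
    N, n = len(background), len(study)
    results = {}
    for term, genes in annotations.items():
        term_genes = genes & background
        K = len(term_genes)
        k = len(term_genes & study)
        if k == 0:
            continue  # no study gene carries this term, nothing to test
        results[term] = hypergeom_sf(k, N, K, n)
    return results
```

For example, with 20 background genes, 5 of them annotated with a term, and a study set of 4 genes containing 3 annotated ones, the p-value is (C(5,3)·C(15,1) + C(5,4)·C(15,0)) / C(20,4) = 155/4845 ≈ 0.032. If SciPy is available, the same tail is `scipy.stats.hypergeom.sf(k - 1, N, K, n)`; the standard-library version avoids the dependency and uses exact integer binomial coefficients.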
In cases where the data complexity is high and there is no published framework a researcher could follow, randomization is an approach worth exploiting. In two studies, "The mouse oocyte proteome escapes maternal aging" and "CellFateScout - a bioinformatics tool for elucidating small molecule signaling pathways that drive cells in a specific direction", we showed how randomization can be applied to distinct complex tasks. In "The mouse oocyte proteome escapes maternal aging", we constructed a random sample of semantic similarity scores between the oocyte transcriptome and random subsets of the transcriptome of the same size as the oocyte proteome. This allowed us to assess whether the proteome is representative of the transcriptome. Further, we established a novel framework for Gene Ontology overrepresentation analysis based on randomization testing: for every Gene Ontology term, we test whether randomly reassigning the labels that mark each gene as belonging or not belonging to the term decreases the overall expression level of that term. In "CellFateScout - a bioinformatics tool for elucidating small molecule signaling pathways that drive cells in a specific direction", we validated CellFateScout against other well-known bioinformatics tools. We asked whether our plugin predicts small molecule effects from expression signatures better than these tools. To this end, we constructed a protocol based on randomization testing, which assesses whether the small molecule effect, described as a (set of) active signaling pathway(s) detected by our plugin or by another tool, is significantly closer to the known small molecule targets than a random path.
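The two randomization procedures described for "The mouse oocyte proteome escapes maternal aging" can be sketched as empirical tests against a null distribution built from random gene sets. This is a minimal sketch under assumptions the abstract does not state: the "overall expression level" of a term is taken as the mean expression of its genes, and a Jaccard index stands in for the GO semantic similarity measure. All function names are hypothetical. The same `empirical_p_value` helper could serve the CellFateScout comparison once a distance between pathways and targets is defined, which is not sketched here.

```python
import random
from statistics import mean


def empirical_p_value(observed, null_scores, greater=True):
    """One-sided empirical p-value with the usual add-one correction.

    greater=True counts null scores at least as large as the observed one;
    greater=False counts null scores at most as large.
    """
    if greater:
        hits = sum(1 for s in null_scores if s >= observed)
    else:
        hits = sum(1 for s in null_scores if s <= observed)
    return (hits + 1) / (len(null_scores) + 1)


def go_term_permutation_test(expression, term_genes, n_perm=1000, seed=0):
    """Randomization test for one GO term (hypothetical helper).

    expression: dict gene ID -> expression level
    term_genes: set of gene IDs annotated with the term
    Shuffling member/non-member labels while keeping the term size fixed is
    equivalent to drawing a random gene set of that size. A small p-value means
    random reassignment almost always lowers the term's mean expression.
    """
    rng = random.Random(seed)
    genes = sorted(expression)
    members = [g for g in genes if g in term_genes]
    if not members:
        raise ValueError("no expressed gene is annotated with this term")
    observed = mean(expression[g] for g in members)
    null = [
        mean(expression[g] for g in rng.sample(genes, len(members)))
        for _ in range(n_perm)
    ]
    return empirical_p_value(observed, null, greater=True)


def jaccard(a, b):
    """Toy stand-in for a GO semantic similarity score between two gene sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def subset_representativeness_test(full_set, subset, score=jaccard, n_rand=1000, seed=0):
    """Compare score(full_set, subset) with score(full_set, R) for random
    subsets R of full_set that have the same size as subset.

    Returns a lower-tail p-value: a small value means the real subset
    (e.g. the proteome) is less similar to full_set (e.g. the transcriptome)
    than random subsets of the same size.
    """
    rng = random.Random(seed)
    pool = sorted(full_set)
    observed = score(full_set, subset)
    null = [score(full_set, set(rng.sample(pool, len(subset)))) for _ in range(n_rand)]
    return empirical_p_value(observed, null, greater=False)
```

The add-one correction keeps the p-value above zero when no random score reaches the observed one; with n randomizations the smallest attainable p-value is 1/(n + 1), so the number of randomizations bounds the resolution of the test.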
The Gram-positive bacterium Bacillus licheniformis is an important industrial host for the production of enzymes. Genomic DNA arrays and proteomics are being used to investigate the physiology of this bacterium. A genome-wide transcriptional profiling analysis of the adaptation of B. licheniformis to phosphate starvation revealed more than 100 induced genes, most of the strongly induced ones belonging to the putative Pho regulon. The transcriptome data were verified by analysis of the extracellular and cytoplasmic proteome. The main response of B. licheniformis to glucose starvation was a switch to the use of alternative carbon sources. In addition, B. licheniformis appears to use other organic substances such as amino acids and lipids as carbon sources under glucose starvation, as indicated by the induction of a large number of genes encoding proteins involved in amino acid and lipid degradation. During nitrogen starvation, genes required to recruit nitrogen from alternative sources were induced, e.g. genes for nitrate and nitrite assimilation and for several proteases and peptidases. Both starvation conditions led to down-regulation of the transcription of most vegetative genes and consequently to reduced synthesis of the corresponding proteins. Only a few genes, such as yvyD, citA and the methylcitrate shunt genes mmgD, mmgE and yqiQ, were induced under both starvation conditions. The data of this study help to better understand the physiology of this bacterium during fermentation processes and thus to identify and circumvent bottlenecks in B. licheniformis-based bioprocesses. In addition, the phytase promoter was tested for the construction of an alternative phosphate-regulated expression system for B. licheniformis.