Two doctoral theses, 2013 (English, full text available)
Institute: Interfakultäres Institut für Genetik und Funktionelle Genomforschung (UMG)
Keywords: CNV, GWAS, general stress response, genetics, hay bacillus (Bacillus subtilis), microarray, SNP, SigB, TWAS, eQTL
Genomics is the field of modern biology that studies the genome as the sum of all genes of a given organism. It includes the analysis of genomic variation to identify genetic susceptibility loci for human diseases. Related fields, summarized under the term "omics", include transcriptomics and proteomics, which study the entirety of transcripts and proteins, respectively, in a defined biological system. Genetic variants, namely single nucleotide polymorphisms (SNPs) and copy number variations (CNVs), are used to identify genomic loci associated with human traits and diseases. Genome-wide association studies (GWASs) based on SNP data have been performed for a wide range of such traits and diseases. In the population-based Study of Health in Pomerania (SHIP) and the independent SHIP-TREND study, whole-genome genotyping data were available for 4081 and 986 individuals, respectively. In contrast to SNP-based GWASs, association studies using CNV data are difficult to implement and therefore less common. One aim of this work was therefore to detect CNVs in the whole-genome genotyping data of the 4081 SHIP individuals, and another was to develop an efficient workflow for analyzing these CNVs. Because most common genetic variants have only small effects on phenotypic variability, large sample sizes are needed to achieve sufficient statistical power to detect them, which makes pooling data from multiple collaborating studies indispensable. In this context, several CNV studies using the SHIP data have been performed and published, for example on body mass index (BMI) phenotypes, in which the SHIP cohort served as a population-based control.
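The link between small effect sizes and sample size can be made concrete with a power approximation. The sketch below assumes an additive single-SNP linear-regression test, a SNP explaining a hypothetical 0.1% of trait variance, and the conventional genome-wide threshold of 5e-8 (neither value is stated in the text); the function name `gwas_power` is illustrative.

```python
from statistics import NormalDist


def gwas_power(n, r2, alpha=5e-8):
    """Approximate power of a 1-df additive SNP test in linear regression.

    n: number of individuals; r2: fraction of trait variance explained by
    the SNP (hypothetical input); alpha: two-sided significance threshold.
    """
    nd = NormalDist()
    ncp = n * r2 / (1.0 - r2)            # non-centrality of the 1-df chi-square statistic
    shift = ncp ** 0.5                    # expected |z| under the alternative
    z_crit = nd.inv_cdf(1 - alpha / 2)    # roughly 5.45 for alpha = 5e-8
    # Normal approximation: probability that |z| exceeds the critical value.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    # SHIP, SHIP-TREND, both combined, and a hypothetical 50 000-person meta-analysis.
    for n in (986, 4081, 4081 + 986, 50_000):
        print(f"n = {n:>6}: power = {gwas_power(n, r2=0.001):.4f}")
```

Under these assumptions the 4081 SHIP individuals alone give well under 1% power, while 50 000 individuals give over 90%, which is why combining cohorts matters.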
Trait-associated genetic markers identified through GWASs are often intergenic or synonymous coding variants, and loci identified through whole-genome CNV analyses often contain multiple genes, making it difficult to pinpoint the causal variants. Functional analysis of the identified loci helps to determine the causal variant(s). One approach is expression quantitative trait locus (eQTL) analysis, defined as the association of genome-wide genotyping data with genome-wide gene expression data from measured transcriptomes; it identifies genetic variants that influence the expression levels of specific genes. A further approach is the transcriptome-wide association study (TWAS), defined here as the association of phenotype data with whole-genome expression data. Another aim of this work was therefore to establish an analysis pipeline for such expression data, which were available for about 1000 individuals from the SHIP-TREND study. These array-based gene expression data were generated from RNA prepared from whole blood. Interpreting TWAS results is often difficult because the phenotype may itself influence gene expression (reverse causation), and technical measurement errors may bias the results. In a comprehensive analysis, biological and technical factors influencing the measured gene expression data were identified and subsequently taken into account to improve the association analyses. To further elucidate the molecular mechanisms linking gene expression levels to human traits and diseases, pathway analyses with the Ingenuity Pathway Analysis (IPA) tool were performed in connection with the TWAS. As in GWASs, the associations identified in TWAS usually have only small effect sizes, highlighting the need for larger studies or meta-analyses to identify all susceptibility variants.
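Both analyses described above reduce, per probe, to a regression. The minimal sketch below assumes a simple linear model with at most one adjustment covariate (for example a hypothetical technical factor such as array batch); real pipelines use many covariates and dedicated software. For an eQTL, `x` is genotype dosage (0/1/2) and `y` is expression; for a TWAS in the sense used here, `x` is expression and `y` is the phenotype. The covariate adjustment uses the Frisch-Waugh-Lovell result, and the p-value is a normal approximation suited to large samples.

```python
from statistics import NormalDist, fmean


def _center(v):
    m = fmean(v)
    return [a - m for a in v]


def _residualize(v, c):
    """Remove the linear effect of covariate c from v (both already centered)."""
    g = sum(a * b for a, b in zip(v, c)) / sum(b * b for b in c)
    return [a - g * b for a, b in zip(v, c)]


def association(x, y, covariate=None):
    """Slope of y on x, optionally adjusted for one covariate.

    Returns (beta, se, p) with a two-sided normal-approximation p-value.
    """
    xc, yc = _center(x), _center(y)
    df = len(x) - 2                      # intercept + slope
    if covariate is not None:
        cc = _center(covariate)
        # Frisch-Waugh-Lovell: residualizing x and y on the covariate yields
        # the same slope as the multiple regression y ~ x + covariate.
        xc, yc = _residualize(xc, cc), _residualize(yc, cc)
        df -= 1                          # one more fitted parameter
    sxx = sum(a * a for a in xc)
    beta = sum(a * b for a, b in zip(xc, yc)) / sxx
    rss = sum((b - beta * a) ** 2 for a, b in zip(xc, yc))
    se = (rss / df / sxx) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(beta / se)))
    return beta, se, p
```

For example, `association([0, 1, 2, 0, 1, 2], expr)` tests whether each additional allele shifts the expression of one probe; the genotype and expression values would be hypothetical inputs.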
In this context, several eQTL and TWAS meta-analyses using the SHIP-TREND data have been performed, for example on the phenotypes age, sex, BMI, smoking status and serum lipid traits. The results of these analyses are being prepared for publication; the most advanced example, the correlation of expression data with BMI, is presented here. Integrating whole-genome genotyping and expression data provides new functional information on the biological mechanisms underlying complex human traits and diseases. Within this work, this was demonstrated using the example of susceptibility to Helicobacter pylori infection.
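The text does not say which meta-analysis model was used. A common choice for combining per-study effect estimates is fixed-effect inverse-variance weighting, sketched here as an assumption together with Cochran's Q and I² as heterogeneity measures; the study estimates in any call would be hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ivw_fixed_effect(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas, ses: per-study effect estimates and their standard errors.
    Returns (pooled beta, pooled SE, two-sided p, Cochran's Q, I^2).
    """
    weights = [1.0 / se ** 2 for se in ses]        # precise studies weigh more
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = sqrt(1.0 / w_sum)
    p = 2 * (1 - NormalDist().cdf(abs(beta / se)))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation from heterogeneity
    return beta, se, p, q, i2
```

With two studies of equal precision, the pooled estimate is their mean and the pooled standard error shrinks by a factor of the square root of two, which is the gain in power that motivates meta-analysis.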
The soil-dwelling, Gram-positive bacterium Bacillus subtilis is frequently exposed to a wide variety of stress and starvation conditions in its natural environment. To survive these environmental and energy stresses, the bacterium has evolved a general stress response mediated by the alternative sigma factor SigB. This SigB-dependent general stress response is induced by a wide variety of conditions, including environmental stresses such as ethanol, heat, oxidative and osmotic stress, limitation of nutrients such as glucose, oxygen or phosphate, and growth at low temperature. Although much is known about the activation mechanisms of the general stress response, the conditions that induce the SigB regulon, and its general functions, the exact composition of the SigB regulon is still not completely clear. The SigB-dependent general stress regulon has previously been characterized by proteomic approaches as well as by DNA-array-based expression studies. Genome-wide expression studies by Price, Petersohn and Helmann defined SigB regulons of well over 100 target genes; however, the overlap between their target gene lists comprises only 67 members. These differences probably result from the different strains, growth conditions, array platforms and experimental setups used. The first part of this work presents a targeted microarray analysis performed to better understand the structure of the general stress regulon. It is the first study to analyze gene expression in a wild-type strain and its isogenic sigB mutant under almost all known SigB-inducing conditions using the same array platform. In addition, the expression kinetics of 252 putative SigB-dependent genes and 36 suitable control genes were recorded. The data were analyzed with Random Forest, a machine-learning algorithm, incorporating knowledge from previous studies.
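The thesis does not describe its Random Forest implementation (established packages such as R's randomForest or scikit-learn would normally be used). The from-scratch sketch below only illustrates the technique: bootstrap samples, a random subset of features at each split, and a majority vote, plus a permutation importance that ranks features by the accuracy lost when a column is shuffled. It assumes each gene is a row (a Python list) of hypothetical log2 wild-type/sigB-mutant expression ratios, one per stress condition, labeled 1 for regulon members known from previous studies and 0 for control genes. As a simplification, importance is computed on the training data rather than on out-of-bag samples.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(X, y, features):
    """(feature, threshold) with the lowest weighted Gini impurity, or None."""
    best, best_imp = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            imp = len(left) * gini(left) + len(right) * gini(right)
            if imp < best_imp:
                best, best_imp = (f, t), imp
    return best


def build_tree(X, y, depth, n_feat, rng):
    """Grow one tree; a leaf is a label, an inner node is (f, t, left, right)."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), n_feat))
    if split is None:  # every sampled feature is constant at this node
        return majority(y)
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, n_feat, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, n_feat, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, depth=4, n_feat=None, seed=0):
    """Bagged trees; each split considers n_feat random features (default sqrt(p))."""
    rng = random.Random(seed)
    n_feat = n_feat or max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 depth, n_feat, rng))
    return forest


def predict(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])


def accuracy(forest, X, y):
    return sum(predict(forest, row) == lab for row, lab in zip(X, y)) / len(y)


def permutation_importance(forest, X, y, seed=0):
    """Accuracy lost when each feature (stress condition) is shuffled."""
    rng = random.Random(seed)
    base = accuracy(forest, X, y)
    drops = []
    for f in range(len(X[0])):
        column = [row[f] for row in X]
        rng.shuffle(column)
        X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
        drops.append(base - accuracy(forest, X_perm, y))
    return drops
```

Sorting conditions by their importance values gives a ranking analogous in spirit to a variable importance plot; the odd default number of trees avoids ties in the binary majority vote.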
Two Random Forest models were designed in this study. The “expression RF” model was designed to identify genes showing expression differences between the wild type and the sigB mutant, and the “kinetic RF” model to identify genes whose expression kinetics are SigB-dependent but whose expression in the sigB mutant is also influenced by secondary regulators besides SigB. Classification with the “expression RF” model identified 166 genes as SigB regulon members based on expression differences between the wild-type and sigB mutant strains. A variable importance plot, showing the contribution of each stress condition to classification within the “expression RF” model, established a hierarchy among the stress conditions investigated. This hierarchy suggested that all RsbU-dependent environmental stresses have a greater impact on SigB-dependent gene expression than the RsbP-dependent energy stresses. The “kinetic RF” model identified 30 additional genes that are controlled by other regulators in addition to SigB. The SigB dependency of these 30 genes was validated by screening their upstream regions for SigB promoter motifs. Hierarchical clustering of the resulting motif scores together with the expression ratios of the SigB regulon members predicted in this work revealed that only a subset of genes showed a correlation between expression values and sequence motifs. Because this does not hold for all sets of genes, a correlation between gene expression and motif scores cannot be generalized. In total, 196 SigB regulon members were classified in this targeted oligonucleotide microarray study. The majority of these regulon members are preceded by a putative SigB promoter motif, either identified previously or predicted in this work.
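Screening upstream regions for a SigB promoter motif can be sketched as a consensus-match scan over a -35 box, a variable spacer and a -10 box. The consensus used below (roughly GTTTAA, a 12-15 nt spacer, then GGGTAA) is an assumption based on how the SigB motif is commonly summarized and should be checked against the literature; the thesis's own motif scoring was likely more refined (for example position weight matrices), so this is a simplified stand-in, not its method.

```python
# Assumed consensus boxes; verify against the published SigB motif before use.
BOX_35 = "GTTTAA"
BOX_10 = "GGGTAA"


def box_matches(seq, start, box):
    """Number of positions in seq[start:] that agree with the consensus box."""
    return sum(a == b for a, b in zip(seq[start:start + len(box)], box))


def best_sigb_site(seq, spacers=range(12, 16)):
    """Best (score, 0-based start of -35 box, spacer length) in an upstream sequence.

    score = number of bases matching the two boxes (maximum 12).
    spacers must be in ascending order so the early break below is valid.
    """
    seq = seq.upper()
    best = (0, -1, -1)
    for i in range(len(seq) - len(BOX_35) + 1):
        s35 = box_matches(seq, i, BOX_35)
        for sp in spacers:
            j = i + len(BOX_35) + sp  # start of the -10 box
            if j + len(BOX_10) > len(seq):
                break
            score = s35 + box_matches(seq, j, BOX_10)
            if score > best[0]:
                best = (score, i, sp)
    return best
```

Per-gene scores from such a scan could then be clustered alongside expression ratios, which is the kind of comparison described above.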
The inclusion of a broad range of stress conditions, from environmental stresses to energy-limiting conditions, enabled a more detailed characterization of the structure of the B. subtilis general stress regulon. The use of machine-learning algorithms allowed regulon members to be predicted with a minimal number of false positives. In the second part of this work, high-resolution tiling array data covering most growth conditions, stresses and changes in carbon source supply were used to screen for new SigB targets among both previously annotated and newly annotated RNA features. In this way, 133 completely new, previously unannotated RNA features were assigned to the SigB regulon. Fifty of these 133 new features encode antisense RNAs that could potentially influence the transcription or translation of their sense RNA targets. In addition, 282 annotated genes were identified as SigB regulon members; compared with the targeted oligonucleotide study, 90 of these genes were newly identified and had not previously been known to be SigB-dependent. Clustering the expression levels of these genes by k-means revealed a cluster of 32 genes with low induction levels under all SigB-inducing conditions, although most of these genes possess a well-conserved SigB promoter motif. These genes are probably also controlled by regulators other than SigB, which might mask the typical strong SigB-dependent induction under the analyzed stress conditions. Analysis of SigB regulon expression under a variety of conditions revealed SigB-dependent expression during growth on plates, in swarming cells, during biofilm formation and during growth on glycerol as a carbon source. Induction of the SigB regulon during growth on plates and in swarming cells was presumably due to nutrient scarcity on plates, e.g. glucose limitation.
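The k-means step can be illustrated with Lloyd's algorithm on per-gene induction profiles. The sketch assumes each profile is a list of hypothetical log2 induction values, one per SigB-inducing condition; the number of clusters, the distance measure and the initialization used in the thesis are not stated, so the deterministic farthest-first initialization here is an illustrative choice.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100):
    """Lloyd's k-means; returns (cluster label per profile, centroids)."""
    # Farthest-first initialization: deterministic and spreads the start centroids.
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

A cluster whose centroid is low across all conditions would correspond to genes with weak induction despite a SigB promoter motif, like the group of 32 genes described above.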
SigB-dependent genes were likely induced during growth on glycerol because of oxygen limitation arising during growth, whereas induction of the SigB regulon during biofilm formation is assumed to result from phosphate limitation. These newly described SigB-activating stimuli are supported by the fact that the majority of SigB-dependent genes were induced under these growth conditions. In addition to the general stress response, B. subtilis cells have stress-specific adaptive mechanisms such as the osmotic stress response, which was addressed in the third part of this dissertation. Frequent flooding and drying of the soil cause osmotic stress, one of the most common stress conditions encountered by soil bacteria. Bacterial cells are equipped with osmo-specific adaptation responses in which a specific set of genes is regulated to maintain proper cellular function. Previous studies had shown that the expression of a large set of genes is affected by salt shock as well as by growth at high osmolarity. Detailed analysis of the tiling array data revealed 467 newly annotated features that were differentially regulated during salt shock and 251 newly annotated features that were expressed at different levels during continuous growth at high versus low osmolarity. A comparison of studies using the sigB knockout mutant with the tiling array study also supported the notion of sigma factor competition in controlling the expression of osmo-adaptive genes. The induction of specific osmo-adaptive genes was much stronger in the sigB mutant than in the wild-type strain. Furthermore, the tiling array data revealed a SigB-dependent antisense RNA, S1290, upstream of the opuB operon, whose products transport choline into the cell. This antisense RNA potentially affects transcription of the opuB operon during salt shock.
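Whether a newly annotated RNA feature counts as antisense can be decided from genome coordinates: it lies on the opposite strand and overlaps a gene, optionally including the gene's upstream region, as for a feature located upstream of an operon. The sketch below uses hypothetical feature names and coordinates (the actual positions of S1290 and opuB are not given in the text), and the upstream window size is an illustrative parameter.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def overlap(a, b):
    """Number of shared nucleotides between two features (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def with_upstream(g, n):
    """Extend a gene n nt into its 5' upstream region (strand-aware)."""
    if g.strand == "+":
        return g._replace(start=max(1, g.start - n))
    return g._replace(end=g.end + n)


def antisense_targets(rna, genes, upstream=0, min_overlap=1):
    """Names of opposite-strand genes (plus upstream window) overlapped by rna."""
    return [g.name for g in genes
            if g.strand != rna.strand
            and overlap(rna, with_upstream(g, upstream)) >= min_overlap]
```

For instance, a "-" strand RNA at 900-950 is not antisense to a "+" strand gene at 1000-2000 under the strict definition, but it is once a 150 nt upstream window is included.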
In agreement with previous studies, the tiling array data assigned the osmotically regulated proHJ operon to the SigE regulon, with a SigE promoter upstream of it. In addition, the significantly higher percentage of proline in spore coat proteins supports the assumption that osmotically induced proline synthesis might play a role during spore formation. In conclusion, the tiling array data revealed newly annotated RNA features that are regulated during the general stress response as well as during the osmotic stress response. This work identifies new conditions that induce the majority of SigB-dependent genes as well as new features that regulate osmotically induced genes.
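The proline comparison amounts to counting residues and testing the difference in proportions. The sketch assumes a one-sided two-proportion z-test that treats residues as independent draws, a strong simplification; the thesis does not state which test it used, and any protein sequences or counts passed in would be hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proline_counts(sequences):
    """(number of proline residues, total residues) over protein sequences."""
    residues = "".join(sequences).upper()
    return residues.count("P"), len(residues)


def proline_percent(sequences):
    k, n = proline_counts(sequences)
    return 100.0 * k / n


def proline_enrichment(k1, n1, k2, n2):
    """One-sided two-proportion z-test: is the proline fraction of set 1 higher?

    Returns (z, p). Treats residues as independent, which overstates significance
    when residues cluster within proteins.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 1 - NormalDist().cdf(z)
```

Set 1 would be the spore coat proteins and set 2 a proteome-wide background, each summarized with `proline_counts`.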