Doctoral Thesis
Genomics is the field of modern biology that studies the genome as the sum of all genes of a given organism. Genomics includes the analysis of genomic variations in order to identify genetic susceptibility loci for various human diseases. Besides genomics, there are related fields summarized by the term "Omics", such as transcriptomics and proteomics, which study the sum of all transcripts and proteins in a defined biological system, respectively. Genetic variants, namely single nucleotide polymorphisms (SNPs) and copy number variations (CNVs), are used to identify genomic loci associated with human traits and diseases. Genome-wide association studies (GWASs) based on SNP data have been performed for a wide range of human traits and diseases. In the population-based Study of Health in Pomerania (SHIP) and the independent SHIP-TREND study, whole-genome genotyping data were available for 4081 and 986 individuals, respectively. In contrast to the widely used GWASs based on SNPs, association studies using CNV data are difficult to implement and thus less common. Therefore, one aim of this work was to detect CNVs using the whole-genome genotyping data available for 4081 individuals from SHIP. Another aim was to develop an efficient workflow for the analysis of these CNVs. As most common genetic variants exhibit only relatively small effects on phenotypic variability, large sample sizes are needed to maximize the statistical power to detect such effects. Therefore, the integration of data from multiple collaborating studies is indispensable. In this context, several CNV studies using the SHIP data have been performed and published, for example on body mass index (BMI) phenotypes, where the SHIP cohort was used as a population-based control.
Trait-associated genetic markers identified through GWASs are often intergenic or synonymous coding variants, and loci identified through whole-genome CNV analyses often contain multiple genes, making it difficult to identify the causal variants. In this context, functional analysis of the identified loci aids in determining the causal variant(s). One approach to functional analysis is expression quantitative trait loci (eQTL) analysis, defined as the association of genome-wide genotyping data with genome-wide gene expression data based on measured transcriptomes. This allows the identification of genetic variants influencing the expression levels of defined genes. A further example is transcriptome-wide association analysis (TWAS), defined as the association of phenotype data with whole-genome expression data. Thus, another aim of this work was to establish an analysis pipeline for processing such expression data, which were available for about 1000 individuals from the SHIP-TREND study. Here, array-based gene expression data were generated using RNA prepared from whole blood. Interpretation of TWAS results is often difficult because the phenotype may itself affect gene expression (reverse causation). Furthermore, technical measurement errors may bias the results. In a comprehensive analysis, biological and technical factors influencing the measured gene expression data were identified and subsequently taken into account to improve the association analyses. To further elucidate the molecular mechanisms underlying the relationship of gene expression levels with human traits or diseases, pathway analyses using the Ingenuity Pathway Analysis (IPA) tool were performed in connection with the TWAS. As for GWASs, the associations identified in TWAS usually exhibit only small effect sizes, highlighting the need for larger studies or meta-analyses to identify all susceptibility variants.
In this context, several eQTL and TWAS meta-analyses using the SHIP-TREND data have been performed, for example on the phenotypes age, sex, BMI, smoking status and serum lipid traits. The results of these analyses are in preparation for publication, and the most advanced example, the correlation of expression data with BMI, is presented here. The integration of whole-genome genotyping and expression data provides new functional information on the biological mechanisms underlying complex human traits and diseases. Within the frame of this work, this was demonstrated for the example of susceptibility to Helicobacter pylori infection.
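The eQTL idea described above, associating genotypes with expression levels, can be illustrated with a minimal sketch. It assumes an additive model in which the expression of one transcript is regressed on the allele dosage (0, 1 or 2 copies of the minor allele) at one SNP. The function name eqtl_slope and the toy values are hypothetical and are not taken from the thesis; a real analysis would additionally adjust for covariates and correct for multiple testing.

```python
from math import sqrt


def eqtl_slope(dosages, expression):
    """Return (slope, Pearson r) for expression regressed on allele dosage.

    dosages:    allele counts per individual (0, 1 or 2), hypothetical input
    expression: expression level of one transcript for the same individuals
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched samples (n >= 3)")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Sums of squares and cross-products around the means
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")
    slope = sxy / sxx           # change in expression per additional allele
    r = sxy / sqrt(sxx * syy)   # strength of the linear association
    return slope, r


# Toy data for six individuals (illustrative values only)
dosages = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 5.9, 6.1, 7.0, 6.8]
slope, r = eqtl_slope(dosages, expression)
print(f"slope per allele: {slope:.2f}, Pearson r: {r:.3f}")
```

The slope estimates the change in expression per additional allele copy. Swapping the genotype vector for a phenotype such as BMI gives the analogous single-transcript test underlying a TWAS.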
Genome-wide association studies (GWAS) are used to identify genetic markers linked with at least partially heritable diseases or phenotypes without prior knowledge of any disease-associated genetic loci. In summer 2008, all individuals of the population-based cohort Study of Health in Pomerania (SHIP) were individually genotyped using the Affymetrix Genome-Wide Human SNP Array 6.0. The aim of this work was to establish an efficient workflow for GWAS using the more than 4000 individually genotyped samples of the SHIP cohort as well as pooled samples, focusing exclusively on genetic variation based on single nucleotide polymorphisms (SNPs). First, an optimal array platform for the genotyping analysis had to be chosen, one that detected most of the available genetic variants at a high level of accuracy. Second, extensive quality controls had to be performed, starting from DNA extraction and including tests of the generated array data by the analysis software, to obtain the most reliable data for the subsequent association studies. For the identification of loci with smaller genetic effects, individual cohorts were meta-analyzed in large nationally and internationally organized consortia (e.g. CHARGE, BPGen, HaemGen, GIANT, CKD Gen). To participate in those meta-analyses, a comparable common set of genetic data had to be generated. This was done by imputing the data generated by individual array-based genotyping on the basis of a reference panel using chromosomal linkage information. Owing to the extensive phenotype information in the SHIP study, it was possible to perform many genome-wide discovery analyses and replication studies of possible susceptibility loci in a short time once the genetic data were available and processed.
This made it necessary to set up an efficient workflow for storing the huge amount of genetic data, converting them into the formats required by specific analysis software, performing the association analyses and presenting the results in a clear, human-readable format. This included replications, GWAS and meta-analyses of several cohorts. Many susceptibility loci were newly identified in different association studies that included the SHIP data and were subsequently published. In this work, genetic association studies including the SHIP data were performed and published on blood pressure, uric acid concentrations, cardiac structure and function, lipid metabolism, hematological parameters, kidney function, smoking quantity, circulating IGF-I and IGFBP-3 concentrations, and thyroid volume including the risk of goiter development. Besides the SHIP cohort, other cohorts, especially patient cohorts, were needed for GWAS. Since no genotype information from these patient cohorts was available and the individual genotyping of many probands is still expensive and therefore often not affordable, we established a cost-effective allelotyping method that relies on pooling DNA samples prior to hybridization to microarrays. After estimating the pooling-specific error of a case-control allelotyping study, the allelotyping approach was used to identify genetic susceptibility loci associated with aggressive periodontitis. Unless attributed to collaborators, all statistical analyses, data handling and in silico work concerning the SHIP data described in this context were performed by the author of this dissertation.