- Author
- Michael Rade
- Title
- Biomarker discovery and confirmation using statistical and machine learning approaches in transcriptomics
- Citable URL
- https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-904219
- Date of submission
- 16.05.2023
- Date of defense
- 23.01.2024
- Abstract (EN)
- A biomarker is an indicator of (ab)normal molecular or cellular processes found in tissues, blood, or other body fluids. It can also serve as an indicator of the response to an intervention, such as a drug response, toxic exposure, or disease. As a measurable and evaluable characteristic that indicates underlying biological processes, biomarkers can be used in immunology, toxicology, cancer research, and in a variety of application scenarios. These applications include the use of biomarkers as diagnostic tools, as predictors of disease prognosis, as monitors of clinical response to an intervention, or as tools for staging a disease. Here, we aimed to discover gene-based biomarkers using transcriptome data, including microarray and RNA-sequencing (RNA-Seq) data. This thesis consists of two thematically distinct parts that share the common goal of identifying and confirming robust biomarkers through meta-analysis approaches. In the first part, we identified and verified biomarker signatures that robustly evaluate T-cell activation in a time-resolved manner. We applied a meta-analysis of anti-CD3/CD28 induced CD4+ T-cell activation kinetics of publicly available transcriptome-wide time series using a random effects model. We used non-negative matrix factorization, an unsupervised deconvolution method, to infer changes in biological patterns over time. For verification and to further map a wider variety of the T-cell landscape, we performed a time series of transcriptome-wide RNA sequencing on activated blood T-cells. We identified time-resolved gene expression profiles comprising 521 genes of up to 10 distinct time points during activation and different polarization conditions. They cover sustained repressed, intermediate, and late response expression rates across multiple T-cell populations, thus defining consensus biomarker signatures for T-cell activation.
These biomarker signatures are a valuable source for, e.g., monitoring transcriptional changes during T-cell activation or assessing dysregulated functions of human T-cell immunity. In the second part, we developed a prognostic gene signature that predicts the clinical outcome (biochemical recurrence, BCR) of prostate cancer (PCa) and is suitable for supporting clinical decision-making. We previously developed the prognostic ProstaTrend RNA signature based on transcriptome-wide microarray and RNA-Seq analyses, primarily of prostatectomy specimens. An RNA-Seq study of formalin-fixed paraffin-embedded (FFPE) tumor biopsies has now allowed us to use this test as a basis for the development of a novel test that is applicable to FFPE biopsies as a tool for early routine PCa diagnostics. The prognostic relevance was evaluated using the Transcriptomic Risk Score (TRS). Validation of the TRS using the original ProstaTrend signature in the cohort of FFPE biopsies revealed a relevant impact of FFPE-associated degradation on gene expression and consequently no significant association with prognosis in FFPE tissue. However, the TRS based on the revised ProstaTrend signature, which included 204 genes (of the original 1396 genes), was significantly associated with BCR in the FFPE biopsy cohort and retained prognostic relevance when adjusted for Gleason Grade Groups. We confirmed a significant association with BCR in 9 independent cohorts including 1109 patients. Comparison of the prognostic performance of the TRS with 17 other prognostically relevant PCa panels revealed that the revised ProstaTrend signature was among the best-ranked panels.
- Keywords (EN)
- transcriptome, biomarkers, time series, prostate cancer, T cell
- Classification (DDC)
- 000
- Degree-granting / examining institution
- Universität Leipzig, Leipzig
- Version / review status
- published version / publisher's version
- URN Qucosa
- urn:nbn:de:bsz:15-qucosa2-904219
- Publication date (Qucosa)
- 11.03.2024
- Document type
- Dissertation
- Language of the document
- English
- License / rights notice
- CC BY 4.0
- Table of contents
Part 1 A time-resolved meta-analysis of consensus gene expression profiles during human T-cell activation
1 Introduction
1.1 Basic concepts of T-cells in the adaptive immunity
1.2 T-cell exit from quiescence
1.3 Hallmarks of T-cell activation
1.4 Modelling dynamic biological processes using time-series transcriptome data
1.5 Dimensionality reduction for signature inference
1.5.1 Introduction to NMF
1.5.2 Parts-based representation
1.5.3 Framework of NMF
1.5.4 Non-convexity
1.5.5 Objective function
1.5.6 Update rules
1.5.7 Interpretation of factorized matrices in transcriptome data
1.6 Motivation
2 Development of time-resolved consensus gene signatures
2.1 Methods
2.1.1 Data sources
2.1.2 Pre-processing
2.1.3 Statistical analysis
2.2 Towards time-resolved consensus gene signatures for T-cell activation
2.3 Time-resolved transcriptome-wide changes depict common trends across T-cell populations
2.3.1 Differential gene expression analysis
2.3.2 Meta-analysis strategy
2.4 Discovery Set: NMF for each T-cell population
2.4.1 Determination of the factorization rank
2.4.2 Illustration of the NMF for time series data
2.4.3 Temporal categorization of metagenes
2.4.4 NMF revealed coherent metagenes across T-cell populations
2.4.5 Refining the variety of gene expression profiles
2.5 Discovery Set: Consensus gene expression profiles
2.6 Verification and mapping to a wider variety of T-cell transcriptional landscape
2.6.1 DGEA: activated vs unactivated Pan T-cells
2.6.2 Finalization of the consensus gene expression profiles
3 Consensus gene expression profiles in single-cell transcriptomics
3.1 Methods
3.2 Dataset 1: (un)activated CD4 T-cells for 4h
3.3 Dataset 2: Consensus gene expression profiles are enriched in CAR T-cell products from patients with low-grade ICANS
4 Discussion
4.1 Conclusion
Part 2 ProstaTrend, a prognostic gene-expression signature for prostate cancer
5 Introduction
5.0.1 Improved prognostic performance by combining genomic and clinicopathological features
5.0.2 ProstaTrend: a prognostic signature for long-term prognosis
5.1 Motivation
5.1.1 From ProstaTrend to revised ProstaTrend
6 Development of the revised ProstaTrend signature
6.1 Methods
6.1.1 Data sources
6.1.2 Pre-processing
6.1.3 Statistical analysis
6.2 From ProstaTrend to revised ProstaTrend
6.2.1 Applying the Transcriptomic Risk Score to the FFPE biopsy cohort
6.2.2 Confounding factors and gene filtering
6.2.3 Filtering of genes based on prognostic impact in the TCGA cohort
6.3 Association of the TRS with BCR is confirmed in 9 validation cohorts
6.4 Meta-analysis for ProstaTrend genes across 13 cohorts
7 Integrated single-cell and spatial transcriptomics of human PCa
7.1 Methods (scRNA-Seq)
7.1.1 Analysis strategy
7.1.2 Differential expression analysis (DGEA)
7.1.3 Evaluation of cell type/lineage specific markers
7.1.4 Tumor cell identification using inferCNV
7.2 Methods (Spatial transcriptomics)
7.3 Toward the PCa single-cell atlas
7.3.1 First iteration: Post-clustering filtering
7.3.2 Second iteration: QC and cell cluster annotation
7.3.3 Final PCa atlas
7.4 ProstaTrend expression in the PCa single-cell atlas
7.4.1 Relating TRS of (revised) ProstaTrend to cell compartments
7.4.2 Association of ProstaTrend genes with cell lineages/types
7.5 ProstaTrend in the spatial context of prostate tissue
7.5.1 Annotation and carcinoma associated spot cluster
7.5.2 Relating TRS of (revised) ProstaTrend to spots
8 T-cell landscape of PCa
8.1 Elucidation of T-cell identities
8.1.1 T-cell cell cluster marker
8.1.2 Annotated T-cells from the re-analyzed datasets
8.1.3 Correlation between cell identities and immune cell populations
8.1.4 Enrichment of T-cell molecular mechanisms’ gene sets
8.1.5 Cell composition and summary of patient specimens
8.2 Consensus gene expression profiles for T-cell activation
8.2.1 Sample-level inferences
9 Discussion
9.1 The revised prognostic signature ProstaTrend (Chapter 6 & 7)
9.2 Activation states in the T-cell landscape of PCa (Chapter 8)
Summary (Part 1 & 2)
A Appendix
1 Part 1
1.1 DGEA and meta-analysis
1.2 Non-negative matrix factorization
1.3 CAR T-cell products
2 Part 2
2.1 Revised ProstaTrend and validation
2.2 PCa cell atlas
2.3 PCa T-cell atlas
List of Figures and Tables
Acronyms
Bibliography
Acknowledgements