Statistics and Data Science Seminar
Hyo Young Choi
University of Tennessee
High-dimensional Transformation of Single Transcript Measurements for Identifying Structural Variants in Cancer
Abstract: Over the last decade, many innovative technologies have generated vast amounts of large-scale biological data. The accumulation of so-called “big data”, especially from next-generation sequencing technologies, has created many exciting areas in statistics as well as biology. In particular, statistical tools and machine learning techniques have proven to be critical in cancer genomics, transforming large and complex data into clinically relevant knowledge. While many computational tools have been developed for analyzing such big data, unprecedented challenges remain in turning it into meaningful and actionable insights. This talk primarily concerns high-dimensional outliers, which are often challenging to identify in high-throughput sequencing data due to the special structure of high-dimensional space. We introduce a new notion of high-dimensional outliers that encompasses various types and provides deep insights into the behavior of these outliers under several asymptotic regimes. As an important application, we introduce a statistical method for unsupervised screening of a range of structural alterations in RNA-seq data. We identify a number of biologically important outliers and successfully characterize the subspace associated with them, which holds promise for identifying otherwise obscured signals.
Wednesday, February 21, 2024 at 4:00 PM on Zoom