Machine Learning for Statistical Genetics
Karsten Borgwardt, Machine Learning Summer School at Purdue, 2011

In this course, I will give an introduction to
the field of Machine Learning in Statistical Genetics. The grand challenge in this area is to develop algorithms and statistical tests that allow biological and
medical researchers to explore the genetic basis of common phenotypic traits in humans, animals and plants.

I will describe two research questions in genome-wide association studies to which machine learning may contribute: detecting epistatic interactions
between genetic loci, and analysing structured phenotypes.

First, detecting epistatic interactions in the genome is computationally highly demanding,
as the number of candidate pairs grows quadratically with the number of markers. Feature selection techniques from machine learning may help to search
this enormous set of candidates efficiently. In particular, I will describe a highly scalable approach, which we recently developed, to epistatic interaction
discovery in case-control studies via Support Vector Machines.

Second, more and more phenotypic data is being recorded in the form of "structured data", e.g., time series, images,
or videos. For genome-wide association studies on these structured phenotypes, machine learning can extend the classic statistical machinery for association
testing to non-scalar phenotypes. In particular, I will describe our work on two-sample tests for structured data.
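
The quadratic growth in candidate pairs for epistasis detection can be made concrete: with n markers there are n(n - 1)/2 unordered pairs. The plain-Python sketch below counts them and runs a naive exhaustive scan over a toy genotype matrix. The function names and the scoring rule (absolute case/control difference in the mean of the product x_i * x_j) are hypothetical illustrations chosen for brevity; this is not the SVM-based approach described in the course, only a baseline showing that an exhaustive search must visit every pair.

```python
from itertools import combinations
from math import comb


def num_candidate_pairs(num_markers):
    """Number of unordered pairs of distinct markers: n * (n - 1) / 2."""
    return comb(num_markers, 2)


def brute_force_pair_scores(genotypes, labels):
    """Score every marker pair with a hypothetical toy interaction statistic.

    genotypes: one list of per-marker codes (e.g. 0/1/2) per sample
    labels: 1 for a case, 0 for a control, one per sample
    Cost is O(num_pairs * num_samples); one score is stored per pair.
    """
    num_markers = len(genotypes[0])
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    scores = {}
    for i, j in combinations(range(num_markers), 2):
        # Toy statistic: case/control difference in the mean of x_i * x_j
        case_mean = sum(g[i] * g[j] for g in cases) / len(cases)
        control_mean = sum(g[i] * g[j] for g in controls) / len(controls)
        scores[(i, j)] = abs(case_mean - control_mean)
    return scores


if __name__ == "__main__":
    for n in (1_000, 100_000, 500_000, 1_000_000):
        print(f"{n:>9,} markers -> {num_candidate_pairs(n):>15,} candidate pairs")

    # Tiny hypothetical example: 4 samples (2 cases, 2 controls), 4 markers
    toy_genotypes = [[0, 1, 2, 1], [2, 2, 0, 1], [1, 0, 1, 0], [0, 0, 2, 2]]
    toy_labels = [1, 1, 0, 0]
    best = max(brute_force_pair_scores(toy_genotypes, toy_labels).items(),
               key=lambda item: item[1])
    print("highest-scoring pair:", best)
```

For 500,000 markers, `num_candidate_pairs` returns 124,999,750,000, so a scan that also loops over every sample, and keeps one score per pair as this toy version does, quickly becomes infeasible. That cost is what a scalable search strategy has to avoid.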
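
A two-sample test asks whether two groups of samples come from the same distribution; for structured phenotypes, the groups might be (as a hypothetical framing) individuals with different genotypes at a locus. As one illustration, the sketch below implements a well-known kernel two-sample statistic, the maximum mean discrepancy (MMD), with a permutation test, in plain Python. It assumes each phenotype is a fixed-length time series treated as a vector, a Gaussian kernel with bandwidth `sigma=1.0`, and 200 permutations; these choices and the toy data are illustrative assumptions, not necessarily the test presented in the course.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length numeric sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(xs, ys, sigma=1.0):
    """Unbiased estimate of the squared MMD; needs at least 2 samples per group."""
    m, n = len(xs), len(ys)
    # Mean kernel value within each group, excluding the diagonal (i == j)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], sigma)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], sigma)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Mean kernel value across the two groups
    k_xy = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def permutation_test(xs, ys, sigma=1.0, num_permutations=200, seed=0):
    """Return (observed MMD^2, permutation p-value) for H0: same distribution."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(xs, ys, sigma)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(num_permutations):
        # Randomly reassign group labels and recompute the statistic
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:m], pooled[m:], sigma) >= observed:
            exceed += 1
    # Add-one correction counts the observed labelling as one permutation
    return observed, (exceed + 1) / (num_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy phenotypes: 5-step time series for two groups of 15
    group_a = [[rng.gauss(0.0, 0.3) for _ in range(5)] for _ in range(15)]
    group_b = [[t + rng.gauss(0.0, 0.3) for t in range(5)] for _ in range(15)]
    stat, p_value = permutation_test(group_a, group_b)
    print(f"MMD^2 = {stat:.3f}, permutation p-value = {p_value:.3f}")
```

The data type enters only through the kernel, so replacing `gaussian_kernel` with a kernel defined on images or other structured objects would, in principle, let the same permutation test handle those phenotypes as well.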