Course Description
New techniques in genomics have revolutionized biology, but they generate large quantities of data, which makes it challenging to extract signal from noise. This course gives students the basic skills to manipulate and integrate different types of biological datasets and to mine them with data analysis tools ranging from basic to state-of-the-art. Machine learning methods provide a framework for analyzing vast amounts of biological information and extracting meaningful signals. By the end of the semester, students will have had exposure to a variety of modern machine learning tools for classification and prediction. We will focus on exploring DNA data (millions of variants), expression data (more than 20,000 genes), and microbiome data (thousands of features), combined with various disease and experimental measurements. The course begins with the basics of loading datasets and exploring them through visualization, then moves on to basic machine learning methods, including classification and regression algorithms.