Catalog Description
Large datasets are becoming increasingly available across many sectors, such as healthcare, energy, and online markets. This course focuses on methods that allow “learning” from such datasets to uncover underlying relationships and patterns in the data, with a focus on the predictive performance of the various models that can be built to represent the underlying function generating the data. The course starts with a review of basic statistical concepts and linear regression. Most of the course, however, introduces students to regression, classification, and clustering techniques beyond linear regression, such as tree-based approaches, support vector machines, neural networks/deep learning, LLMs, and unsupervised learning. This course is intended for first- and second-year Master's students. Ph.D. students with an interest in machine learning models may also find this course useful.
In covering the material from the assigned textbook and complementary selected readings (e.g., journal articles), this course will emphasize both formulaic and conceptual understanding of the methods discussed. As necessary, the instructor will draw on material from outside the textbook to build conceptual clarity and to showcase the application of the methods learned to a broad range of practical problems.
Students with prior credit in INF 385T, topic: Intro to Machine Learning/SAL may not enroll in this class.
Prerequisites
A basic grasp of statistics and linear regression would be helpful. However, all relevant concepts will be reviewed during the course. Problem sets will include applied problems, some of them from the textbook, that require programming. All coding for this course will be in R or Python. At the start of the course, the instructor will point students to preparatory resources that provide the R and Python background and tools needed to solve the problem sets.
Graduate standing.
Restrictions
Restricted to graduate students in the School of Information through registration period 3. Outside students may be permitted to join the waitlist beginning in period 4.
Notes
The course starts with a review of basic statistical concepts and linear regression. Most of the course, however, focuses on classification and clustering based on non-regression techniques such as tree-based approaches, support vector machines, and unsupervised learning (e.g., hierarchical clustering). In the problem sets, tutorials, and class projects we will examine applications in healthcare, energy, transportation, education, crime, and online markets. This course is intended for first- and second-year Master's students. Ph.D. students with an interest in non-regression-based quantitative methods may also find this course useful. In covering the material from the assigned textbook and pathbreaking journal articles, this course will emphasize both formulaic and conceptual understanding of the methods discussed. The instructor will routinely draw on material from outside the textbook to build conceptual clarity. To that end, the reading for each week will include 3-5 papers of broad interest and relevance.