INF 397: Research in Information Studies: Introduction to Machine Learning / Statistical Analysis and Learning

Spring Term 2024

Mode: In-Person

Instructor

Program: MSIS/PhD

Unique ID

27864

Day	Start	End	Building	Room
Wednesday	9:00 AM	12:00 PM	SRH	3.122

Catalog Description

Large datasets are increasingly available across many sectors, such as healthcare, energy, and online markets. This course focuses on methods that allow “learning” from such datasets to uncover underlying relationships and patterns in the data, with an emphasis on the predictive performance of the models that can be built to represent the underlying function generating the data. The course starts with a review of basic statistical concepts and linear regression, but it focuses mostly on regression, classification, and clustering techniques beyond linear regression, such as tree-based approaches, support vector machines, neural networks/deep learning, LLMs, and unsupervised learning. This course is intended for first- and second-year Master's students. Ph.D. students with an interest in machine learning models may also find this course useful.

In covering material from the assigned textbook and complementary selected readings (e.g., journal articles), this course will emphasize both formulaic and conceptual understanding of the methods discussed. As necessary, the instructor will draw on material from outside the textbook to build conceptual clarity and to showcase the application of the methods to a broad range of practical problems.

Prerequisites

Graduate standing.

A basic grasp of statistics and linear regression would be helpful; however, all relevant concepts will be reviewed during the course. Problem sets will include applied problems, some of them from the textbook, that require programming. All coding for this course will be in R or Python. At the beginning of the course, the instructor will point students to preparatory resources that provide the R/Python background and tools needed to solve the problem sets.

Restrictions

Restricted to graduate students in the School of Information through registration period 3. Outside students may be permitted to join the waitlist beginning in period 4.

Notes

The course starts with a review of basic statistical concepts and linear regression, but it focuses mostly on classification and clustering based on non-regression techniques such as tree-based approaches, support vector machines, and unsupervised learning (e.g., hierarchical clustering). In the problem sets, tutorials, and class projects we will examine applications in healthcare, energy, transportation, education, crime, and online markets. This course is intended for first- and second-year Master's students. Ph.D. students with an interest in non-regression-based quantitative methods may also find this course useful. In covering material from the assigned textbook and pathbreaking journal articles, this course will emphasize both formulaic and conceptual understanding of the methods discussed. The instructor will routinely draw on material from outside the textbook to build conceptual clarity. To that end, the reading for each week will include 3-5 papers of broad interest and relevance.
