Topics in Human-Centered Data Science: Fine Tuning Open-Source Large Language Models

Course Areas

Human-Centered Data Science

Catalog Description

Hands-on experience in data preparation, model fine-tuning, and performance evaluation for popular open-source frameworks.

Instructor Description

This course introduces fine-tuning of open-source large language models (LLMs) through project-based applications and real-world examples. It begins with the foundations of natural language processing (NLP), focusing on text preprocessing techniques such as tokenization and vectorization. An overview of LLMs follows, covering the structure and architecture of commonly used open-source frameworks. The course then focuses on three methods for fine-tuning LLMs: self-supervised learning, supervised learning, and reinforcement learning. Each method is explored through theoretical explanation and hands-on group projects that apply the concepts to real-world examples, so that students learn to customize and optimize LLMs for specific tasks or domains.

Prerequisites

Upper-division standing; Informatics 310D and Informatics 304 (or one of the following approved substitutions: C S 303E, C S 312, C S 312H, C S 313E).

Restrictions

Generally restricted to undergraduate Informatics majors through registration period 1 and extended to Informatics minors in period 2. Outside students, including iSchool graduate students, are accepted only during period 3.

Current and Upcoming Classes for this Course

No upcoming classes are scheduled for this course.

Past Classes for this Course

Class Name	Instructor	Semester	Day(s)	Start Time	End Time	Building	Room
I 320D: Topics in Human-Centered Data Science: Fine Tuning Open-Source Large Language Models	Louis Gutierrez	Spring Term 2025	Tuesday, Thursday	06:30 PM	08:00 PM	CBA	4.344