Spring 2025

I 320D Topics in Human-Centered Data Science: Fine Tuning Open-Source Large Language Models

Unique ID: 28189

Tues, Thurs  06:30 PM - 08:00 PM  CBA 4.344

DESCRIPTION

This course introduces the fine-tuning of open-source large language models (LLMs) through project-based applications and real-world examples. It begins with the foundations of natural language processing (NLP), focusing on text preprocessing techniques such as tokenization and vectorization. A basic overview of LLMs follows, covering the fundamental structure and architecture of commonly used open-source frameworks. The course then turns to three key approaches to fine-tuning LLMs: self-supervised learning, supervised learning, and reinforcement learning. Each approach is covered through theoretical explanation and practical group projects that apply the concepts to real-world examples. Through these hands-on projects, students strengthen their understanding of how to customize and optimize LLMs for specific tasks or domains.

PREREQUISITES

Upper-division standing; Informatics 310D and Informatics 304 (or one of the following approved substitutions: C S 303E, C S 312, C S 312H, C S 313E).

RESTRICTIONS

Restricted to undergraduate Informatics majors through registration period 1. Informatics minors may add classes and join waitlists beginning in period 2. Outside students will be permitted to join our waitlists beginning with period 3.