Course Offerings
I310D - Introduction to Human-Centered Data Science is a survey course that introduces students to the theory and practice of data science through a human-centered lens, with emphasis on how design choices influence algorithmic results. Students will gain comfort and facility with fundamental principles of data science, including (a) programming for data science with Python, (b) data engineering, (c) database systems, (d) machine learning, and (e) human-centered aspects such as privacy, bias, fairness, transparency, accountability, reproducibility, interpretability, and societal implications. Each week’s class is divided into two segments: (a) Theory and Methods, a concise description of a theoretical concept in data science, and (b) Tutorial, a hands-on session on applying the theory just discussed to a real-world task on publicly available data. We will use Python for programming and cover Python basics at the beginning of the course. For modules related to databases, we will use PostgreSQL.
No description provided.
This course will cover relevant fundamental concepts in machine learning (ML) and how they are used to solve real-world problems. Students will learn the theory behind a variety of machine learning tools and practice applying the tools to real-world data such as numerical data, textual data (natural language processing), and visual data (computer vision). Each class is divided into two segments: (a) Theory and Methods, a concise description of an ML concept, and (b) Lab Tutorial, a hands-on session on applying the theory just discussed to a real-world task on publicly available data. We will use Python for programming. By the end of the course, the goals for the students are to: 1. Develop a sense of where to apply machine learning and where not to, and which ML algorithm to use. 2. Understand the process of gathering and preprocessing a variety of “big” real-world data to be used to train ML systems. 3. Characterize the process of training machine learning algorithms and evaluating their performance. 4. Develop programming skills to code in Python and use modern ML and scientific computing libraries like SciPy and scikit-learn. 5. Propose a novel product/research-focused idea (this will be an iterative process), design and execute experiments, and present the findings and demos to a suitable audience (in this case, the class).
This course offers students in Information Science a comprehensive exploration into the theories, techniques, and tools of data visualization. It is designed to equip students with the skills to effectively communicate complex information visually, enabling data analysis and decision-making. Through a combination of lectures, hands-on projects, and case studies, students will learn how to design and implement effective and aesthetically appealing data visualizations for a variety of data types and audiences. Upon successful completion of this course, students will be able to: • Understand the principles and psychology of visual perception and how they influence data visualization. • Critically evaluate the effectiveness of different data visualization techniques for varying data types and user needs. • Master the use of leading data visualization tools and libraries such as D3.js or Tableau. • Develop interactive dashboards and reports that effectively communicate findings to both technical and non-technical audiences. • Apply design principles to create visually appealing, accurate, and accessible data visualizations.
The class explores in depth the principles of relational database design and SQL as a query language.
This course lays the foundation for data science education, targeting health informatics students interested in learning more broadly about biomedical informatics. No previous coding experience is required. Students will be introduced to basic concepts and tools for data analysis. The focus is on hands-on practice and enjoyable learning. The course will use Python as the programming language and Jupyter Notebooks as the development environment (our “home base”) for the examples, tutorials, and assignments. We use JupyterLab notebooks because they are both the industry standard and a nice way to load, visualize, and analyze data and describe our findings in one environment. We will also learn GitHub to document changes and back up our work and, eventually, for use as a collaboration tool. Hands-on data analysis, final projects, and associated presentations will be mandatory for the completion of the course. The outcome for the class is that each student will have a GitHub repository with all of their work (Jupyter notebooks, data, etc.), including a final project that will be presented to the class. Specific topics to be covered include GitHub, the Linux/Unix file system, Jupyter Notebooks, Python programming, and data visualization.
Principles and practices in data engineering. Emphasis on the data engineering lifecycle and how to build data pipelines to collect, transform, analyze, and visualize data from operational systems. This is a hands-on and highly interactive course. Students will learn analytical data modeling techniques for organizing and querying data. They will learn how to transform data into dimensional models, how to build data products, and how to visualize the data. We will also examine the various roles data engineers can have in an organization and career paths for data professionals.
Introduction to the emerging field of Explainable Artificial Intelligence (XAI) from the perspectives of both a developer and an end user. Students will gain hands-on experience with some of the most commonly used explainability techniques and algorithms.
This course offers an introduction to fine-tuning open-source large language models (LLMs) through project-based applications and real-world examples. The course will begin with a foundational understanding of natural language processing (NLP), focusing on text preprocessing techniques such as tokenization and vectorization. A basic overview of large language models will be provided, covering the fundamental structure and architecture of commonly used open-source frameworks. The course will then focus on three key methods for fine-tuning LLMs: self-supervised learning, supervised learning, and reinforcement learning. Each method will be explored through both theoretical explanations and practical group-based projects that apply these concepts to real-world examples. Students will engage in hands-on projects to strengthen their understanding of how to customize and optimize LLMs for specific tasks or domains.
Practical skills and understanding required to work effectively with open source software and to understand the projects that build it. Includes git-based collaboration as well as conceptual understanding of licenses, security, and the technical and social processes in open source development. Class projects involve working with digital trace data from open source repositories.
Leveraging text mining, natural language processing (NLP), and computational linguistics to address real-world textual data challenges, including document processing, keyword extraction, question answering, translation, summarization, sentiment analysis, search, recommendation, and information extraction. Each week, classes include (a) Theory and Methods for NLP concepts and (b) Lab Tutorials for practical application with Python on multilingual text datasets.
This course starts by discussing the broad landscape of epistemological and theoretical perspectives and styles of reasoning, and by situating quantitative research within it. It introduces you to the foundational concepts in quantitative research methods, such as causality, conceptualization, operationalization, measurement, and sampling. It presents experimental design, survey design, and basic descriptive and inferential (frequentist) statistics, as well as a brief introduction to Bayesian inference and statistics.
