Course Offerings
Introduction to the theory and practice of data science through a human-centered lens, with emphasis on how design choices influence algorithmic results. Students will gain comfort and facility with fundamental principles of data science, including (a) Programming for Data Science with Python, (b) Data Engineering, (c) Database Systems, (d) Machine Learning, and (e) Human-centered aspects such as privacy, bias, fairness, transparency, accountability, reproducibility, interpretability, and societal implications. Each week’s class is divided into two segments: (a) Theory and Methods, a concise description of a theoretical concept in data science, and (b) Tutorial, a hands-on session on applying the theory just discussed to a real-world task on publicly available data. We will use Python for programming and cover Python basics at the beginning of the course. For modules related to databases, we will use PostgreSQL.
An introduction to sociotechnical perspectives on information systems, their effects, and how we intervene to make them better.
No description provided.
The class explores the principles of relational database design, and SQL as a query language in depth.
Principles and practices in Data Engineering. Emphasis on the data engineering lifecycle and how to build data pipelines to collect, transform, analyze, and visualize data from operational systems. This is a hands-on and highly interactive course. Students will learn analytical data modeling techniques for organizing and querying data. They will learn how to transform data into dimensional models, how to build data products, and how to visualize the data. We will also examine the various roles data engineers can have in an organization and career paths for data professionals.
This course will cover relevant fundamental concepts in machine learning (ML) and how they are used to solve real-world problems. Students will learn the theory behind a variety of machine learning tools and practice applying the tools to real-world data such as numerical data, textual data (natural language processing), and visual data (computer vision). Each class is divided into two segments: (a) Theory and Methods, a concise description of an ML concept, and (b) Lab Tutorial, a hands-on session on applying the theory just discussed to a real-world task on publicly available data. We will use Python for programming. By the end of the course, the goals for the students are to: 1. Develop a sense of where to apply machine learning and where not to, and which ML algorithm to use. 2. Understand the process of gathering and preprocessing a variety of “big” real-world data to be used to train ML systems. 3. Characterize the process of training machine learning algorithms and evaluating their performance. 4. Develop programming skills to code in Python and use modern ML and scientific computing libraries like SciPy and scikit-learn. 5. Propose a novel product- or research-focused idea (this will be an iterative process), design and execute experiments, and present the findings and demos to a suitable audience (in this case, the class).
Practical skills and understandings required to effectively work with open source software and understand the projects that build them. Includes git-based collaboration as well as conceptual understanding of licenses, security, technical and social processes in open source development. Class projects involve working with digital trace data from open source repositories.
This course offers students in Information Science a comprehensive exploration of the theories, techniques, and tools of data visualization. It is designed to equip students with the skills to effectively communicate complex information visually, enabling data analysis and decision-making. Through a combination of lectures, hands-on projects, and case studies, students will learn how to design and implement effective and aesthetically appealing data visualizations for a variety of data types and audiences. Upon successful completion of this course, students will be able to: • Understand the principles and psychology of visual perception and how they influence data visualization. • Critically evaluate the effectiveness of different data visualization techniques for varying data types and user needs. • Master the use of leading data visualization tools and libraries such as D3.js or Tableau. • Develop interactive dashboards and reports that effectively communicate findings to both technical and non-technical audiences. • Apply design principles to create visually appealing, accurate, and accessible data visualizations.
Introduction to the emerging field of Explainable Artificial Intelligence (XAI) from the perspectives of both developers and end users. Students will gain hands-on experience with some of the most commonly used explainability techniques and algorithms.
Leveraging Text Mining, Natural Language Processing, and Computational Linguistics to address real-world textual data challenges, including document processing, keyword extraction, question answering, translation, summarization, sentiment analysis, search, recommendation, and information extraction. Each week, classes include (a) Theory and Methods for NLP concepts and (b) Lab Tutorials for practical application with Python on multilingual text datasets.
This course lays the foundation for data science education targeting health informatics students interested in learning more broadly about biomedical informatics. No previous coding experience is required. The students will be introduced to basic concepts and tools for data analysis. The focus is on hands-on practice and enjoyable learning. The course will use Python as the programming language and Jupyter Notebooks as the development environment (our “home base”) for the examples, tutorials, and assignments. We use JupyterLab notebooks because they are both the industry standard and a nice way to load, visualize, and analyze data and describe our findings in one environment. We will also learn GitHub to document changes and back up our work and, eventually, for use as a collaboration tool. Hands-on data analysis, final projects, and associated presentations will be mandatory for the completion of the course. The outcome for the class is that each student will have a GitHub repository with all of their work (Jupyter notebooks, data, etc.), including a final project that will be presented to the class. Specific topics to be covered include GitHub, the Linux/Unix file system, Jupyter Notebooks, Python programming, and data visualization.
This course offers an introduction to Fine-Tuning Open-Source Large Language Models (LLMs) through project-based applications and real-world examples. The course will begin with a foundational understanding of Natural Language Processing (NLP), focusing on Text Preprocessing techniques such as Tokenization and Vectorization. A basic overview of Large Language Models will be provided, covering the fundamental structure and architecture of commonly used Open-Source Frameworks. The course will then focus on three key methods for fine-tuning LLMs: Self-Supervised, Supervised and Reinforcement Learning. Each method will be explored through both theoretical explanations and practical group-based projects, applying these concepts to real-world examples. Students will engage in hands-on projects to strengthen their understanding of how to customize and optimize LLMs for specific tasks or domains.
No description provided.
Online communities are important to our cultural, social, and economic lives and especially to how we find and share information. Yet they also threaten our well-being and may undermine critical social institutions as well as the integrity of public discourse. This course is an interdisciplinary inquiry that seeks to understand online communities. It covers the history of online communities, from their origins in the pre-Internet era to the rise of social media platforms and contemporary challenges. It also covers the social, psychological, and human-computer interaction research that both explains the practical barriers to building an online community and motivates technical and organizational designs that aim to overcome them.
Explore common data collection, management, and sharing practices around information technology and emerging technologies such as AI. Students will gain hands-on experience with collecting, analyzing, and managing user data in an ethical and responsible manner. Students will design data-driven systems that are centered around user consent, transparency, and social responsibility.
Critical exploration of the intersection between digital technologies and information access in emerging economies. Investigate the historical, socio-economic, and ethical dimensions of digital adoption in the Global South, analyzing its impact on governance, economies, cultures, and societal dynamics. Emphasis on critical thinking, ethical considerations, and collaborative approaches to address challenges such as the digital divide(s), data sovereignty, and technology-driven inequality. Through case studies and practical exercises, students will develop skills in digital research, global cultures, policy analysis, and technology innovation with a focus on promoting inclusive and sustainable digital transformation in Global South contexts. Also offered as I 320J.
Practical skills and understandings required to effectively work with open source software and understand the projects that build them. Includes git-based collaboration as well as conceptual understanding of licenses, security, technical and social processes in open source development. Class projects involve working with digital trace data from open source repositories. Also offered as Informatics 320D.
This course examines disability beyond digital accessibility (i.e., web accessibility, user interface design) and focuses on disability from an organizational and socio-technical point of view. Students will learn about the legislation and policies impacting accessibility, the models that shape our perceptions of disability, and review case studies of disability in several contexts. In addition to the broader types of disabilities, we will consider other forms of disabilities (permanent, situational, temporary). Students will engage in class discussions, small group activities, homework assignments, and give oral presentations. Students will be equipped with the knowledge and skills to apply methods and models of accessibility in the workplace in various fields, including software design, data science, AI, and library science.
This class explores how to make arguments about and through design. The first half focuses on values, criticism, ethics, and analysis of technology; the second half aims to help a soon-to-graduate technologist envision positive social impact in a mission-driven enterprise. Students will practice synthesizing ethical tech considerations – as they will have to do for the rest of their careers – and combining them with an organizational mindset. Through exercises, role-playing, discussions, guest lectures from activist technologists, and wide-ranging readings, students will practice connecting broader implications of their designs with technical choices. Design for Social Impact seeks to arm students with diverse ways of reflecting on their authorial relationship to technology, drawing on fields from art and design to political science and anthropology. Course participants will be encouraged to focus on areas of personal interest, enumerating the social, political, and economic parameters of particular technical systems: parameters that are as important as power consumption, usability, or efficiency.
Effective application of social and technical methods of analysis to specific existing systems with inseparable technical and social components to enable improvement. Covers techniques such as modeling, interviewing, observation, trace analysis, and benchmarking.