Course Offerings
Introduction to the theory and practice of data science through a human-centered lens, with emphasis on how design choices influence algorithmic results. Students will gain comfort and facility with fundamental principles of data science including (a) Programming for Data Science with Python (b) Data Engineering (c) Database Systems (d) Machine Learning and (e) Human-centered aspects such as privacy, bias, fairness, transparency, accountability, reproducibility, interpretability, and societal implications. Each week’s class is divided into two segments: (a) Theory and Methods, a concise description of a theoretical concept in data science, and (b) Tutorial, a hands-on session on applying the theory just discussed to a real-world task on publicly available data. We will use Python for programming and cover Python basics at the beginning of the course. For modules related to databases, we will use PostgreSQL.
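As a rough illustration of what the weekly Tutorial segments involve, the sketch below loads a public dataset with pandas and computes a simple group summary. The file and column names are invented for the example and are not course materials.

```python
# Minimal sketch of a Tutorial-style exercise (illustrative only):
# load a dataset with pandas and compute a simple summary.
import pandas as pd

# "ride_data.csv" is a hypothetical publicly available dataset.
df = pd.read_csv("ride_data.csv")

# Inspect the data before analyzing it -- a human-centered habit the
# course emphasizes (what was collected, about whom, and how?).
print(df.head())
print(df.describe())

# A simple aggregation: average trip duration per neighborhood.
print(df.groupby("neighborhood")["duration_min"].mean())
```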
No description provided.
The class explores in depth the principles of relational database design and SQL as a query language.
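To give a flavor of the material, here is a minimal sketch of two related tables and a join query. SQLite, via Python's standard library, stands in for whatever DBMS the class uses, and the tables are invented for the example.

```python
# A small taste of relational design and SQL querying.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        course     TEXT
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Ada')")
conn.execute("INSERT INTO enrollment VALUES (1, 'INF 385M')")

# A join query of the kind studied in depth: which students take which courses?
rows = conn.execute("""
    SELECT s.name, e.course
    FROM student s JOIN enrollment e ON e.student_id = s.id
""").fetchall()
print(rows)  # [('Ada', 'INF 385M')]
```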
Principles and practices in Data Engineering. Emphasis on the data engineering lifecycle and how to build data pipelines to collect, transform, analyze, and visualize data from operational systems. This is a hands-on and highly interactive course. Students will learn analytical data modeling techniques for organizing and querying data. They will learn how to transform data into dimensional models, how to build data products, and how to visualize the data. We will also examine the various roles data engineers can have in an organization and career paths for data professionals.
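As a toy illustration of dimensional modeling, the sketch below splits a small flat table of orders into a dimension table and a fact table with pandas. The table and column names are hypothetical, not course data.

```python
# Toy sketch of one pipeline step: reshaping a flat operational extract
# into a tiny dimensional model with pandas. Names are hypothetical.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer": ["Ana", "Bo", "Ana"],
    "city":     ["Austin", "Dallas", "Austin"],
    "amount":   [20.0, 35.5, 12.25],
})

# Dimension table: one row per customer, with a surrogate key.
dim_customer = (orders[["customer", "city"]]
                .drop_duplicates()
                .reset_index(drop=True))
dim_customer["customer_key"] = dim_customer.index

# Fact table: measures plus a foreign key into the dimension.
fact_orders = orders.merge(dim_customer, on=["customer", "city"])[
    ["order_id", "customer_key", "amount"]]

print(dim_customer)
print(fact_orders)
```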
This course will cover relevant fundamental concepts in machine learning (ML) and how they are used to solve real-world problems. Students will learn the theory behind a variety of machine learning tools and practice applying the tools to real-world data such as numerical data, textual data (natural language processing), and visual data (computer vision). Each class is divided into two segments: (a) Theory and Methods, a concise description of an ML concept, and (b) Lab Tutorial, a hands-on session on applying the theory just discussed to a real-world task on publicly available data. We will use Python for programming. By the end of the course, the goals for the students are to: 1. Develop a sense of where to apply machine learning and where not to, and which ML algorithm to use; 2. Understand the process of garnering and preprocessing a variety of “big” real-world data, to be used to train ML systems; 3. Characterize the process to train machine learning algorithms and evaluate their performance; 4. Develop programming skills to code in Python and use modern ML and scientific computing libraries like SciPy and scikit-learn; and 5. Propose a novel product/research-focused idea (this will be an iterative process), design and execute experiments, and present the findings and demos to a suitable audience (in this case, the class).
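A minimal sketch of the train-and-evaluate workflow named in goal 3, using scikit-learn and its bundled iris dataset. This is illustrative only and not a course assignment.

```python
# Train a model, predict on held-out data, and evaluate accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                 # train
preds = model.predict(X_test)               # predict on held-out data
print("accuracy:", accuracy_score(y_test, preds))  # evaluate
```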
Practical skills and understanding required to work effectively with open source software and the projects that build it. Includes git-based collaboration as well as a conceptual understanding of licenses, security, and technical and social processes in open source development. Class projects involve working with digital trace data from open source repositories.
This course offers students in Information Science a comprehensive exploration of the theories, techniques, and tools of data visualization. It is designed to equip students with the skills to communicate complex information visually and effectively, enabling data analysis and decision-making. Through a combination of lectures, hands-on projects, and case studies, students will learn how to design and implement effective and aesthetically appealing data visualizations for a variety of data types and audiences. Upon successful completion of this course, students will be able to:
• Understand the principles and psychology of visual perception and how they influence data visualization.
• Critically evaluate the effectiveness of different data visualization techniques for varying data types and user needs.
• Master the use of leading data visualization tools and libraries such as D3.js or Tableau.
• Develop interactive dashboards and reports that effectively communicate findings to both technical and non-technical audiences.
• Apply design principles to create visually appealing, accurate, and accessible data visualizations.
Introduction to the emerging field of Explainable Artificial Intelligence (XAI) from the perspectives of a developer and end-user. Students will gain hands-on experience with some of the most commonly used explainability techniques and algorithms.
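The description does not name specific tools; as one example of a commonly used, model-agnostic explainability technique, the sketch below computes permutation importance with scikit-learn on a bundled dataset. It is an illustration only, not the course's toolkit.

```python
# Permutation importance: shuffle each feature and measure how much
# held-out accuracy drops; a large drop suggests the model relies on it.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)

# Print the five most influential features for this model.
top = result.importances_mean.argsort()[::-1][:5]
for i in top:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.3f}")
```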
Leveraging Text Mining, Natural Language Processing, and Computational Linguistics to address real-world textual data challenges, including document processing, keyword extraction, question answering, translation, summarization, sentiment analysis, search, recommendation, and information extraction. Each week, classes include (a) Theory and Methods for NLP concepts and (b) Lab Tutorials for practical application with Python on multilingual text datasets.
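As a small Lab Tutorial-style illustration, the sketch below extracts top TF-IDF terms from a few made-up documents using scikit-learn; the course's actual datasets are multilingual and far larger.

```python
# Keyword extraction with TF-IDF: score terms by how distinctive they
# are for each document, then report the top-weighted terms.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The library extended its weekend hours after student feedback.",
    "Students asked the library for longer weekend study hours.",
    "The new cafe menu features seasonal vegetables and local bread.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

terms = vectorizer.get_feature_names_out()
for row in tfidf.toarray():
    top = row.argsort()[::-1][:3]     # three highest-weighted terms
    print([terms[i] for i in top])
```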
This course lays the foundation for data science education targeting health informatics students interested in learning more broadly about biomedical informatics. No previous coding experience is required. Students will be introduced to basic concepts and tools for data analysis. The focus is on hands-on practice and enjoyable learning. The course will use Python as the programming language and Jupyter notebooks as the development environment (our “home base”) for the examples, tutorials, and assignments. We use Jupyter notebooks because they are an industry standard and a convenient way to load, visualize, and analyze data and describe our findings in one environment. We will also learn GitHub to document changes and back up our work and, eventually, to use it as a collaboration tool. Hands-on data analysis, final projects, and associated presentations will be mandatory for the completion of the course. The outcome for the class is that each student will have a GitHub repository with all of their work (Jupyter notebooks, data, etc.), including a final project that will be presented to the class. Specific topics to be covered include GitHub, the Linux/Unix file system, Jupyter notebooks, Python programming, and data visualization.
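A sketch of the kind of first notebook cell the course builds toward: load a table with pandas and plot one column. The file and column names are hypothetical.

```python
# Load a small table, peek at it, and plot a simple bar chart.
import pandas as pd
import matplotlib.pyplot as plt

patients = pd.read_csv("clinic_visits.csv")   # hypothetical example file
print(patients.head())                        # peek at the first rows

# Count visits per month (the "month" column is invented for the example).
patients["month"].value_counts().sort_index().plot(kind="bar")
plt.xlabel("month")
plt.ylabel("number of visits")
plt.show()
```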
This course offers an introduction to Fine-Tuning Open-Source Large Language Models (LLMs) through project-based applications and real-world examples. The course will begin with a foundational understanding of Natural Language Processing (NLP), focusing on Text Preprocessing techniques such as Tokenization and Vectorization. A basic overview of Large Language Models will be provided, covering the fundamental structure and architecture of commonly used Open-Source Frameworks. The course will then focus on three key methods for fine-tuning LLMs: Self-Supervised, Supervised and Reinforcement Learning. Each method will be explored through both theoretical explanations and practical group-based projects, applying these concepts to real-world examples. Students will engage in hands-on projects to strengthen their understanding of how to customize and optimize LLMs for specific tasks or domains.
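As a hedged illustration of the tokenization step, the sketch below uses the Hugging Face transformers library with a BERT tokenizer; the course does not commit to this particular framework or model.

```python
# Tokenization: split text into subword tokens and map them to the
# integer ids a language model consumes. Downloads the tokenizer on
# first run; "bert-base-uncased" is an illustrative choice only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Fine-tuning adapts a pretrained model to a specific task."
tokens = tokenizer.tokenize(text)          # subword tokens
ids = tokenizer(text)["input_ids"]         # integer ids, with special tokens

print(tokens)
print(ids)
```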
INF 380P: Introduction to Programming
The class focuses on developing problem-solving skills using Python as the programming language. Starting from procedural function development, we move on to object-oriented techniques and discuss simple data structures that are often used in software development. Students typically complete a few programming assignments, take a midterm, and submit a final project.
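A brief sketch of the progression described above: a procedural function, then the same idea wrapped in a small class backed by a list. The names are illustrative, not actual assignments.

```python
# Procedural style: a standalone function.
def average(scores):
    """Compute the mean of a list of numbers."""
    return sum(scores) / len(scores)

# Object-oriented style: data and operations live together.
class GradeBook:
    def __init__(self):
        self.scores = []          # a simple built-in data structure

    def add(self, score):
        self.scores.append(score)

    def average(self):
        return average(self.scores)

book = GradeBook()
for s in (88, 92, 79):
    book.add(s)
print(average([88, 92, 79]))      # procedural call
print(book.average())             # method call on an object
```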
INF 385M: Database Management
Databases are the foundation of data science: they provide the design needed to store, retrieve, and manage data, and data have become the fuel that powers generative AI. How to model data, encode context, enforce business rules, and achieve efficiency are critical questions in database design. This course provides an introductory understanding of relational database design with a focus on three parts. The first part centers on the database design lifecycle, introducing business rules, ER diagrams, normalization, and UML charts. The second part covers the database query language SQL, explaining concepts and providing examples. The third part gives a brief introduction to XML databases, a commonly used form of NoSQL database. The learning content will be delivered through a variety of formats, including lectures, tutorials, class activities, individual assignments, group assignments, and group projects. This course emphasizes peer learning, hands-on practice, exploration, and risk-taking.
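As a small example of encoding business rules in a schema (part one of the course), the sketch below defines two related tables with a CHECK constraint and a foreign key. SQLite stands in for the classroom DBMS, and the tables are invented for the example.

```python
# Business rules expressed as schema constraints.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enforce referential integrity
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL CHECK (salary > 0),              -- business rule
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
""")
conn.execute("INSERT INTO department VALUES (1, 'Cataloging')")
conn.execute("INSERT INTO employee VALUES (1, 'Ada', 52000, 1)")

# Violating either rule raises sqlite3.IntegrityError:
# conn.execute("INSERT INTO employee VALUES (2, 'Bo', -10, 1)")    # bad salary
# conn.execute("INSERT INTO employee VALUES (3, 'Cy', 40000, 9)")  # no such dept
```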
Infrastructure is all around us, even (or perhaps especially) where we do not actively consider or account for it. In this course, students will learn how knowledge infrastructures such as repositories, classification systems, databases, networks, standards, and/or metadata both shape and are shaped by governmental policy, institutional decision making, technical advances, and professional and personal value systems. We consider how infrastructure matters in professional, personal, and political life, and employ infrastructure as a lens to evaluate and understand the legal, ethical, and policy consequences of knowledge work, data science, and information management. In this course, students will employ an infrastructural perspective to evaluate programs, systems, policies, and/or organizations. We will explore the consequences and societal impact of knowledge work at both global and local scales, and consider how infrastructure might be built or refined to support societal or organizational goals such as social justice, privacy, innovation, health, or security. This is primarily a discussion-oriented course, with assessment primarily coming through a multi-stage, semester-long project oriented around a program evaluation.
Ethnographic research has found application and acceptance across various academic disciplines as well as industries. This course aims to introduce fundamental tenets of ethnographic methodology for investigating sociotechnical systems. Its foundation rests on interdisciplinary perspectives and anthropological insights, while simultaneously aligning with contemporary advancements such as design and speculative ethnography. The role of the future has perpetually held a central position in the utilization and shaping of technologies and information systems. A recurring narrative involves positioning a specific technology or system as "revolutionary" or "the future of" a certain domain. Adopting an ethnographic approach, this course seeks to critically examine sociotechnical imaginaries. Its objective is to glean insights from diverse communities, offering guidance in the construction of futures that are more inclusive, equitable, and diverse.
*THIS TOPIC WILL NO LONGER BE OFFERED AFTER SPRING 2025.* In this course, we will work to understand and address the challenges of misinformation, disinformation, and strategic manipulation in online environments. First, we will work to develop a deep understanding of the problem space. We will read and discuss existing research (both historical and contemporary) on how and why misinformation and disinformation spread. Next, we will explore the process, both personal and interpersonal, by which these issues can be approached and addressed in our own lives. This will involve reflecting on our own presuppositions, beliefs, and biases about information; and doing a project in which we apply the principles of Human-Centered Design to investigate different design directions for addressing misleading information. Students will gain important contextual knowledge and hands-on design experience that they can take into future professional domains (from education to policy to technology), where they can contribute to building more trustworthy information systems.
Data storytelling is more than sharing data—at its simplest, it’s about designing charts and tables that make sense to the people who will be using them and help those people make better, faster decisions. While making a chart is as easy as a few clicks, doing it well requires much more. There is a science to how our eyes and minds process information as well as an art to making good graphic design choices. This comes together in an effective data presentation when the work is readable, usable, and above all actionable—not just aesthetically pleasing (though we’ll certainly address that too). As information professionals, we are well-positioned to understand and design for the needs of our users, to interrogate our data sources thoughtfully, and to ask future-thinking questions. This course will also draw on elements from cognitive psychology, user experience, data journalism, graphic design, business, and more. This multidisciplinary approach will take us on a grand tour that will touch on many aspects of data analysis and will serve as an excellent introduction to other data-oriented courses in the iSchool master’s program.
Processes, techniques, and technologies that generate inscriptions (ready-to-take data), especially from or about people(s) or culture(s). Contexts, consequences, and history of datafication practices. Purposive intervention with datafication processes, practices, and artifacts.
INF 385T.12: Special Topics in Information Science: Ethics of AI
Artificial intelligence (AI) is both a product of and a major influence on society. As AI plays an increasingly important role in society, it is critical to understand both the ethical factors that influence the design of AI and the ethical dimensions of the impacts of AI in society. The goal of this course is to prepare students for the important ethical responsibilities that come with developing systems that may have significant, even life-and-death, consequences. Students first learn about both the history of ethics and the history of AI, to understand the basis for contemporary, global ethical perspectives (including non-Western and feminist perspectives) and the factors that have influenced the design, development, and deployment of AI-based systems. Students then explore the societal dimensions of the ethics and values of AI. Finally, students explore the technical dimensions of the ethics and values of AI, including design considerations such as fairness, accountability, transparency, power, and agency. Students who perform well in this class will be positioned to take on a leadership role within their organizations and will be able to help guide and steer the design, development, and deployment of AI-based systems in ways that benefit users, other stakeholders, their organizations, and society. The knowledge and skills gained through this course will benefit students throughout their careers, and society as a whole will benefit from ensuring that students are prepared to consider the important ethical dimensions of their work.
This course examines U.S. communication policy in light of domestic and international structural, economic, and technological changes. We will investigate how notions of control, access, and expression have changed during the 20th and 21st centuries, examining communication policies and regulation against a backdrop of technological innovation. The definitions and controversies around what constitutes the public interest intersect with policies for specific media systems including broadcasting, cablecasting, the Internet, and social media, among others. The cultural ramifications of communication systems, in terms of their impacts on people and on speech, are a related domain we will address. At the current moment, issues around privacy, large tech companies and their role in contemporary life, the limits and authority of regulation, and of course social media, AI, and ‘big data’ dominate many political and research agendas. Our goal will be to understand the backgrounds and foundations that bring us to these concerns and to frame them in critical ways.