Course Offerings
Introduction to the theory and practice of data science through a human-centered lens, with emphasis on how design choices influence algorithmic results. Students will gain comfort and facility with fundamental principles of data science including (a) Programming for Data Science with Python, (b) Data Engineering, (c) Database Systems, (d) Machine Learning, and (e) Human-centered aspects such as privacy, bias, fairness, transparency, accountability, reproducibility, interpretability, and societal implications. Each week’s class is divided into two segments: (a) Theory and Methods, a concise description of a theoretical concept in data science, and (b) Tutorial, a hands-on session applying the theory just discussed to a real-world task on publicly available data. We will use Python for programming and cover Python basics at the beginning of the course. For modules related to databases, we will use PostgreSQL.
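A minimal sketch of the kind of tutorial workflow described above, combining Python and PostgreSQL; the psycopg2 and pandas libraries, the connection settings, and the table and columns are illustrative assumptions rather than part of the syllabus.

    # A minimal sketch: pull data from a (hypothetical) PostgreSQL table into pandas.
    import pandas as pd
    import psycopg2

    # Connection settings are placeholders for illustration only.
    conn = psycopg2.connect(
        host="localhost", dbname="course_demo", user="student", password="secret"
    )
    cur = conn.cursor()

    # Hypothetical table and columns used purely to show the pattern.
    cur.execute("SELECT city, median_income, population FROM census_sample;")
    df = pd.DataFrame(cur.fetchall(), columns=["city", "median_income", "population"])
    conn.close()

    # A first human-centered question: how do summaries shift when we group
    # by a policy-relevant attribute?
    print(df.describe())
    print(df.groupby("city")["median_income"].median())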
This class explores in depth the principles of relational database design and SQL as a query language.
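A minimal sketch of relational design and a join query; Python's built-in sqlite3 module stands in here so the example is self-contained, but the same SQL ideas carry over to PostgreSQL, and the schema and data are hypothetical.

    # A minimal sketch: two normalized tables linked by a foreign key, plus a join.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    cur.executescript("""
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE course (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            dept_id INTEGER REFERENCES department(id)
        );
        INSERT INTO department VALUES (1, 'Information Science');
        INSERT INTO course VALUES (10, 'Database Systems', 1);
    """)

    # A join query: list each course with its department name.
    for row in cur.execute("""
        SELECT course.title, department.name
        FROM course JOIN department ON course.dept_id = department.id
    """):
        print(row)

    conn.close()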
Principles and practices in data engineering, with emphasis on the data engineering lifecycle and how to build data pipelines to collect, transform, analyze, and visualize data from operational systems. This is a hands-on and highly interactive course. Students will learn analytical data modeling techniques for organizing and querying data. They will learn how to transform data into dimensional models, how to build data products, and how to visualize the data. We will also examine the various roles data engineers can have in an organization and career paths for data professionals.
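A minimal sketch of the pipeline pattern described above: extracting operational records, transforming them into a small dimensional model, and querying the result; pandas is assumed and the data are made up.

    # A minimal sketch of extract-transform-load and a tiny dimensional model.
    import pandas as pd

    # "Extract": operational order records as they might arrive from a source system.
    orders = pd.DataFrame({
        "order_id": [1, 2, 3],
        "customer": ["Ada", "Grace", "Ada"],
        "amount": [20.0, 35.5, 12.0],
        "order_date": ["2024-01-03", "2024-01-04", "2024-01-04"],
    })

    # "Transform": build a customer dimension with surrogate keys...
    dim_customer = orders[["customer"]].drop_duplicates().reset_index(drop=True)
    dim_customer["customer_key"] = dim_customer.index + 1

    # ...and a fact table that references the dimension.
    fact_orders = orders.merge(dim_customer, on="customer")[
        ["order_id", "customer_key", "order_date", "amount"]
    ]

    # Analyze: revenue per customer, the kind of query a dimensional model serves.
    print(fact_orders.merge(dim_customer, on="customer_key")
          .groupby("customer")["amount"].sum())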
This course will cover relevant fundamental concepts in machine learning (ML) and how they are used to solve real-world problems. Students will learn the theory behind a variety of machine learning tools and practice applying the tools to real-world data such as numerical data, textual data (natural language processing), and visual data (computer vision). Each class is divided into two segments: (a) Theory and Methods, a concise description of an ML concept, and (b) Lab Tutorial, a hands-on session on applying the theory just discussed to a real-world task on publicly available data. We will use Python for programming. By the end of the course, the goals for the students are to:
1. Develop a sense of where to apply machine learning and where not to, and which ML algorithm to use.
2. Understand the process of gathering and preprocessing a variety of “big” real-world data to be used to train ML systems.
3. Characterize the process to train machine learning algorithms and evaluate their performance.
4. Develop programming skills to code in Python and use modern ML and scientific computing libraries like SciPy and scikit-learn.
5. Propose a novel product/research-focused idea (this will be an iterative process), design and execute experiments, and present the findings and demos to a suitable audience (in this case, the class).
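A minimal sketch of the train-and-evaluate loop the Lab Tutorials build toward, using scikit-learn's bundled iris data; the specific model is illustrative only.

    # A minimal sketch: split the data, fit a model, evaluate on held-out data.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)

    # Hold out a test set so evaluation reflects unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))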
Practical skills and understanding required to work effectively with open source software and the projects that build it. Includes git-based collaboration as well as a conceptual understanding of licenses, security, and technical and social processes in open source development. Class projects involve working with digital trace data from open source repositories.
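A minimal sketch of working with digital trace data: counting commits per author from git's own log. It assumes the script is run inside a cloned open source repository; the technique, not this specific analysis, is the point.

    # A minimal sketch: who does the work in a project, by commit count.
    import subprocess
    from collections import Counter

    # Ask git for one author name per commit (run inside a cloned repository).
    result = subprocess.run(
        ["git", "log", "--pretty=format:%an"],
        capture_output=True, text=True, check=True,
    )

    authors = Counter(result.stdout.splitlines())

    # The ten most active committers.
    for name, commits in authors.most_common(10):
        print(f"{commits:5d}  {name}")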
This course offers students in Information Science a comprehensive exploration of the theories, techniques, and tools of data visualization. It is designed to equip students with the skills to effectively communicate complex information visually, enabling data analysis and decision-making. Through a combination of lectures, hands-on projects, and case studies, students will learn how to design and implement effective and aesthetically appealing data visualizations for a variety of data types and audiences. Upon successful completion of this course, students will be able to:
• Understand the principles and psychology of visual perception and how they influence data visualization.
• Critically evaluate the effectiveness of different data visualization techniques for varying data types and user needs.
• Master the use of leading data visualization tools and libraries such as D3.js or Tableau.
• Develop interactive dashboards and reports that effectively communicate findings to both technical and non-technical audiences.
• Apply design principles to create visually appealing, accurate, and accessible data visualizations.
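A minimal sketch of the design concerns above (a clear title, labeled axes, direct annotation); matplotlib is used here only as a Python stand-in, since the course's own tools, D3.js and Tableau, are not Python libraries, and the data are invented.

    # A minimal sketch: a labeled, directly annotated bar chart.
    import matplotlib.pyplot as plt

    categories = ["Email", "Search", "Social", "Direct"]
    visits = [1200, 950, 600, 400]

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(categories, visits, color="#4C72B0")

    # Label the chart so it communicates on its own.
    ax.set_title("Site visits by referral channel (sample data)")
    ax.set_xlabel("Referral channel")
    ax.set_ylabel("Visits per week")

    # Annotate each bar directly, a common readability and accessibility practice.
    for i, v in enumerate(visits):
        ax.text(i, v, str(v), ha="center", va="bottom")

    fig.tight_layout()
    plt.show()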
Introduction to the emerging field of Explainable Artificial Intelligence (XAI) from the perspectives of a developer and end-user. Students will gain hands-on experience with some of the most commonly used explainability techniques and algorithms.
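A minimal sketch of one commonly used explainability technique, permutation importance, from the developer's perspective; scikit-learn and its bundled dataset are assumptions made for the example.

    # A minimal sketch: rank features by how much shuffling them hurts the model.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    data = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, random_state=0
    )

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Shuffle each feature in turn and measure the performance drop:
    # a large drop suggests the model relies on that feature.
    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

    ranked = sorted(
        zip(data.feature_names, result.importances_mean), key=lambda t: -t[1]
    )
    for name, score in ranked[:5]:
        print(f"{name:25s} {score:.3f}")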
Leveraging Text Mining, Natural Language Processing, and Computational Linguistics to address real-world textual data challenges, including document processing, keyword extraction, question answering, translation, summarization, sentiment analysis, search, recommendation, and information extraction. Each week, classes include (a) Theory and Methods for NLP concepts and (b) Lab Tutorials for practical application with Python on multilingual text datasets.
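A minimal sketch of one task from the list above, keyword extraction with TF-IDF weighting; scikit-learn is assumed and the documents are toy examples.

    # A minimal sketch: the highest-weighted TF-IDF terms serve as keywords.
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "The library extended its evening hours for exam week.",
        "Exam week stress workshops are offered at the library.",
        "The soccer team won its final match of the season.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(docs)
    terms = vectorizer.get_feature_names_out()

    # For each document, print the three highest-weighted terms.
    for i, doc in enumerate(docs):
        row = tfidf[i].toarray().ravel()
        top = row.argsort()[::-1][:3]
        print(doc)
        print("  keywords:", ", ".join(terms[j] for j in top))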
This course lays the foundation for data science education targeting health informatics students interested in learning more broadly about biomedical informatics. No previous coding experience is required. Students will be introduced to basic concepts and tools for data analysis. The focus is on hands-on practice and enjoyable learning. The course will use Python as the programming language and Jupyter notebooks as the development environment (our “home base”) for the examples, tutorials, and assignments. We use JupyterLab notebooks because they are both the industry standard and a nice way to load, visualize, and analyze data and describe our findings in one environment. We will also learn GitHub to document changes and back up our work and, eventually, to use it as a collaboration tool. Hands-on data analysis, final projects, and associated presentations will be mandatory for the completion of the course. The outcome for the class is that each student will have a GitHub repository with all of their work (Jupyter notebooks, data, etc.), including a final project that will be presented to the class. Specific topics to be covered include GitHub, the Linux/Unix file system, Jupyter notebooks, Python programming, and data visualization.
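A minimal sketch of a first notebook-style analysis in pandas, of the kind described above; the patient-visit data are invented for illustration.

    # A minimal sketch: inspect, clean, and summarize a small dataset.
    import pandas as pd

    visits = pd.DataFrame({
        "clinic": ["North", "North", "South", "South", "South"],
        "age": [34, 51, 29, 62, 45],
        "systolic_bp": [118, 135, 121, 142, None],  # one missing reading to handle
    })

    # Inspect the data and its gaps before analyzing it.
    visits.info()
    print(visits.describe())

    # Fill the missing reading with the column median, then summarize by clinic.
    visits["systolic_bp"] = visits["systolic_bp"].fillna(visits["systolic_bp"].median())
    print(visits.groupby("clinic")[["age", "systolic_bp"]].mean())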
This course offers an introduction to Fine-Tuning Open-Source Large Language Models (LLMs) through project-based applications and real-world examples. The course will begin with a foundational understanding of Natural Language Processing (NLP), focusing on Text Preprocessing techniques such as Tokenization and Vectorization. A basic overview of Large Language Models will be provided, covering the fundamental structure and architecture of commonly used Open-Source Frameworks. The course will then focus on three key methods for fine-tuning LLMs: Self-Supervised, Supervised, and Reinforcement Learning. Each method will be explored through both theoretical explanations and practical group-based projects, applying these concepts to real-world examples. Students will engage in hands-on projects to strengthen their understanding of how to customize and optimize LLMs for specific tasks or domains.
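A minimal sketch of supervised fine-tuning of an open-source model, using the Hugging Face transformers and datasets libraries as one possible stack; the model, dataset, and hyperparameters are illustrative assumptions, not the course's prescribed setup.

    # A minimal sketch: supervised fine-tuning of a small open-source model
    # on a sentiment classification task.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    model_name = "distilbert-base-uncased"  # illustrative choice of open-source model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # A small labeled subset keeps the sketch quick to run.
    dataset = load_dataset("imdb", split="train").shuffle(seed=0).select(range(2000))

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

    dataset = dataset.map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir="finetune-demo",
        num_train_epochs=1,
        per_device_train_batch_size=8,
        logging_steps=50,
    )

    # Trainer wraps the supervised loop: forward pass, loss, backpropagation.
    trainer = Trainer(model=model, args=args, train_dataset=dataset)
    trainer.train()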
INF 385E: Information Architecture and Design
This course explores the fundamental principles and practical applications of Information Architecture (IA). Drawing from the seminal work "Information Architecture: For the Web and Beyond" by Louis Rosenfeld, Peter Morville, and Jorge Arango, students will delve into the essential concepts, methodologies, and best practices shaping the organization and presentation of information in digital environments. Put simply, this course addresses how to make content organized and findable based on human understanding. Throughout the course, students will examine the critical role of IA in enhancing user experience, facilitating navigation, and optimizing content discoverability. Topics covered include information organization, navigation design, metadata implementation, taxonomy development, and user-centered design principles. Through a combination of theoretical discussions, case studies, hands-on exercises, and a real project with a real client and real-world constraints, students will gain proficiency in designing effective IA solutions tailored to diverse user needs and contexts. Emphasis will be placed on understanding user behavior, conducting user research, and iteratively refining IA structures to align with evolving user requirements and organizational goals. Course Objectives:
• Gain a comprehensive understanding of Information Architecture principles and methodologies.
• Learn how to analyze and evaluate existing IA structures in digital environments.
• Develop proficiency in designing and implementing effective IA solutions for websites and digital products.
• Explore techniques for conducting user research and applying user-centered design principles to IA.
• Understand the role of IA in enhancing usability, findability, and overall user experience.
• Acquire practical skills in wireframing, prototyping, and usability testing within an IA context.
• Explore emerging trends and technologies shaping the field of Information Architecture.
INF 385P: Usability
This course will give students a foundational introduction to user experience (also known as UX, CX, HCI), introduce some of the core UX research methods in use today, and guide students in applying these methods to a product to create a final presentation that can hopefully be used in their portfolios and job searches. Accordingly, the class will cover five major areas; by the end, students will:
1. Have an in-depth understanding of some primary UX methods relevant to product development (e.g. Heuristic evaluation, Moderated User testing, UX Benchmarking).
2. Understand the principles of other important UX tools/methods (e.g. Information architecture tests (card-sorts), RITE testing, Competitive Analysis, Thematic coding of qualitative data, etc.).
3. Have a working understanding of the most frequently used UX methods at each point of the development lifecycle, with a specific focus on which methods are best suited to evaluative research.
4. Learn the scientific underpinnings of the various methodologies, including the specific advantages and disadvantages of each.
5. Apply these skills to industry-paced, “real world” projects.
A practical introduction and guide for using statistics to solve quantitative problems in user research. Many designers and user researchers view usability and user research as qualitative activities that do not use formulas and numbers. However, usability practitioners and user researchers are increasingly expected to quantify the benefits of their efforts. The impact of good and bad designs can be quantified in terms of user performance, task completion rates and times, and perceived user satisfaction. The course will address questions frequently faced by user researchers, such as how to compare the usability of products for A/B testing and competitive analysis, how to measure the interaction behavior and attitudes of users, and how to estimate the number of users needed for usability testing. The course will introduce students to a foundation in statistical theory and the best practices needed to apply it. It will cover descriptive statistics, confidence intervals, standardized usability questionnaires, correlation, regression, and analysis of variance. It will also address how to effectively communicate quantitative results.
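A minimal sketch of the kind of quantitative comparison described above: an A/B comparison of task completion times with Welch's t-test and a 95% confidence interval, using SciPy; the timing data are invented.

    # A minimal sketch: compare task completion times for two design variants.
    import numpy as np
    from scipy import stats

    # Task completion times in seconds for two designs (hypothetical data).
    design_a = np.array([48, 52, 61, 45, 58, 50, 47, 55])
    design_b = np.array([41, 39, 47, 44, 38, 45, 40, 43])

    # Welch's two-sample t-test (does not assume equal variances).
    result = stats.ttest_ind(design_a, design_b, equal_var=False)
    print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

    # 95% confidence interval for the mean completion time of design B.
    mean_b = design_b.mean()
    sem_b = stats.sem(design_b)
    ci_low, ci_high = stats.t.interval(0.95, df=len(design_b) - 1, loc=mean_b, scale=sem_b)
    print(f"Design B mean: {mean_b:.1f}s, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")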