I 320D Topics in Human-Centered Data Science: Data Engineering
Principles and practices in data engineering. Emphasis on the data engineering lifecycle and how to build data pipelines to collect, transform, analyze, and visualize data from multiple source systems. We will discuss data modeling techniques for organizing and managing data. We will look at data as an organizational asset and as a product. We will examine the various roles data engineers can have in an organization and career paths for data professionals.
The class will balance general principles with hands-on experience with some of the tools, languages, and techniques of the modern data stack. Emphasis will be placed on SQL as the primary language of data engineering along with low- or no-code tools that leverage SQL, plus a little Python. We’ll walk through building data pipelines end-to-end, from ingesting source data to creating analytical data products that deliver value to organizations. We’ll use business intelligence tools to build visualizations using those data products. We’ll talk about data lakes, data warehouses, ETL/ELT, and batch and streaming systems to understand the pros and cons of each. We will look at issues around data quality, understand the uses of data catalogs, examine data lineage and data profiling tools, and discuss data governance in organizations. Time permitting, we’ll also discuss trends and future directions in data engineering.
Prerequisites: Informatics 304 and 310D.
Informatics majors will have top registration priority through the early periods of registration. Informatics minors are encouraged to join the waitlist, which will begin promoting students on July 18 if seats remain available.
All other students will need to complete the Registration Support Questionnaire to request a seat in any of our classes.