Dissertation Defense: Hyun Joon Jung


Title: Temporal Modeling of Crowd Work Quality for Quality Assurance in Crowdsourcing

Speaker: Hyun Joon Jung (Ph.D. Candidate, UT Austin iSchool)


While crowdsourcing offers a promising way to collect data at scale, it also raises new and significant quality concerns. Beyond the obvious issue that any new methodology is untested and often suffers initial growing pains, crowdsourcing has faced one particular criticism since its inception: given the anonymity of crowd workers, it is questionable whether their contributions can be trusted as much as work completed by trusted workers. To address this concern, recent studies have proposed a variety of methods. However, most of them still measure crowd workers' quality with a simple metric such as accuracy.

This dissertation focuses on measuring and predicting crowd work quality by taking its temporal properties into account. Although temporal behavioral patterns can be discerned in real crowd work, prior studies have typically modeled worker performance under the assumption that the sequence of model variables is independent and identically distributed (i.i.d.). To better model such temporal worker behavior, we present a time-series prediction model for crowd work quality. This model captures and summarizes a worker's past label quality, enabling us to better predict the quality of each worker's next label. Furthermore, we propose a crowd assessor model for predicting crowd work quality more accurately. By taking into account multi-dimensional features of a crowd assessor, we aim to build a better prediction model of crowd work quality. Finally, this dissertation explores how the proposed prediction models perform under realistic scenarios. In particular, we consider a realistic use case in which only limited gold labels are available for training the proposed models. For this problem, we leverage instance weighting with soft labels, which takes into account the uncertainty of each training instance. Our empirical evaluation with synthetic datasets and a public crowdsourcing dataset shows that the proposed models significantly improve the prediction of crowd work quality and lead to the acquisition of better-quality labels in crowdsourcing.

Homepage: https://www.linkedin.com/in/hyunvincero


4:00am to 6:00am

