Wednesday Feb. 17, 2021
Md Mustafizur Rahman’s Dissertation Proposal Defense
Noon to 2 p.m.
Zoom link to be provided via email (iSchool listserv)

Presenter: Md. Mustafizur (Mustaf) Rahman

Title: Reliable and Low-cost Test Collections Construction using Machine Learning

Abstract: The development of new search algorithms requires an evaluation framework in which A/B testing of new vs. existing algorithms can be reliably performed. While today's search evaluation methodology is reliable, it relies heavily upon people manually annotating the relevance of many search results, which is slow and expensive. Moreover, this practice has become increasingly infeasible as digital collections have grown ever-larger. Consequently, there is an urgent need today for better IR evaluation methods that are both cost-effective and reliable. My doctoral research focuses on developing low-cost yet reliable IR evaluation methods by integrating state-of-the-art machine learning (ML) techniques with traditional human annotation.

More specifically, in this dissertation proposal, I focus on improving system-based IR evaluation methods that rely on constructing test collections. I present my work in four directions: i) understanding the effects of the participating systems on the quality of a test collection, ii) modeling a machine learning system to reduce the human annotation effort for a given search topic, iii) allocating the annotation budget across search topics via a dynamic feedback loop between a reinforcement learning method and an active learning algorithm, and iv) developing a hate speech dataset by adapting methods for constructing test collections in IR.

In the first direction, I investigate how the quality of a test collection is affected by the number of participating systems. I then propose a robust prediction model that can estimate the quality of a test collection even before relevance judgments are collected. In the second direction, I seek to reduce the human annotation effort needed to evaluate IR systems by using active learning. Specifically, rather than relying entirely on human annotators to judge search results, I propose an amalgam of human annotation and machine intelligence. In the third direction, I aim to allocate human judging effort intelligently across different search topics. Whereas traditional approaches allocate the same judging effort to every topic, I use reinforcement learning, which, in combination with the active learning algorithm, allows the annotation budget to be allocated dynamically for each search topic. Finally, to complete this dissertation work, I propose to develop a hate speech dataset by adapting ideas from IR test collection construction. The two major characteristics of this dataset will be: i) coverage of various aspects of hate speech (e.g., anti-Muslim hate and hate against ethnic minorities are two aspects of racist hate speech), and ii) a higher ratio of hate speech to normal speech than found in existing hate speech datasets.
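To give a flavor of the second direction, the following is a minimal, purely illustrative uncertainty-sampling sketch of how an active learner can interleave machine predictions with human relevance judgments for a single search topic. The data, classifier, budget values, and seed-set size here are hypothetical, and this is not the specific algorithm proposed in the dissertation.

```python
# Illustrative sketch only: uncertainty-sampling active learning for
# relevance judging. Synthetic data stands in for documents and for the
# human judgments an assessor would provide on request.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical document features for one topic and hidden "true" relevance
# labels that play the role of human judgments when a document is selected.
X = rng.normal(size=(1000, 20))
true_relevance = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

budget = 100       # total human judgments allowed for this topic (hypothetical)
batch_size = 10    # judgments requested per round
judged = rng.choice(len(X), size=20, replace=False).tolist()  # small seed set

clf = LogisticRegression(max_iter=1000)
while len(judged) < budget:
    clf.fit(X[judged], true_relevance[judged])          # train on judgments so far
    unjudged = np.setdiff1d(np.arange(len(X)), judged)
    probs = clf.predict_proba(X[unjudged])[:, 1]
    uncertainty = np.abs(probs - 0.5)                    # closest to 0.5 = most uncertain
    next_docs = unjudged[np.argsort(uncertainty)[:batch_size]]
    judged.extend(next_docs.tolist())                    # "ask" human assessors for these

# Remaining documents receive machine-predicted labels, so the topic is fully
# judged with far fewer human annotations than exhaustive judging would need.
predicted = clf.predict(X)
predicted[judged] = true_relevance[judged]
print(f"Human judgments used: {len(judged)} of {len(X)} documents")
```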

Committee: Matt Lease (Chair), Ying Ding, James Howison and Mucahid Kutlu (TOBB University of Economics and Technology, Turkey)
