Research Awards/Grants (Current)

Ahmer Arif

National Science Foundation (NSF)

10/01/2022 to 09/30/2024

The collaborative award is $5,000,000 over the project period. The School of Information portion of the award is $1,368,142.

NSF Convergence Accelerator Track F: Co-designing for Trust: Reimagining Online Information Literacies with Underserved Communities

In 2011, the National Science Foundation began requiring that all funded projects provide data management
plans (DMPs) to ensure that project data, computer codes, and methodological procedures were available to other
scientists for future use. However, the extent to which these data management requirements have resulted in more and
better use of project data remains an open question. This project thus investigates the National Science Foundation's
DMP mandate as a national science policy and examines the broad impacts of this policy across a strategic sample of five
disciplines funded by the National Science Foundation. It considers the organization and structure of DMPs across fields,
the institutions involved in data sharing, data preservation practices, the extent to which DMPs enable others to use
secondary project data, and the kinds of data governance and preservation practices that ensure that data are sustained
and accessible. Systematic investigation of the impact of DMPs and data sharing cultures across fields will assist funding
agencies and research scientists working to produce reproducible and open science by identifying barriers to data
archiving, sharing, and access. The principal investigators will use project findings to develop data governance guidelines
for information professionals working with scientific data and to articulate best practices for scientific communities
using DMPs for data management.

This project aims to enhance understanding of the role data management plans (DMPs) play in shaping data life-cycles.
It does so by examining DMPs across five fields funded by the National Science Foundation to understand data practices,
archiving and access issues, the infrastructures that support data sharing and reuse, and the extent to which project
data are later used by other researchers. In phase I, the investigators will gather a strategic sample of DMPs
representing a wide range of data types and data retention practices from different scientific fields. Phase II consists of
forensic data analysis of a subset of DMPs to discover what has become of project data. Phase III develops detailed case
studies of research project data life-cycles and data afterlives with qualitative interviews and archival documentary
analysis to help develop best practices for sustainable data preservation, access, and sharing. Phase IV will translate
findings into data governance recommendations for stakeholders. The project thus contributes to contemporary studies of scientific data production and circulation while assessing the effect of DMPs as a national science policy initiative shaping data management practices in different scientific communities. The comparative research design and mixed methods enable theory building about cross-disciplinary data practices and data cultures, advancing knowledge within data studies, information management studies, and science and technology studies.

Min Kyung Lee

National Science Foundation (NSF)

09/01/2022 to 08/31/2025

The award is $249,999 over the project period.

Collaborative Research: DASS: Designing accountable software systems for worker-centered algorithmic management

Software systems have become an integral part of public and private sector management, assisting and automating critical human decisions such as selecting people and allocating resources. Emerging evidence suggests that software systems for algorithmic management can significantly undermine workforce well-being and may be poorly suited to fostering accountability to existing labor law. For example, warehouse workers are under serious physical and psychological stress due to task assignment and tracking without appropriate break times. On-demand ride drivers feel that automated evaluation is unfair and distrust the system's opaque payment calculations, which have led to worker lawsuits over wage underpayment. Shift workers suffer from unpredictable schedules that destabilize work-life balance and disrupt their ability to plan ahead. Meanwhile, there is not yet an established mechanism to regulate such software systems. For example, there is no expert consensus on how to apply concepts of fairness in software systems. Existing labor laws have not kept pace with emerging forms of work, such as algorithmic management and digital labor platforms, which introduce new risks to workers, including work-schedule volatility and employer surveillance both on and off the job. To tackle these challenges, we aim to develop technical approaches that can (1) make software accountable to existing law, and (2) address the gaps in existing law by measuring the negative impacts of certain software use and behavior, so as to help stakeholders better mitigate those effects. In other words, we aim to make software accountable to law and policy, and leverage it to make software users (individuals and firms) accountable to the affected population and the public.

This project is developing novel methods to enable standards- and disclosure-based regulation in and through software systems, drawing from formal methods, human-computer interaction, sociology, public policy, and law throughout the software development cycle. The work will focus on algorithmic work scheduling, which impacts shift workers who make up 25% of workers in the United States. It will take a participatory approach to software design and evaluation, involving stakeholders, public policy and legal experts, governments, commercial software companies, software users in firms, and those affected by the software's use. The research will take place in three thrusts in the context of algorithmic scheduling: (1) participatory formalization of regulatory software requirements, (2) scalable and interactive formal methods and automated reasoning for software guarantees and decision support, and (3) regulatory outcome evaluation and monitoring. By developing accountable scheduling software, the project has the potential for significant broader impacts by giving businesses the tools they need for compliance with and accountability to existing work scheduling regulations, as well as the capacity to provide more schedule stability and predictability in their business operations.
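The compliance-checking idea behind thrust (2) can be illustrated with a toy example: encode one scheduling rule as a machine-checkable constraint and flag shifts that violate it. This is a minimal sketch, not the project's formal-methods tooling; the 14-day notice window and all schedule data below are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical "advance notice" rule, loosely modeled on fair-workweek
# ordinances. The 14-day window is an assumed parameter, not from the award.
MIN_NOTICE = timedelta(days=14)

def shifts_violating_notice(posted_at, shift_starts):
    """Return the shifts announced with less than the required advance notice."""
    return [s for s in shift_starts if s - posted_at < MIN_NOTICE]

# Invented schedule: one shift 9 days out (too soon), one 19 days out (fine).
posted = datetime(2024, 3, 1, 9, 0)
shifts = [datetime(2024, 3, 10, 8, 0), datetime(2024, 3, 20, 8, 0)]
late = shifts_violating_notice(posted, shifts)
```

A real system would express many such rules declaratively and use automated reasoning to prove a generated schedule satisfies all of them, rather than checking one schedule after the fact.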

Matthew Lease

Jessy Li

Cisco Systems Inc.

06/01/2022 to 08/31/2025

The award is $199,458 over the project period. 

Classifying Text with Intuitive and Faithful Model Explanations

The objective of this Research Project is to develop an advanced neural NLP modeling framework for interpretable and accurate text classification. Intuitively, when human users better understand model predictions (via model interpretability), the users can better use model predictions to augment their own human reasoning and decision-making. More generally, effective model explanations offer a variety of other potential benefits, such as promoting trust, adoption, auditing, and documentation of model decisions. Our modeling framework, ProtoType-based Explanations for Natural Language (ProtoTexNL), seeks to provide faithful explanations for model predictions in relation to training examples and features of the input text. 
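The prototype idea can be sketched in miniature: classify an input by its similarity to per-class prototype vectors, so the nearest prototype doubles as a faithful explanation grounded in training examples. This toy example is not ProtoTexNL, which learns neural prototypes end-to-end; the vectors and labels below are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Invented 2-d "prototype" vectors, one per class; in a learned model these
# would live in the encoder's embedding space and be tied to training texts.
prototypes = {
    "positive": [0.9, 0.1],
    "negative": [0.1, 0.9],
}

def classify(embedding):
    """Predict the class of the most similar prototype and return the
    similarity score; the matched prototype serves as the explanation."""
    label, proto = max(prototypes.items(),
                       key=lambda kv: cosine(embedding, kv[1]))
    return label, cosine(embedding, proto)

label, sim = classify([0.8, 0.2])  # invented input embedding
```

The explanation here is intrinsic rather than post hoc: the same similarity that produced the prediction is what gets shown to the user.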

Ying Ding

led by Yifan Peng, Weill Cornell Medicine

National Institutes of Health (NIH)

08/01/2023 to 04/30/2028

The collaborative award is $712,024 over the project period. The School of Information portion of the award is $333,944.

Closing the loop with an automatic referral population and summarization system

In the United States, more than a third of patients are referred to a specialist each year, and specialist visits constitute more than half of outpatient visits. Even though all physicians highly value communication between primary care providers (PCPs) and specialists, both PCPs and specialists cite the lack of effective information transfer as one of the most significant problems in the referral process. Therefore, it is critical to investigate new methods to improve communication during care transitions. Given their ubiquitous use, electronic health records (EHRs) are expected to ensure a seamless flow of information across healthcare systems and improve the referral process. However, a lack of accessible and relevant information remains a pressing problem in the referral process. Recently, emerging deep learning (DL) and natural language processing (NLP) methods have been successfully applied to extract pertinent information from EHRs and generate text summaries that improve care quality and patient outcomes. However, existing technologies cannot process heterogeneous EHR data and create high-quality clinical summaries for communicating a reason for referral. Responding to PA-20-185, this project will develop and validate a novel informatics framework to collect and synthesize longitudinal, multimodal EHR data for automatic referral form generation and summarization. While the referring provider and specialist can be any type of provider for any condition, this application focuses on headache referrals from primary care, because headache is an extremely common symptom that affects people of all ages, races, and socioeconomic statuses. More importantly, the relevant information needed for headache referrals has been defined in local and national evidence-based practice guidelines.
Therefore, a health information technology solution that makes these data accessible will empower communication between PCPs and specialists, which can improve the care of millions of patients suffering from disabling headache disorders. Based on our preliminary data and our experience with an interdisciplinary team of data scientists and physicians, we plan to execute four specific aims: 1) convert text-based guidelines into a standards-based algorithm for electronic implementation; 2) develop models to automatically populate the referral form with data from EHRs and clinical notes; 3) create a framework to summarize longitudinal clinical notes to fill out the referral form; and 4) develop and validate the headache referral system with a user-centered design approach. The research proposed in this project is novel and innovative because it will produce and rigorously test new solutions to improve communication between health professionals, ensuring that safe, high-quality care is provided and care continuity is maintained. The success of this project will (1) fill important gaps in our knowledge of the types of information exchange that optimize patient care during transitions and (2) provide evidence-based solutions to enable that exchange.
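Aim 2, automatically populating the referral form, can be caricatured with a rule-based stand-in: pull structured fields out of a free-text note. The real system would use trained NLP models over longitudinal EHR data; the field names, medication list, and note text below are all invented for illustration.

```python
import re

# Invented clinical note; a real note would be longer and messier.
NOTE = "Pt reports migraine 3x/week for 6 months; ibuprofen without relief."

def populate_referral(note):
    """Fill a toy referral form from a note with simple pattern matching.
    Field names and the medication vocabulary are hypothetical."""
    form = {"headache_frequency": None, "failed_medications": []}
    m = re.search(r"(\d+x/week)", note)       # e.g. "3x/week"
    if m:
        form["headache_frequency"] = m.group(1)
    for med in ("ibuprofen", "sumatriptan", "topiramate"):
        if med in note.lower():
            form["failed_medications"].append(med)
    return form

form = populate_referral(NOTE)
```

Even this crude version shows the shape of the task: map unstructured narrative onto the guideline-defined fields a specialist needs, which is what the learned extraction and summarization models would do at scale.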

Ying Ding

led by Trey Ideker, University of California, San Diego

National Institutes of Health (NIH)

09/01/2022 to 08/31/2026

The collaborative award is $4,894,457 over the project period. The School of Information portion of the award is $333,944.

Bridge2AI: Cell Maps for AI (CM4AI) Data Generation Project

As part of the NIH Common Fund's Bridge2AI program, the CM4AI data generation project seeks to map the spatiotemporal architecture of human cells and use these maps toward the grand challenge of interpretable genotype-phenotype learning. In genomics and precision medicine, machine learning models are often "black boxes," predicting phenotypes from genotypes without understanding the mechanisms by which such translation occurs. To address this deficiency, the project will launch a coordinated effort involving three complementary mapping approaches – proteomic mass spectrometry, cellular imaging, and genetic perturbation via CRISPR/Cas9 – creating a library of large-scale maps of cellular structure and function across demographic and disease contexts.

These data will broadly stimulate research and development in "visible" machine learning systems informed by multi-scale cell and tissue architecture. In addition to data and tools, this project will implement a standards-based data management approach grounded in FAIR data and software principles, with deep provenance and replication packages for representing cell maps and their underlying datasets; initiate a research program in ethical AI, especially as it relates to how the maps will be used in genomic medicine and model interpretation; and stimulate a diverse portfolio of training opportunities in the emerging field of biomachine learning.