Microsoft Research Partners with UT Austin, Texas iSchool for Microsoft Ability Initiative

Sandlin, Anu  |  Mar 29, 2019

Image Caption: From left to right: Danna Gurari, University of Texas; Ed Cutrell, Microsoft Research; Roy Zimmermann, Microsoft Research; Meredith Ringel Morris, Microsoft Research; Ken Fleischmann, University of Texas; Neel Joshi, Microsoft Research
Microsoft Ability Initiative
Texas iSchool
Microsoft Research
Danna Gurari
Ken Fleischmann
image captioning
visual impairments

Despite significant advances in automated image captioning, current approaches are not well aligned with the needs of people with visual impairments. People who are blind or with low vision share a real challenge: learning what an image contains without visual assistance is time-consuming and sometimes impossible. As a result, they often seek a visual assistant to describe photos they take themselves or find online.

In an ideal world, a fully automated computer vision (CV) approach would provide such descriptions. However, this artificial intelligence (AI) approach is riddled with challenges. CV work to date lacks images taken by this population, and people who are blind or with low vision must passively listen to one-size-fits-all image descriptions to locate the information that interests them. In addition, CV algorithms often deliver incomplete or incorrect information. Because of these shortcomings, reliable image captioning systems continue to depend on humans to describe photos for people with visual impairments.

Determined to improve image captioning for blind and low vision communities, principal investigator and Texas iSchool Assistant Professor Danna Gurari and Associate Professor Ken Fleischmann believe there is a more efficient and effective solution, one that reduces human effort and produces accurate results for communities who are blind or with low vision. They recently embarked on a new project to “design algorithms and systems that close the gap between CV algorithm and human performance for describing pictures taken by both sighted and visually impaired photographers.”

But the Texas School of Information professors weren’t the only ones thinking about how to improve image captioning for people who are blind or with low vision. A team of researchers at Microsoft Research recently announced a similar goal: to train AI systems to provide more detailed captions that offer a richer understanding and a more accurate representation of images for people who are blind or with low vision. To pursue this mission, Microsoft Research developed a new project called the Microsoft Ability Initiative.

According to Microsoft Research Principal Researcher and Research Manager Meredith Ringel Morris, “the companywide initiative aims to create a public dataset that ultimately can be used to advance the state of the art in AI systems for automated image captioning.”

After a competitive process involving a select number of universities, Microsoft Research chose The University of Texas at Austin School of Information as its academic partner for the new venture. The proposed work of Gurari and Fleischmann was the only project selected through this competition.

The Texas iSchool research team proposed two main tasks: (1) introducing the first publicly available image captioning dataset from people with visual impairments, paired with a community AI challenge and workshop, and (2) identifying the values and preferences of people with visual impairments to inform the design of next-generation image captioning systems and datasets.

“The collaboration builds upon prior Microsoft research that has identified a need for new approaches at the intersection of computer vision and accessibility,” explained Morris.

The Microsoft Research team, which includes Ed Cutrell, Roy Zimmermann, Meredith Ringel Morris, and Neel Joshi, plans to collaborate with the UT Austin School of Information over an 18-month period. Gurari and Fleischmann will lead the UT Austin team, which will also include three PhD students and one postdoctoral fellow.

The Microsoft Ability Initiative builds on the interdisciplinary team’s expertise in computer vision, human-computer interaction, accessibility, ethics, and value-sensitive design. Gurari’s team is experienced in establishing new datasets, designing human-machine partnerships, creating human-computer interaction systems, and developing accessible technology. As co-founder of the ECCV VizWiz Grand Challenge in 2018, Gurari is skilled in community-building and has a record of success in creating public datasets to advance the state of the art in AI and accessibility.

Fleischmann’s team offers complementary experience in the ethics of AI and in understanding users’ values to inform technology design. Given his expertise in the role of human values in the design and use of information technologies, Fleischmann will lead the effort to uncover the needs and values of people with visual impairments, which will ultimately inform the design of future image captioning systems.

The Microsoft researchers involved in this initiative have specialized experience in accessible technologies, human-centric AI systems, and computer vision. “Our efforts are complemented by colleagues in other divisions of the company, including the AI for Accessibility program, which helps fund the initiative, and Microsoft 365 accessibility,” explained Morris.

The effort has been dubbed “a collaborative quest to innovate in image captioning for people who are blind or with low vision.” Morris explained that “the Microsoft Ability Initiative is one of an increasing number of initiatives at Microsoft in which researchers and product developers are coming together in a new, cross-company push to spur innovative and exciting new research and development in the area of accessible technologies.”

Gurari believes that the initiative “will not only advance the state of the art of vision-to-language technology, but it will also continue the progress Microsoft has made with such tools and resources as the Seeing AI mobile phone application and the Microsoft Common Objects in Context (MS COCO) dataset. It will also serve as a great teaching opportunity for Texas iSchool students.”

The Texas iSchool team will employ a user-centered approach to the problem, including working with communities who are blind or with low vision to improve understanding of their expectations of image captioning tools. The team will also host community challenges and workshops to accelerate progress on algorithm development and facilitate the development of more accessible methods to assist people who are blind or with low vision. 

Gurari and Fleischmann explain that “this work can empower people with visual impairments to more rapidly and accurately learn about the diversity of visual information, while contributing to solving related problems including image search, visual question answering, and robotics.”

The Microsoft Research team launched the new collaboration with the Texas iSchool during a two-day visit to Austin in January. Morris noted that the Microsoft Research team came away from the meeting at The University of Texas at Austin School of Information “even more energized about the potential for this initiative to have real impact in the lives of millions of people around the world.” “We couldn’t be more excited,” she said.

The Texas iSchool professors share the Microsoft Research team’s excitement about their upcoming collaboration. “To be selected for this gift is a great honor,” said Gurari and Fleischmann. “We look forward to working with the Microsoft Research team over the months, and are eager to make progress with our shared goal: to better align image captioning systems with the needs of those who are blind or with low vision.”

Two new faculty members join iSchool

Ferguson, John  |  Aug 29, 2016

Image Caption: New iSchool Assistant Professors Danna Gurari and Amelia Acker
Amelia Acker
Danna Gurari
Andrew Dillon
digital records
image analysis
machine learning

The School of Information has hired two new faculty members whose research is already shaping the interdisciplinary field of information studies.


Assistant Professor Amelia Acker researches the data that people create when they use mobile phones to send text messages, update their Facebook status, or leverage wireless networks in myriad other ways, such as automatically generating GPS coordinates.


“Amelia is emerging as one of the brightest young scholars in digital records and data traces, helping us better understand the transmission of information through time and media,” iSchool Dean and Professor Andrew Dillon said. “She will significantly advance our traditional strengths in archives and records management while enabling new teaching and research opportunities at the intersection of people and technology.”


Assistant Professor Danna Gurari's research interests span computer vision, crowdsourcing, applied machine learning and biomedical image analysis.


“Extracting information from images is an increasingly important challenge in our digital world, and Danna brings a unique mix of computational and crowdsourcing approaches to this problem,” Dillon said. “Her research is already recognized in the biomedical field for its importance, and she will complement our strengths in information discovery and retrieval.”


Gurari’s research has been recognized by the 2015 Researcher Excellence Award from the Boston University computer science department, among other accolades. Prior to joining the iSchool, she was a postdoctoral fellow in UT Austin’s computer science department. Gurari also worked five years in industry, developing software for satellite systems and building custom, high performance, multi-camera image analysis systems for military, industrial and academic applications.


“As an interdisciplinary researcher, I am delighted to join such a richly diverse and intelligent group of professors at the Information School,” she said. “I am excited to join the faculty and work with students on designing systems that accelerate the extraction of information from images and videos.”


Gurari will begin teaching in the Spring 2017 semester.


Acker, who began teaching in Fall 2016, said people’s constant connection to wireless networks is creating vast amounts of data that is transforming culture while raising questions about important issues from government surveillance to the way we read and write on screens. The recipient of a grant from the federal Institute of Museum and Library Services, Acker’s current research also addresses data literacy and digital preservation to support long-term cultural memory, as well as the environmental impact of preserving huge quantities of data.


“There’s a future where all of us will be creating data and metadata, whether we’re intentionally thinking about it or not, just by virtue of carrying a phone,” said Acker, whose award-winning dissertation was a history of the text message as a seminal development in modern, networked culture. “Every time there’s a big jump in technology that allows us to create new information, whether cuneiform tablets or Xerox machines, there’s a huge new change in the ways we remember and understand ourselves as a society. That’s what I’m really interested in right now.”


Acker joins the School of Information from the University of Pittsburgh’s iSchool, where she was lead faculty of the archives program. From 2006 to 2014, she worked as an archivist, librarian and preservation consultant for libraries and archives in Southern California.


At UT Austin, the iSchool’s commitment to publishing and authorship and its broad curriculum were among factors that drew Acker to Texas, she said, as well as the strength of the school’s archives and conservation programs.


“Historically, the imperative to preserve is something that libraries and museums have been in control of,” Acker said. “As we move toward platforms like Dropbox, Gmail and Instagram, places where we’re constantly creating cultural memory together, how do we think of these new kinds of social media platforms as archives, and how do we make the case or lobby or describe them as such?”


Despite the fact that we are creating more information and more data than ever before, people are also engaging with platforms and products that don’t have long-term storage provisions, Acker said. “There are all sorts of weird things we haven’t really grappled with yet,” she said. “It’s a very exciting time.”
