Image Captioning Algorithms for Images Taken by People with Visual Impairments

Abstract

People with visual impairments face a time-consuming, and sometimes impossible, task when trying to learn what an image depicts without assistance. One way to address this problem is image captioning with machine learning: paired with an artificial intelligence speech system, an image captioning algorithm can automatically generate a text caption and read it aloud, letting people who are blind instantly learn what is in an image. In this work, we analyze the new VizWiz dataset and compare it to the MSCOCO dataset, which is widely used for evaluating the performance of image captioning algorithms. We also implement and evaluate two state-of-the-art image captioning models, analyzing their accuracy, runtime, and resource usage. We hope our research will help improve image captioning algorithms that serve the everyday needs of people with visual impairments.
First Name: Meng
Last Name: Zhang
Industry:
Organization:
Supervisor:
Capstone Type:
Date: Spring 2019