Entity-Based Scene Understanding


Recommended citation: C. Cervantes, B. Plummer, S. Lazebnik, & J. Hockenmaier. (2018). Entity-Based Scene Understanding. Master's Thesis. University of Illinois at Urbana-Champaign. https://cmcervantes.github.io/files/cervantes_2018_entity.pdf

We define entity-based scene understanding as the task of identifying the entities in a visual scene from multiple descriptions of that scene by (a) identifying coreference and subset relations between entity mentions and (b) grounding entity mentions to image regions. We apply our models to two datasets (Flickr30K Entities v2 and MSCOCO) and show that grounding can benefit significantly from relation prediction on both.
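
To make the task definition concrete, here is a minimal sketch of what its output could look like for one image with three captions. It is an illustration only, not the models from the thesis: the `Mention` class, the `group_entities` and `share_boxes` helpers, the example captions, and the box coordinates are all hypothetical. Coreferent mentions are merged into entities with a simple union-find, and boxes grounded for one mention are then shared with its coreferent mentions, which is one plausible way relation prediction might help grounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    caption_id: int  # which description the mention comes from
    text: str


def group_entities(mentions, coref_pairs):
    """Merge coreferent mentions into entities (union-find over indices)."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in coref_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for i in range(len(mentions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def share_boxes(entities, mention_boxes):
    """Give every mention the boxes grounded for any mention of its entity."""
    shared = {}
    for entity in entities:
        boxes = []
        for m in entity:
            for box in mention_boxes.get(m, []):
                if box not in boxes:
                    boxes.append(box)
        for m in entity:
            shared[m] = list(boxes)
    return shared


# Hypothetical mentions drawn from three captions of the same image.
mentions = [
    Mention(0, "two children"),
    Mention(1, "a boy"),
    Mention(1, "a girl"),
    Mention(2, "the kids"),
]
coref_pairs = [(0, 3)]           # "two children" and "the kids" corefer
subset_pairs = [(1, 0), (2, 0)]  # (subset, superset): each child is in the pair

entities = group_entities(mentions, coref_pairs)

# Assume only mention 0 was grounded directly; the box is (x1, y1, x2, y2).
mention_boxes = {0: [(10, 20, 200, 220)]}
shared = share_boxes(entities, mention_boxes)
```

In this sketch the subset relations are only recorded, not used; a fuller version might require that the boxes of "a boy" and "a girl" fall among the boxes of "two children".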