Entity-Based Visual Scene Understanding

Date:

Presented work on coreference resolution across parallel image captions, bridging anaphora resolution, and grounding, using bidirectional LSTMs, simple feed-forward networks, and a multi-task learning scheme for interrelated vision and language tasks.