ImageCaptionLearn is the main Java application for working with image caption data; it handles various preprocessing tasks, feature extraction, and inference (ILP problems are solved with Gurobi).
The GitHub repositories below were written for my thesis work: Entity-Based Scene Understanding for Flickr30k and MSCOCO. This project jointly performed coreference resolution, bridging anaphora resolution, and grounding (or reference resolution) for image caption datasets (Flickr30k Entities and MSCOCO). Specifically, my projects include components for:
- Feature extraction: for all composite tasks, binary, real-valued, and one-hot features were extracted from text
- Model training and evaluation: all composite tasks required training and evaluation of linear models (particularly logistic regression) and neural models (whose central components were bidirectional LSTMs)
- Inference: coreference and bridging operate over the space of entity mentions (noun phrases), and grounding operates over the space of mentions and image regions; both require inference to produce consistent graphs, which we framed as an integer linear program
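To make the feature-extraction component concrete, the sketch below builds a small mixed binary / real-valued / one-hot feature vector for a single mention. This is an illustrative stand-in, not the project's actual feature code: the function, field names (`head`, `index`, `caption_len`), and the tiny vocabulary are all hypothetical.

```python
def extract_features(mention, vocab):
    """Return a feature-name -> value dict for one mention (illustrative only)."""
    feats = {}
    # Binary feature: does the mention's head word look plural? (naive 's' check)
    feats["is_plural"] = 1 if mention["head"].endswith("s") else 0
    # Real-valued feature: normalized position of the mention within its caption.
    feats["rel_position"] = mention["index"] / max(1, mention["caption_len"] - 1)
    # One-hot features over a closed vocabulary of head words.
    for word in vocab:
        feats[f"head={word}"] = 1 if mention["head"] == word else 0
    return feats

# Hypothetical mention from a caption like "a man throws a frisbee to his dog"
vocab = ["man", "dog", "frisbee"]
m = {"head": "dog", "index": 2, "caption_len": 5}
fv = extract_features(m, vocab)
```

In the real pipeline, vectors like `fv` would be assembled for every mention (or mention pair) and fed to the linear and neural models described above.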
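The consistency requirement behind the inference step can be illustrated with a toy coreference example: links between mention pairs must be transitive, so that the selected links form valid entity clusters. The brute-force search below is only a stand-in for the actual Gurobi ILP (which encodes the same transitivity constraints as linear inequalities over binary link variables); the mention names and scores are made up.

```python
from itertools import combinations, product

def best_consistent_links(mentions, score):
    """Pick the transitively consistent link set maximizing total pairwise score.

    Brute-force illustration of the objective the ILP solves; the real system
    uses Gurobi rather than enumerating all assignments.
    """
    pairs = list(combinations(mentions, 2))
    best, best_val = None, float("-inf")
    for assign in product([0, 1], repeat=len(pairs)):
        link = dict(zip(pairs, assign))

        def linked(a, b):
            return link.get((a, b), link.get((b, a), 0))

        # Transitivity: within any triple, exactly two links is inconsistent
        # (if a-b and b-c are linked, a-c must be linked too).
        ok = all(linked(a, b) + linked(b, c) + linked(a, c) != 2
                 for a, b, c in combinations(mentions, 3))
        if not ok:
            continue
        val = sum(score[p] for p in pairs if link[p])
        if val > best_val:
            best, best_val = link, val
    return best

mentions = ["m1", "m2", "m3"]
# Hypothetical pairwise scores: positive favors linking, negative favors not.
score = {("m1", "m2"): 2.0, ("m2", "m3"): 1.5, ("m1", "m3"): -1.0}
links = best_consistent_links(mentions, score)
```

Note how the transitivity constraint forces the weakly negative `m1-m3` link on once both positive links are taken: the consistent clustering {m1, m2, m3} (total 2.5) beats any single link.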
ImageCaptionTools is an integral component of the overall ImageCaptionLearn project, containing all of the various utilities and structures needed to manipulate image caption data. The Tools were put into their own project, rather than being included in Learn, to allow other lab members to more easily incorporate the image caption structures without needing to include all the preprocessing and inference code present in Learn.
ImageCaptionLearn_py contains all the code needed to train models in Python, primarily using scikit-learn and TensorFlow.
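As a rough sense of the linear-model training this Python code performs, here is a minimal plain-Python logistic regression trained by stochastic gradient descent on a toy dataset. It is a stdlib-only stand-in for the scikit-learn / TensorFlow calls the real repository uses; the data and hyperparameters are illustrative.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit logistic regression weights by SGD on log loss (illustrative)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - yi                        # dLoss/dz for log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    """Threshold the linear score at zero to get a binary label."""
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0

# Toy linearly separable data: label is 1 when the first feature dominates.
X = [[1, 0], [0, 1], [2, 1], [1, 2]]
y = [1, 0, 1, 0]
w, b = train_logreg(X, y)
```

In the actual project the equivalent step would be a `sklearn` classifier fit on the extracted feature vectors, with the neural (bidirectional LSTM) models trained in TensorFlow.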