Patents and publications are listed below. Papers include full-text links; due to the nature of patent applications, only overviews are provided for patents (except where an associated paper draft exists).
Method, Apparatus, and System for Combining Location Data Sources
Filed: December 2020
This invention defines a system to combine related sources of location data (e.g. a knowledge graph), both to find matching entities and to predict new relationships between existing entities. The proposed system is able to automatically merge and enrich disparate data sources through neural classification models.
C. Cervantes & S. Kompella. Method, Apparatus, and System for Combining Location Data Sources. U.S. Patent Application 17/116756, filed December 2020. Patent Pending
Method, Apparatus, and System for Providing Semantic Categorization of an Arbitrarily Granular Location
Filed: December 2020
This invention describes a method for combining locations of arbitrary granularity and predicting semantic categories for those groupings. In this context, a location can refer to anything from an individual place of interest to a larger administrative area like a neighborhood or city. Similarly, a semantic category could be equally expressive, encompassing things like “family friendly,” “safe for travelers,” or “trendy”.
C. Cervantes & S. Kompella. Method, Apparatus, and System for Providing Semantic Categorization of an Arbitrarily Granular Location. U.S. Patent Application 17/116743, filed December 2020. Patent Pending
Method, Apparatus, and System for Providing a Location Representation for Machine Learning Tasks
Filed: December 2020
This invention describes a broad mechanism to create real-valued vector representations that encode locations’ semantic and spatial properties for use in downstream tasks like search, question answering, relation prediction, and so on. The invention’s methods draw heavily on prior art, but incorporate significant modifications to capture the complex, high-level multi-modal information that defines locations.
C. Cervantes & S. Kompella. Method, Apparatus, and System for Providing a Location Representation for Machine Learning Tasks. U.S. Patent Application 17/116727, filed December 2020. Patent Pending
Method, Apparatus, and System for Providing a Context-Aware Location Representation
Filed: December 2020
This invention aims to produce dense representations (embeddings) for location entities through the combination of spatial and structured information (e.g. present in a knowledge graph) with unstructured information (e.g. web-scraped text). These representations are constructed using representation learning techniques (e.g. DeepWalk) and traditional natural language processing tools.
S. Kompella & C. Cervantes. Method, Apparatus, and System for Providing a Context-Aware Location Representation. U.S. Patent Application 17/116717, filed December 2020. Patent Pending
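To make the DeepWalk step mentioned above concrete, the sketch below builds the truncated random-walk "corpus" that such representation learning techniques train on. The toy graph and location names are invented for illustration and are not drawn from the patent:

```python
import random

# Toy location knowledge graph: nodes are location entities, edges
# connect related entities (e.g. containment or adjacency relations).
# All names here are illustrative only.
graph = {
    "Mission District": ["San Francisco", "Dolores Park", "Valencia St"],
    "San Francisco": ["Mission District", "Dolores Park"],
    "Dolores Park": ["Mission District", "San Francisco"],
    "Valencia St": ["Mission District"],
}

def random_walks(graph, num_walks=10, walk_length=5, seed=0):
    """DeepWalk's corpus step: truncated random walks from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in graph:
            walk = [start]
            while len(walk) < walk_length:
                # Step to a uniformly random neighbor of the current node.
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

walks = random_walks(graph)
```

Treating each walk as a "sentence" and each node as a "word", a skip-gram model (word2vec) then learns one dense vector per location entity, which can be combined with text-derived features as the entry above describes.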
Method for Extracting Landmark Graphs from Natural Language Route Instructions
Filed: January 2020
Landmarks are central to how people navigate, but most navigation technologies do not incorporate them into their representations. We propose the landmark graph generation task (creating landmark-based spatial representations from natural language) and introduce a fully end-to-end neural approach to generate these graphs. We evaluate our models on the SAIL route instruction dataset, as well as on a small set of real-world delivery instructions that we collected, and we show that our approach yields high quality results on both our task and the related robotic navigation task.
C. Cervantes. Method for Extracting Landmark Graphs from Natural Language Route Instructions. U.S. Patent Application 16/774315, filed January 2020. Patent Pending
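As an illustration of the target output structure (not of the neural model itself), here is a minimal sketch of a landmark graph for a route instruction. The `LandmarkGraph` class, relation labels, and example instruction are hypothetical, not taken from the patent or the SAIL dataset:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Landmark:
    name: str

@dataclass
class LandmarkGraph:
    """Landmarks as nodes; spatial relations as labeled directed edges."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (src, relation, dst) triples

    def add_relation(self, src, relation, dst):
        a, b = Landmark(src), Landmark(dst)
        self.nodes.update({a, b})
        self.edges.append((a, relation, b))

# "Walk past the church, then turn left at the fountain."
g = LandmarkGraph()
g.add_relation("church", "before", "fountain")
g.add_relation("fountain", "turn-left-at", "destination")
```

A graph like this makes the navigational content of the instruction explicit: the ordering of landmarks and the actions taken relative to them, independent of the surface wording.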
Entity-Based Scene Understanding
We define entity-based scene understanding as the task of identifying the entities in a visual scene from multiple descriptions by a) identifying coreference and subset relations between entity mentions, and b) grounding entity mentions to image regions. We apply our models to two datasets (Flickr30K Entities v2 and MSCOCO) and show that grounding can benefit significantly from relation prediction in both cases.
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Published in International Conference on Computer Vision
This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues. We model the appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions. Special attention is given to relationships between people and clothing or body part mentions, as they are useful for distinguishing individuals.
B. Plummer, A. Mallya, C. Cervantes, J. Hockenmaier, & S. Lazebnik. (2017) Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues. International Conference on Computer Vision (ICCV) https://cmcervantes.github.io/files/plummer_2017_phrase.pdf
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Published in International Journal of Computer Vision
This journal version of the 2015 paper of the same name adds experiments, analysis, and examples to the existing work.
B. Plummer, L. Wang, C. Cervantes, J. Caicedo, J. Hockenmaier, & S. Lazebnik. (2017) Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models. International Journal of Computer Vision (IJCV) https://cmcervantes.github.io/files/plummer_2017_flickr30kEntities.pdf
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Published in International Conference on Computer Vision
This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains linking mentions of the same entities in images, as well as 276k manually annotated bounding boxes corresponding to each entity. We present experiments demonstrating the usefulness of our annotations for text-to-image reference resolution, or the task of localizing textual entity mentions in an image, and for bidirectional image-sentence retrieval.
B. Plummer, L. Wang, C. Cervantes, J. Caicedo, J. Hockenmaier, & S. Lazebnik. (2015) Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models. International Conference on Computer Vision (ICCV) https://cmcervantes.github.io/files/plummer_2015_flickr30kEntities.pdf
Narrative Fragment Creation: An Approach for Learning Narrative Knowledge
Published in Advances in Cognitive Systems
We propose the narrative fragment (a sequence of story events) and a method for automatically creating these fragments, using partial-order planning for narrative generation and n-gram modeling for analysis. The generated plans establish causal and temporal relationships, and by modeling those relationships and creating fragments, our system learns narrative knowledge.