Projects

A selection of my recent projects can be found below. Note that the summaries and listed technologies are not exhaustive; they are included mainly to give a general sense of each project.

Natural Guidance Features

Present

Executive Summary

Our goal in this project is to automatically identify and localize storefronts in street-level imagery, given a reference image. That is, given a good reference image for a storefront, we aim to develop computer vision techniques for automatically finding that storefront in other street-level images (e.g. dash-cam images). These methods will then be used in a product that determines whether street-level landmarks are salient enough to serve as navigation cues (e.g. ‘turn left at the post office’).

Academic Summary

Work on this project is ongoing; our current methods use U-Net- and ResNet-based architectures for semantic segmentation (i.e. predicting masks for the reference storefront in query images). We expect to develop new methods that more robustly handle the unique computer vision challenges of street-level imagery.
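
For illustration, here is a minimal U-Net-style segmentation sketch in TensorFlow. The input sizes, layer widths, and the channel-wise concatenation used to condition on the reference image are assumptions made for this example, not details of the project’s actual models.

import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions, as in a standard U-Net stage.
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

# Query and reference images; concatenating them channel-wise is one simple
# way to condition the predicted mask on the reference image (an assumption
# for this sketch).
query = layers.Input(shape=(256, 256, 3))
reference = layers.Input(shape=(256, 256, 3))
x = layers.Concatenate()([query, reference])

# Encoder (downsampling path), keeping skip features.
s1 = conv_block(x, 32)
x = layers.MaxPooling2D()(s1)
s2 = conv_block(x, 64)
x = layers.MaxPooling2D()(s2)
x = conv_block(x, 128)  # bottleneck

# Decoder (upsampling path), reusing the skip features.
x = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(x)
x = conv_block(layers.Concatenate()([x, s2]), 64)
x = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(x)
x = conv_block(layers.Concatenate()([x, s1]), 32)

# Per-pixel probability that a pixel belongs to the reference storefront.
mask = layers.Conv2D(1, 1, activation="sigmoid")(x)

model = tf.keras.Model([query, reference], mask)
model.compile(optimizer="adam", loss="binary_crossentropy")

Training such a model would minimize per-pixel binary cross-entropy against annotated storefront masks.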

Technology Used

Python | TensorFlow | NumPy | Pillow | Bash

AI/ML Task Force: Object Detection in 3D Point Clouds

Executive Summary

This task force sought to identify and address customer issues with a product. I was specifically tasked with identifying issues in the machine learning mechanism used to detect poles and barriers in 3D LiDAR point clouds. During this project I worked closely with the production team to determine whether improvements could and should be made, given the product’s requirements.
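
The detection mechanism itself is proprietary, so the sketch below is only a loose illustration of the geometry involved: a naive NumPy heuristic that flags pole candidates as tall, thin columns of points. The grid size and thresholds are hypothetical.

import numpy as np

def pole_candidates(points, cell=1.0, min_height=2.0,
                    max_footprint=0.4, min_points=20):
    # Bucket points into a coarse XY grid, then flag cells whose points
    # form a tall, thin vertical column (a rough pole-like signature).
    # points: (N, 3) array of x, y, z coordinates from a LiDAR sweep.
    cells = np.floor(points[:, :2] / cell).astype(int)
    candidates = []
    for key in {tuple(c) for c in cells}:
        in_cell = points[(cells == key).all(axis=1)]
        if len(in_cell) < min_points:
            continue
        height = np.ptp(in_cell[:, 2])                    # vertical extent
        footprint = np.ptp(in_cell[:, :2], axis=0).max()  # XY spread
        if height >= min_height and footprint <= max_footprint:
            candidates.append(in_cell.mean(axis=0))       # candidate centre
    return np.array(candidates)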

Technology Used

Python | NumPy | Matplotlib | Bash

Location Ontology

Executive Summary

While there were many resources at HERE for describing where an entity is, there were comparatively few for understanding what an entity is and how it relates to other entities. We created an ontology for locations (e.g. businesses, parks), leveraging internal data sources and external, publicly available data (e.g. Wikidata, Schema.org). We then developed machine learning methods to automatically relate new data to our existing dataset and to find previously unknown relationships between existing entities.

Academic Summary

As a first step in developing an ontology for locations, we combined proprietary data with publicly available knowledge bases using both heuristic and learned methods. The resulting ontology’s nodes represented physical or conceptual entities (e.g. ‘Starbucks’, ‘coffee shop’, ‘eating establishment’) and its edges represented hierarchical membership relations (e.g. INSTANCE_OF, SUBCLASS_OF), so that graph proximity indicated conceptual proximity (e.g. a coffee shop is closer to a bakery than to a hospital). We then developed methods for predicting relations between entities in this graph, and showed that training such a model on the ontology and incorporating external text sources allowed us to predict previously unknown relations between known entities.
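
As a concrete illustration of this kind of relation prediction, here is a toy TransE-style link predictor in NumPy. The entities, triples, and hyperparameters are invented for the example and do not reflect the project’s actual models or data.

import numpy as np

rng = np.random.default_rng(0)

# Toy ontology triples (head, relation, tail) -- purely illustrative.
entities = ["Starbucks", "coffee shop", "bakery", "eating establishment"]
relations = ["INSTANCE_OF", "SUBCLASS_OF"]
triples = [("Starbucks", "INSTANCE_OF", "coffee shop"),
           ("coffee shop", "SUBCLASS_OF", "eating establishment"),
           ("bakery", "SUBCLASS_OF", "eating establishment")]

DIM, LR, MARGIN = 16, 0.05, 1.0
E = {e: rng.normal(scale=0.1, size=DIM) for e in entities}
R = {r: rng.normal(scale=0.1, size=DIM) for r in relations}

def dist(h, r, t):
    # TransE distance: a small ||E[h] + R[r] - E[t]|| means the
    # triple (h, r, t) is plausible.
    return np.linalg.norm(E[h] + R[r] - E[t])

# Margin-based training: push true triples below corrupted ones.
for _ in range(500):
    for h, r, t in triples:
        t_bad = rng.choice([e for e in entities if e != t])  # corrupted tail
        if dist(h, r, t) + MARGIN > dist(h, r, t_bad):
            v_pos = E[h] + R[r] - E[t]
            v_neg = E[h] + R[r] - E[t_bad]
            u_pos = v_pos / (np.linalg.norm(v_pos) + 1e-9)
            u_neg = v_neg / (np.linalg.norm(v_neg) + 1e-9)
            E[h] -= LR * (u_pos - u_neg)
            R[r] -= LR * (u_pos - u_neg)
            E[t] += LR * u_pos
            E[t_bad] -= LR * u_neg

# Score a candidate edge that was never observed: lower is more plausible.
print(dist("Starbucks", "INSTANCE_OF", "eating establishment"))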

Technology Used

Python | TensorFlow | scikit-learn | NumPy | NLTK | Neo4j

See also: 2020-12-09_patent-17-116717 | 2020-12-09_patent-17-116727 | 2020-12-09_patent-17-116743 | 2020-12-09_patent-17-116756

Last Meter Delivery

Executive Summary

The last mile delivery problem is the challenge of delivering goods from a central location to their final destination. Last meter delivery refers to the subproblem of finding a non-obvious micro-location (e.g. a side entrance) when the street address is already known. In this project, I developed a method for automatically extracting relational, landmark-based representations of space from route instructions written in natural language, with the end goal of a mobile app for couriers that automatically produces schematic maps.

Academic Summary

In this project, we extract relational, landmark-based spatial representations from natural language route instructions by leveraging prior work in robotic navigation. Specifically, we take route instructions (one or more sentences describing a path through space) as input, and extract both landmarks (noun phrases referring to physical entities in space) and the relations between them that define the described path. As is common in the robotic navigation literature, we decompose the task into predicting states (landmarks) and actions (movements): a bidirectional LSTM encodes the sentences, and an attention mechanism in conjunction with an RNN decoder predicts both landmark phrases (i.e. phrases that define decision points) and one of a set of actions (e.g. ‘turn’, ‘forward’) until a goal is reached. In this way, the model learns to attend to different parts of the instruction as it progresses through the described path.
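
A minimal sketch of this encoder-decoder architecture in TensorFlow/Keras follows. The vocabulary, action set, and layer sizes are placeholders, padding masks are omitted, and reusing the attention weights as a pointer over instruction tokens is a simplification of the landmark-phrase prediction.

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical sizes -- the real vocabularies and dimensions differ.
VOCAB, N_ACTIONS = 5000, 4  # instruction tokens; e.g. forward/left/right/stop
EMB, HID = 64, 128

# Encoder: embed the route instruction and encode it with a biLSTM.
src = layers.Input(shape=(None,), dtype="int32", name="instruction")
enc = layers.Bidirectional(layers.LSTM(HID, return_sequences=True))(
    layers.Embedding(VOCAB, EMB)(src))             # (batch, src_len, 2*HID)

# Decoder: an LSTM that consumes the previous action at each step
# (index 0 is reserved for a start-of-sequence token).
prev = layers.Input(shape=(None,), dtype="int32", name="prev_actions")
dec = layers.LSTM(HID, return_sequences=True)(
    layers.Embedding(N_ACTIONS + 1, EMB)(prev))    # (batch, steps, HID)

# Bahdanau-style additive attention over the encoded instruction; the
# attention weights double as a pointer to the current landmark phrase.
query = layers.Dense(2 * HID)(dec)
context, landmark_ptr = layers.AdditiveAttention()(
    [query, enc], return_attention_scores=True)

# Action head: classify the next movement from decoder state plus context.
action_logits = layers.Dense(N_ACTIONS)(layers.Concatenate()([dec, context]))

model = tf.keras.Model([src, prev], [action_logits, landmark_ptr])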

Technology Used

Python | NLTK | TensorFlow | NumPy | Matplotlib | Java for Android

See also: 2020-01-24_patent-16-774315 | 2019-06-25_talk_last-meter-delivery | cervantes_2019_route.pdf

Entity-Based Scene Understanding

Executive Summary

This Master’s thesis defines a machine learning method for identifying entities in a visual scene using multiple text descriptions. Specifically, we use a neural approach to determine which phrases from parallel captions corefer (e.g. ‘A man with his dog’ and ‘A person on their pet’ yields {‘A man’, ‘A person’} and {‘his dog’, ‘their pet’}) and to which image regions those entities correspond. In addition, we created and introduced a new annotated image caption dataset: Flickr30k Entities v2.

Academic Summary

Unifying multiple descriptions to determine the details of an everyday event can be challenging even for humans, and though incorporating other modalities like images or videos helps humans unify such descriptions, it remains a difficult task for computational systems. We define entity-based scene understanding as the task of identifying the entities in a visual scene from multiple descriptions. This task subsumes coreference resolution, bridging resolution, and grounding to produce mutually consistent relations between entity mentions and groundings between mentions and image regions. Using neural classifiers (including a bidirectional LSTM and feed-forward layers) and integer linear program inference (to ensure phrase-phrase and phrase-region consistency), we show that grounding improves when forced to conform to relation predictions. We introduce the Flickr30k Entities v2 dataset, and show how our methods can be used to automatically generate similarly rich annotations for the MSCOCO dataset.
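
To make the inference step concrete, here is a toy consistency ILP in gurobipy (Gurobi is listed under the technologies below). Binary variables select coreference and grounding links so as to maximize classifier scores, subject to coreferent phrases sharing groundings; the scores are fabricated for the example, and thesis constraints such as coreference transitivity are omitted.

import gurobipy as gp
from gurobipy import GRB

# Fabricated classifier scores in [0, 1]; higher means more likely.
phrases = ["a man", "a person", "his dog"]
regions = [0, 1]
coref_score = {("a man", "a person"): 0.9, ("a man", "his dog"): 0.1,
               ("a person", "his dog"): 0.2}
ground_score = {("a man", 0): 0.8, ("a man", 1): 0.1,
                ("a person", 0): 0.6, ("a person", 1): 0.3,
                ("his dog", 0): 0.2, ("his dog", 1): 0.7}

m = gp.Model("scene-consistency")
# Binary indicators: do phrases i and j corefer; does phrase p ground to r.
c = {pair: m.addVar(vtype=GRB.BINARY) for pair in coref_score}
g = {pr: m.addVar(vtype=GRB.BINARY) for pr in ground_score}

# Each phrase grounds to at most one region.
for p in phrases:
    m.addConstr(gp.quicksum(g[p, r] for r in regions) <= 1)

# Consistency: coreferent phrases must share their groundings.
for (i, j), cij in c.items():
    for r in regions:
        m.addConstr(g[i, r] - g[j, r] <= 1 - cij)
        m.addConstr(g[j, r] - g[i, r] <= 1 - cij)

# Keep only links the classifiers consider likely (score above 0.5).
m.setObjective(
    gp.quicksum((s - 0.5) * c[pair] for pair, s in coref_score.items()) +
    gp.quicksum((s - 0.5) * g[pr] for pr, s in ground_score.items()),
    GRB.MAXIMIZE)
m.optimize()

print([pair for pair, v in c.items() if v.X > 0.5])  # coreference links
print([pr for pr, v in g.items() if v.X > 0.5])      # groundings kept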

Technology Used

Java | Python | NLTK | TensorFlow | scikit-learn | NumPy | LBJava | Stanford CoreNLP | Gurobi ILP Solver | PHP | Bash

See also: 2018-02-01_thesis_cervantes-entity

Repository