Visual Place Recognition (A Survey)

# What is Visual Place Recognition?

Visual place recognition is the process of deciding, given an image of a place, whether that place has been seen before. Humans and animals can tell whether an image shows a place they have already visited, and this ability acts as a fundamental aid in navigation.

A place recognition system has two fundamental requirements.

A place recognition system must have an internal representation of the environment (a map) to compare against the incoming visual data.
A place recognition system must report a belief about whether or not the current visual information comes from a place already included in the map.

# Challenges

Visual place recognition faces several challenges:

The appearance of a place can change drastically as conditions change.
Perceptual aliasing: Multiple places in an environment may look very similar.
Places may not always be revisited from the same viewpoint and position as before.

# Components

Visual place recognition systems contain three key components:

Image processing module: This module interprets the incoming visual data.
Map: This maintains a representation of the robot’s knowledge of the world.
Belief generation module: This uses the incoming sensor data in combination with the map to decide whether the robot is in a familiar or a novel place.

# What is a Place?

The definition of a place depends on the navigation context. A place may be considered a precise position (“a place describes part of the environment as a zero-dimensional point”) or a larger area (“a place may also be defined as the abstraction of a region”, where a region “represents a two-dimensional subset of the environment”).

For example, a room in a building might in some cases qualify as a single place, while in other cases it might contain many different places. A region could also be defined as a 3D area, depending on the requirements of the environment or robot.

# How is a New Place Determined?

A new place can be added at a fixed time step, or whenever the robot has traveled a certain distance.

Alternatively, a place can be defined in terms of its appearance. A new place is then somewhere distinctive relative to other nearby locations, according to some associated sensory information known as a place signature or place description.
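The time-step and distance rules can be sketched as a small trigger that watches odometry. This is only an illustration: the class name `PlaceTrigger` and the thresholds `min_distance` and `max_steps` are hypothetical and do not come from the survey.

```python
import math


class PlaceTrigger:
    """Decide when to add a new place from odometry alone.

    Hypothetical helper: the thresholds are illustrative values,
    not parameters taken from the survey.
    """

    def __init__(self, min_distance=2.0, max_steps=50):
        self.min_distance = min_distance  # path length that forces a new place
        self.max_steps = max_steps        # number of updates that forces a new place
        self.distance = 0.0
        self.steps = 0
        self.last_xy = None

    def update(self, xy):
        """Feed one (x, y) position; return True if a new place should be added."""
        if self.last_xy is None:
            self.last_xy = xy
            return True  # the first observation always starts a place
        # Accumulate path length travelled since the last place was added.
        self.distance += math.dist(self.last_xy, xy)
        self.steps += 1
        self.last_xy = xy
        if self.distance >= self.min_distance or self.steps >= self.max_steps:
            self.distance = 0.0
            self.steps = 0
            return True
        return False
```

An appearance-based rule would replace the distance test with a comparison between the current place signature and those of nearby places.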

# Describing Places: The Image Processing Module

Visual place description techniques fall into two broad categories:

Local Feature Descriptors:

These selectively extract parts of the image that are in some way interesting or notable.

Local feature descriptors first require a detection phase, which determines the parts of the image to retain as local features.

# Bag of Words Model

Each image may contain hundreds of local features, and directly matching image features can be inefficient. The bag-of-words model increases efficiency by quantizing local features into a vocabulary that can be compared using text retrieval techniques.

Images described using the bag-of-words model can be compared efficiently using binary string comparisons such as the Hamming distance, or using histogram comparison techniques.
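A minimal sketch of quantization and comparison, assuming the vocabulary is already given as a list of centroid vectors (in practice it would be learned from training images, for example by clustering). The function names are illustrative, and cosine similarity stands in for "histogram comparison" as one possible choice.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary centroid closest to the descriptor (Euclidean)."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs in one image."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]


def cosine_similarity(h1, h2):
    """Histogram comparison: 1.0 means the same word distribution."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0


def hamming_distance(h1, h2):
    """Binary comparison: number of words present in exactly one of the images."""
    return sum((a > 0) != (b > 0) for a, b in zip(h1, h2))
```

The histogram discards where each feature was found in the image, which is why the comparison is cheap and why geometric structure is lost.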

Vocabulary trees can make the process for large-scale place recognition even more efficient. Originally proposed for object recognition, vocabulary trees use a hierarchical model to define words, an approach that enables faster lookup of visual words and the use of a larger and thus more discriminating vocabulary.

Because the bag-of-words model ignores the geometric structure of the place it is describing, the resulting place description is pose invariant; that is, the place can be recognized regardless of the position of the robot within the place. However, it lacks condition invariance.

The trade-off between pose invariance (recognizing places regardless of the robot's orientation) and condition invariance (recognizing places when the visual appearance changes) has not yet been resolved and is a current challenge in place recognition research.

The bag-of-words model is typically predefined based on features extracted from a training image sequence. This approach can be limiting as the resulting model is environment dependent and needs to be retrained if a robot is moved into a new area.

Global Descriptors: These describe the whole scene, without a selection phase.

Global place descriptors used in early localization systems included color histograms and descriptors based on principal component analysis.
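As an illustration of a global descriptor, the sketch below builds a normalized joint RGB histogram over every pixel, with no feature detection step. The bin count and the use of histogram intersection as the similarity measure are assumptions made for this example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of an image.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    bins_per_channel is an illustrative choice, not a value from the survey.
    """
    pixels = list(pixels)
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3

    def bin_of(value):
        # Clamp so that 255 always lands in the last bin.
        return min(value // step, bins_per_channel - 1)

    for r, g, b in pixels:
        idx = (bin_of(r) * bins_per_channel + bin_of(g)) * bins_per_channel + bin_of(b)
        hist[idx] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```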

# Remembering Places: The Mapping Module

Pure Image Retrieval: Pure image retrieval assumes that matching is based solely on appearance similarity and applies image retrieval techniques from computer vision that are not specific to place-based information.

Place recognition can also be made more efficient by using hierarchical searching at the place level as well as at the vocabulary level.

Topological Maps: Pure topological maps contain information about the relative positions of places but do not store metric information regarding how these places are related.

While image retrieval techniques can use an inverted index to improve efficiency, topological maps can use a location prior to speed up matching; that is, the place recognition system only has to search places known to be close to the robot’s current position.
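Both speed-ups can be sketched together: an inverted index maps each visual word to the places that contain it, and an optional set of allowed places plays the role of the location prior. The class `InvertedIndex` and its `allowed` parameter are hypothetical names for this illustration.

```python
from collections import defaultdict


class InvertedIndex:
    """Map each visual word to the set of places in which it occurs."""

    def __init__(self):
        self.word_to_places = defaultdict(set)

    def add_place(self, place_id, words):
        for word in set(words):
            self.word_to_places[word].add(place_id)

    def candidates(self, query_words, allowed=None):
        """Rank places by the number of visual words shared with the query.

        allowed: optional set of place ids near the robot's current position
        (the location prior); places outside it are never scored.
        """
        votes = defaultdict(int)
        for word in set(query_words):
            for place_id in self.word_to_places.get(word, ()):
                if allowed is None or place_id in allowed:
                    votes[place_id] += 1
        # Most votes first; ties broken by place id for a stable order.
        return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
```

Only places that share at least one word with the query are touched, so the cost depends on the query rather than on the size of the whole map.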

Topological-Metric Maps: Just as image retrieval can be enhanced by adding topological information, topological maps can be enhanced by including metric information (distance, direction, or both) on the map edges.

These topological-metric maps can be appearance-based, in which case metric information is only included as relative poses between each place node.
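One way to picture such a map is a graph whose nodes hold appearance descriptors and whose edges hold a distance and a bearing. The sketch below is an assumed data layout, not the structure used by any particular system; `PlaceNode`, `TopoMetricMap`, and `nearby` are illustrative names.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class PlaceNode:
    place_id: int
    descriptor: list  # appearance-based place signature
    # neighbor id -> (distance in meters, bearing in radians)
    edges: dict = field(default_factory=dict)


class TopoMetricMap:
    def __init__(self):
        self.nodes = {}

    def add_place(self, place_id, descriptor):
        self.nodes[place_id] = PlaceNode(place_id, descriptor)

    def connect(self, a, b, distance, bearing):
        """Store the relative pose on the edge in both directions."""
        self.nodes[a].edges[b] = (distance, bearing)
        self.nodes[b].edges[a] = (distance, (bearing + math.pi) % (2 * math.pi))

    def nearby(self, start, max_distance):
        """Places reachable from start within max_distance along map edges."""
        best = {start: 0.0}
        queue = [(0.0, start)]
        while queue:
            dist, node = heapq.heappop(queue)
            if dist > best.get(node, math.inf):
                continue  # stale queue entry
            for neighbor, (length, _bearing) in self.nodes[node].edges.items():
                new_dist = dist + length
                if new_dist <= max_distance and new_dist < best.get(neighbor, math.inf):
                    best[neighbor] = new_dist
                    heapq.heappush(queue, (new_dist, neighbor))
        return best
```

The result of `nearby` could serve as the set of candidate places when matching is restricted to the robot's surroundings.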

# Recognizing Places: The Belief Generation Module

The central goal of any place recognition system is reconciling visual input with the stored map data to generate a belief distribution. This distribution provides a measure of likelihood or confidence that the current visual input matches a particular location in the robot’s map representation of the world. There is a general understanding that if two place descriptions appear similar there is a greater likelihood of them being captured at the same physical location, but the degree to which this is true depends on the particular environment. For example, repetitive environments may exhibit perceptual aliasing where different places are indistinguishable. Conversely, changing conditions may cause the same place to appear drastically different at different times.
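A minimal sketch of belief generation, assuming appearance similarities are treated as likelihoods and combined with a prior over places in a Bayes-style update. This is one simple option chosen for illustration, not the method of any specific system, and `update_belief` is a hypothetical name.

```python
def update_belief(prior, similarities):
    """Combine a prior over places with appearance similarities.

    prior: probability for each place before seeing the current image.
    similarities: non-negative match score for each place, used here as an
    (assumed) likelihood of the current image given that place.
    """
    unnormalized = [p * s for p, s in zip(prior, similarities)]
    total = sum(unnormalized)
    if total == 0:
        return list(prior)  # no evidence either way: keep the prior
    return [u / total for u in unnormalized]
```

When two places look alike (perceptual aliasing), their similarities are equal and the prior decides between them: with `prior = [0.7, 0.1, 0.1, 0.1]` and `similarities = [0.5, 0.5, 0.0, 0.0]`, most of the belief stays on the first place.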

Loop closure is vital for consistent mapping, as it allows the system to correct drift in local odometry measurements. If the place descriptions are appearance based and do not contain any metric information, but the map contains metric distances between places, the system can still use the loop closures to perform metric correction at the place level.

Without any metric information, localization occurs at a topological level; that is, the system simply identifies the most likely location.

If a system uses the bag-of-words model, inspired as it is by text-based document analysis, it may use the related term frequency-inverse document frequency (TF-IDF) score. Each visual word in an image has a TF-IDF score, which is made up of two parts: the term frequency, which measures how often the word appears in the image, and the inverse document frequency, which measures how common the word is across all images. The TF-IDF score is then the product of these two values.
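The score can be sketched as follows, assuming the common text-retrieval form in which the term frequency is the word count divided by the number of words in the image, and the inverse document frequency is the logarithm of the number of images divided by the number of images containing the word. The exact weighting varies between systems, and `tf_idf` is an illustrative name.

```python
import math
from collections import Counter


def tf_idf(image_words, all_images):
    """TF-IDF weight for each visual word in one image.

    image_words: list of visual word ids found in the image.
    all_images: list of such lists, one per image in the database.
    """
    counts = Counter(image_words)
    n_images = len(all_images)
    scores = {}
    for word, count in counts.items():
        tf = count / len(image_words)  # how often the word appears in this image
        n_containing = sum(1 for img in all_images if word in img)
        # Words that occur in every image get idf = 0 and carry no weight.
        idf = math.log(n_images / n_containing) if n_containing else 0.0
        scores[word] = tf * idf
    return scores
```

A word that appears in every image is down-weighted to zero, while a word that is frequent in one image but rare elsewhere scores highly and helps discriminate that place.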

# Reference

Lowry et al., "Visual Place Recognition: A Survey"
