How do robots know what they are looking at?

You may wonder: how do robots know what they are looking at? How do robots tell the difference between dogs & cats? In this post we’ll give an introductory explanation of neural-network-based classifiers. Recall from how robots learn that supervised learning is machine learning where the algorithm is trained on labeled data to predict, or map, input features to corresponding output labels. How can machine learning be used to tell the difference between a dog and a cat?

Supervised Machine Learning: Classification

Classification means that, given data and a set of distinguishing characteristics, we can recognize different objects and assign each one to a category. The basic classification robots use to differentiate cats & dogs is done using a neural network.

What is a Neural Network?

A neural network teaches computers to process data in a way that is inspired by the neurons of the human brain
It is a type of machine learning that uses interconnected nodes, or “neurons”, in a layered structure to transform inputs into desired outputs; when there are many layers, this is called deep learning

What is a Neural Network “Neuron” / Node?

Mathematically, a neural network “neuron” accepts some inputs and combines them into a weighted sum, which is fed to an activation function that maps it to a new, possibly nonlinear, output
Weights scale the input
Bias offsets the summation
Activation functions help encode checks for desired dimensions/traits of inputs
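
To make the three pieces above concrete, here is a minimal sketch of a single neuron in Python. The sigmoid activation and the example numbers are assumptions chosen for illustration; many other activation functions are used in practice.

```python
import math

def sigmoid(z):
    # Squashes any number into the range (0, 1); one common activation choice.
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    # Weights scale each input, the bias offsets the summation,
    # and the activation function maps the sum to the neuron's output.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values, picked only to show the arithmetic:
# z = 0.8 * 1.0 + (-0.4) * 0.5 + 0.1 = 0.7
output = neuron(inputs=[1.0, 0.5], weights=[0.8, -0.4], bias=0.1)
print(round(output, 3))
```

An output close to 1 can be read as the neuron being "activated", and an output close to 0 as it staying quiet.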

Classifying tree vs balloon example

We classify objects by introducing “dimensions” (variables quantifying one specific feature of the object)

Classifying gold balloon vs tree (credit: nang)

With enough dimensions, we can accurately distinguish one object from another

Each dimension could be checked in the hidden layers, the activation of which helps classify objects
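
As a rough sketch of the idea, the snippet below checks three dimensions, one per hidden "neuron", and combines them in an output neuron. The dimensions (`greenness`, `height_m`, `shininess`), the thresholds, and the hand-picked weights are all hypothetical; a real network would learn its own checks from data.

```python
def step(z):
    # Simplest possible activation: fire (1.0) only if the input is positive.
    return 1.0 if z > 0 else 0.0

def classify(features):
    # Hidden layer: each neuron checks one hypothetical dimension.
    is_green = step(features["greenness"] - 0.5)
    is_tall = step(features["height_m"] - 2.0)
    is_shiny = step(features["shininess"] - 0.5)

    # Output neuron: evidence for "tree" minus evidence for "balloon".
    tree_score = is_green + is_tall - is_shiny
    return "tree" if step(tree_score - 0.5) else "gold balloon"

print(classify({"greenness": 0.9, "height_m": 5.0, "shininess": 0.1}))
print(classify({"greenness": 0.1, "height_m": 0.3, "shininess": 0.9}))
```

With only one dimension (say, height) a very tall balloon sculpture could fool the classifier; adding more dimensions makes such mistakes less likely.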

What are neural network “activation” functions?

When a model is trained, the weights & biases are adjusted to make the output as correct as possible, as measured by an objective function
Together, the layers compute the “answer” to a question, such as the classification of an object (yes/no for each output node option)
Note that only the outputs of the output layer are read as the answer; hidden-layer outputs are intermediate values

Hidden-layer activation functions help look for classification features
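
The sketch below shows how inputs might flow through one hidden layer to an output layer with one node per option. The weights, biases, and the two input features are made-up numbers set by hand, not values from a trained model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # One output per neuron: weighted sum of inputs, plus bias, then activation.
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical, hand-set parameters (training would normally find these).
HIDDEN_W = [[2.0, -1.0], [-1.5, 2.5]]
HIDDEN_B = [0.0, -0.5]
OUTPUT_W = [[3.0, -3.0], [-3.0, 3.0]]
OUTPUT_B = [0.0, 0.0]
LABELS = ["cat", "dog"]  # one output node per option

def predict(features):
    hidden = layer(features, HIDDEN_W, HIDDEN_B)   # intermediate values
    outputs = layer(hidden, OUTPUT_W, OUTPUT_B)    # only these are read as the answer
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return LABELS[best], outputs

label, scores = predict([1.0, 0.0])
print(label, [round(s, 2) for s in scores])
```

Each output node gives a score between 0 and 1, and the option with the highest score is taken as the classification.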

Neural network behavior

Visual characteristics we use to differentiate cats vs dogs can be translated into classification properties as part of the activation functions of the neurons
- Size, Shape, Tail
Only if the features are present will the neurons activate, leading to the desired output (e.g., dog)

An image is fed in as input, features are extracted via activated neurons, and the combination of those activations classifies the image as a dog or a cat. Recall from how robots see that images are just pixel values, which are numbers:

All colors can be formed from red/green/blue (RGB) values
Images can be represented numerically as matrices
This is the basis of computer vision
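
For illustration, here is a tiny 2x2 image written out as a matrix of RGB values, then converted into the flat list of numbers a network could take as input. Averaging the three channels to get a gray value is a simplifying assumption; other conversions weight the channels differently.

```python
# A hypothetical 2x2 image: each pixel is (red, green, blue), each 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 0)],
]

def to_grayscale(img):
    # Simple average of the three color channels for every pixel.
    return [[round((r + g + b) / 3) for (r, g, b) in row] for row in img]

def flatten(gray):
    # Scale to 0..1 and unroll the matrix into one list of network inputs.
    return [pixel / 255 for row in gray for pixel in row]

gray = to_grayscale(image)
print(gray)            # [[85, 85], [85, 170]]
print(flatten(gray))
```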

Putting it all together: Neural Network Based Dog Classifier

Images can be represented numerically (pixel values)
Numbers can be processed mathematically with functions to identify edges in an image
The combination of edges makes up features (e.g., legs, tail, snout, ears)
The combination of features can classify the animal
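
As one simple example of "processing numbers with functions to identify edges", the sketch below compares each pixel with its right-hand neighbor: a big jump in brightness suggests an edge. This neighbor-difference approach is an assumption chosen for brevity; neural networks typically learn their own edge-detecting filters.

```python
def edge_strength(gray):
    # For each pixel, how different is it from the pixel to its right?
    # Large values mark a sharp change in brightness, i.e. a likely edge.
    return [
        [abs(row[c + 1] - row[c]) for c in range(len(row) - 1)]
        for row in gray
    ]

# Hypothetical grayscale image: dark on the left, bright on the right.
gray = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

print(edge_strength(gray))  # [[0, 255, 0], [0, 255, 0]]
```

The only large values appear where the dark region meets the bright one, which is exactly where the edge is.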

How do robots read?

The same neural network principle can be used to classify / read numbers & letters! Robots can read by using text-classifiers to identify characters and numbers based on specific properties unique to each.

Image of number represented by pixels
Pixels fed into different layers which “activate” based on which parts of the image are white
The hidden layers are set up so that there is a unique property for each number (e.g., how many straight line segments vs curves the number has)
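
Here is a very small sketch of that idea for just two digits drawn on a 3x5 pixel grid. Each output neuron has hand-written weights: +1 where its digit is expected to be white and -1 elsewhere. The grid size, the digit drawings, and the weights are all assumptions; a real reader would have hidden layers and learn its weights from many examples.

```python
# '#' marks a white (lit) pixel, '.' a dark one.
ZERO = ["###",
        "#.#",
        "#.#",
        "#.#",
        "###"]
ONE = [".#.",
       "##.",
       ".#.",
       ".#.",
       "###"]

def to_pixels(drawing):
    # Unroll the drawing into a flat list: 1.0 for white, 0.0 for dark.
    return [1.0 if ch == "#" else 0.0 for row in drawing for ch in row]

def weights_for(drawing):
    # Reward white pixels where this digit has them, penalize them elsewhere.
    return [1.0 if p == 1.0 else -1.0 for p in to_pixels(drawing)]

OUTPUT_NEURONS = {"0": weights_for(ZERO), "1": weights_for(ONE)}

def read_digit(drawing):
    pixels = to_pixels(drawing)
    scores = {
        digit: sum(w * x for w, x in zip(weights, pixels))
        for digit, weights in OUTPUT_NEURONS.items()
    }
    # The output neuron with the strongest activation wins.
    return max(scores, key=scores.get)

print(read_digit(ZERO), read_digit(ONE))
```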

How are the hidden layers configured to provide the correct output every time?

Machine learning models can be trained to find a good set of parameters (weights & biases) for their hidden layers.

Generations reflect how many times the weights & biases of our neural network have been updated/tuned to lead to more correct output
We can see that with more training (generations), we find a better set of parameters that lead to walking behavior (and better optimize an objective function such as distance walked)
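
To show what "updating the weights & biases over generations" can look like, here is a minimal sketch that trains a single neuron on a tiny labeled dataset. It uses gradient descent on a squared-error objective, which is one common training method and not necessarily the one behind the walking example; the dataset, the two feature names, and the learning rate are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical labeled data: [size, snout_length] -> 1 for dog, 0 for cat.
DATA = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.2, 0.1], 0), ([0.3, 0.2], 0)]

def predict(weights, bias, xs):
    return sigmoid(sum(w * x for w, x in zip(weights, xs)) + bias)

def objective(weights, bias):
    # Mean squared error: lower means more correct output.
    return sum((predict(weights, bias, xs) - y) ** 2 for xs, y in DATA) / len(DATA)

def train(generations, learning_rate=1.0):
    weights, bias = [0.0, 0.0], 0.0
    history = [objective(weights, bias)]
    for _ in range(generations):
        for xs, y in DATA:
            out = predict(weights, bias, xs)
            # Slope of the squared error with respect to the weighted sum
            # (the constant factor 2 is folded into the learning rate).
            grad = (out - y) * out * (1.0 - out)
            weights = [w - learning_rate * grad * x for w, x in zip(weights, xs)]
            bias -= learning_rate * grad
        history.append(objective(weights, bias))
    return weights, bias, history

weights, bias, history = train(generations=200)
print("error before training:", round(history[0], 3))
print("error after training: ", round(history[-1], 3))
```

The error recorded after training should be lower than the error before it, mirroring how more generations lead to better parameters.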

We hope this post helped you learn how robots are able to know what they are looking at: by classifying images using a neural network classifier.