How do robots know what they are looking at?

You may wonder: how do robots know what they are looking at? How do they tell the difference between dogs and cats? In this post we give an introductory explanation of neural-network-based classifiers. Recall from the post on how robots learn that supervised learning is machine learning in which the algorithm is trained on labeled data to map input features to the corresponding output labels. How can this be used to tell the difference between a dog and a cat?

Supervised Machine Learning: Classification

Cat or dog? Neural Network

Classification means that, given data and a set of distinguishing characteristics, we can recognize different objects and assign each one to a category. Robots that differentiate cats from dogs typically perform this basic classification with a neural network.

What is a Neural Network?

  • A method that teaches computers to process data in a way inspired by the neurons of the human brain
  • A type of machine learning that uses interconnected nodes, or “neurons”, in a layered structure to transform inputs into desired outputs; networks with a large number of layers are called deep learning
Neural Network Visualization

What is a Neural Network “Neuron” / Node?

  • Mathematically, a neural network “neuron” accepts some inputs, sums them, and feeds the sum to an activation function, which maps it to a new, possibly nonlinear, output
  • Weights scale the input
  • Bias offsets the summation
  • Activation functions help encode checks for desired dimensions/traits of inputs
Neural network node
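
To make the roles of the weights, the bias, and the activation function concrete, here is a minimal sketch of a single neuron in plain Python. The sigmoid activation and all of the numbers are assumptions chosen for illustration; real networks may use other activation functions.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Scale each input by its weight, add the bias, then apply the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical inputs, weights, and bias, for illustration only.
output = neuron(inputs=[0.5, 0.8], weights=[1.0, -2.0], bias=0.1)
print(round(output, 3))
```

Here the weighted sum is 0.5 - 1.6 + 0.1 = -1.0, so the neuron's output is below 0.5, which we could read as "not activated".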

Classifying tree vs balloon example

  • We classify objects by introducing “dimensions” (variables quantifying one specific feature of the object)
Classifying gold balloon vs tree (credit: nang)
  • With enough dimensions, we can accurately distinguish one object from another
  • Each dimension could be checked in the hidden layers, the activation of which helps classify objects
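
As a rough sketch of how "dimensions" separate objects, the example below describes each object with two made-up features and picks the closest known example. This is a simple nearest-prototype rule, not a neural network, and the feature names (shininess, height) and values are assumptions chosen only for illustration.

```python
# Each object is described by hypothetical dimensions scaled to 0..1:
# (shininess, height). These features are made up for illustration.
PROTOTYPES = {
    "balloon": (0.9, 0.1),
    "tree": (0.1, 0.9),
}

def squared_distance(a, b):
    """Squared straight-line distance between two points in feature space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(features):
    """Return the label whose prototype is closest to the given features."""
    return min(PROTOTYPES, key=lambda label: squared_distance(features, PROTOTYPES[label]))

print(classify((0.8, 0.2)))  # a shiny, short object
print(classify((0.2, 0.7)))  # a dull, tall object
```

With only one dimension, two objects can easily look alike; adding a second dimension pulls them apart in feature space.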

What are neural network “activation” functions?

  • When a model is trained, the weights & biases are modified to make the output as correct as possible, as measured by an objective function
  • The outputs of the final (output) layer encode the “answer” to a question, such as the classification of an object (yes/no for each output node option)
  • Note that only the outputs of the output layer can be used to answer questions
hidden layers activation functions help look for classification features
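
One common way to read an answer off the output layer is to turn the output nodes' raw scores into probabilities and pick the largest. The sketch below assumes a softmax over two output nodes; the scores and the label order are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw output-node scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores, labels):
    """Return the label of the most probable output node and its probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Hypothetical scores from an output layer with one node per class.
label, confidence = predict([2.0, 0.5], ["dog", "cat"])
print(label, round(confidence, 2))
```

The hidden layers never appear in this step: only the output nodes are consulted to answer the question.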

Neural network behavior

  • Visual characteristics we use to differentiate cats vs dogs can be translated into classification properties as part of the activation functions of the neurons
    • Size, Shape, Tail
  • Only if the features are present will the neurons activate, leading to the desired output (e.g., dog)

An image is fed in as input, features are extracted by the neurons that activate, and the combination of those activations classifies the image as a dog or a cat. Recall from the post on how robots see that images are just pixel values, which are numbers:

  • All colors can be formed from red/green/blue (RGB) values
  • Images can be represented numerically in matrices
  • Basis of computer vision
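
To illustrate that an image really is just numbers, here is a small sketch that stores a 2x2 RGB image as nested lists and converts it to grayscale. The pixel values are made up, and the unweighted channel average is a simplifying assumption (other conversions weight the channels differently).

```python
# A hypothetical 2x2 RGB image: each pixel is (red, green, blue), 0..255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def to_grayscale(rgb_image):
    """Average the three color channels of every pixel (simple, unweighted)."""
    return [[sum(pixel) // 3 for pixel in row] for row in rgb_image]

gray = to_grayscale(image)
print(gray)
```

The pure red, green, and blue pixels all become the same gray level, while the white pixel stays at full brightness.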

Putting it all together: Neural Network Based Dog Classifier:

Here, the hidden layers are explained
  • Images can be represented numerically (pixel values)
  • Numbers can be processed mathematically with functions to identify edges in an image
  • Combinations of edges make up features (e.g., legs, tail, snout, ears)
  • The combination of features can classify the animal
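
As a hedged illustration of "processing numbers with functions to identify edges", the sketch below marks an edge wherever neighboring pixels differ in brightness. This hand-written difference filter is an assumption chosen for simplicity; in a trained network, comparable filters would be learned rather than written by hand.

```python
def horizontal_edges(gray):
    """Absolute brightness difference between each pixel and its right-hand neighbor."""
    return [
        [abs(row[c + 1] - row[c]) for c in range(len(row) - 1)]
        for row in gray
    ]

# Hypothetical grayscale image: dark on the left, bright on the right.
gray_image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
edges = horizontal_edges(gray_image)
print(edges)
```

Large values appear only in the column where dark meets bright, which is the vertical edge in the image.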

How do robots read?

The same neural network principle can be used to classify / read numbers & letters! Robots can read by using text-classifiers to identify characters and numbers based on specific properties unique to each.

digit neural network classifier
  • Image of number represented by pixels
  • Pixels fed into different layers which “activate” based on which parts of the image are white
  • The hidden layers are set up so that there is a unique property for each number (e.g., how many straight line segments vs. curves the number has)
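
The idea of pixels "activating" a digit can be sketched with a toy reader for 3x3 black-and-white images. This is a single layer with hand-picked weights (a template matcher), not a trained multi-layer network, and the two digit shapes are assumptions chosen for illustration.

```python
# Hypothetical 3x3 images: 1 is a white pixel, 0 is a black pixel.
ONE = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
ZERO = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]

def flatten(image):
    """Turn a 2D image into a flat list of pixel values."""
    return [pixel for row in image for pixel in row]

def template_weights(template):
    """Weight +1 where the template is white and -1 where it is black."""
    return [1 if pixel else -1 for pixel in flatten(template)]

# One output node per digit, each with its own hand-picked weights.
WEIGHTS = {"1": template_weights(ONE), "0": template_weights(ZERO)}

def read_digit(image):
    """Score the image against each digit's weights and return the best match."""
    pixels = flatten(image)
    scores = {
        digit: sum(w * p for w, p in zip(weights, pixels))
        for digit, weights in WEIGHTS.items()
    }
    return max(scores, key=scores.get)

print(read_digit(ONE), read_digit(ZERO))
```

White pixels in the expected places raise a digit's score, and white pixels in unexpected places lower it, so each output node responds most strongly to its own shape.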

How are the hidden layers configured to provide the correct output every time?

Machine learning models are trained to find an optimal set of parameters (weights & biases) for the hidden layers of their neural networks.

Cat or Dog?
  • Generations reflect how often the weights & biases of our neural network have been updated/tuned to lead to more correct output
  • We can see that with more training (more generations), the model finds a better set of parameters, one that better optimizes the objective function (for example, classification accuracy for a cat-vs-dog classifier, or distance walked for a walking robot)
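
To show how repeated updates tune the weights & biases, here is a minimal training sketch for a single sigmoid neuron. It uses plain gradient descent on a tiny made-up dataset, which is an assumption: the post does not say which training method is used. The features (size, ear pointiness), the labels, and the learning rate are all hypothetical, and one "generation" here means one pass over the data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical labeled data: (size, ear_pointiness) -> 1 for dog, 0 for cat.
DATA = [
    ((0.9, 0.2), 1),
    ((0.8, 0.3), 1),
    ((0.3, 0.9), 0),
    ((0.2, 0.8), 0),
]

def predict(weights, bias, features):
    """Output of one sigmoid neuron for the given features."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def accuracy(weights, bias):
    """Objective: fraction of examples classified correctly."""
    hits = sum((predict(weights, bias, f) >= 0.5) == (label == 1) for f, label in DATA)
    return hits / len(DATA)

def train(generations, learning_rate=0.5):
    """Nudge the weights and bias toward fewer errors, one example at a time."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(generations):
        for features, label in DATA:
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

for generations in (0, 200):
    trained_weights, trained_bias = train(generations)
    print(generations, accuracy(trained_weights, trained_bias))
```

Before any training the neuron labels everything as a dog, so it is right only half the time on this dataset; after enough generations it separates the two classes.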

We hope this post has helped you learn how robots know what they are looking at: by classifying images with a neural network classifier.