# How do robots know what they are looking at?

You may wonder: how do robots know what they are looking at? How do they tell the difference between dogs and cats? In this post we give an introductory explanation of neural-network-based classifiers. Recall from "How do robots learn" that supervised learning is machine learning in which the algorithm is trained on labeled data to map input features to their corresponding output labels. So how can machine learning be used to tell a dog from a cat?

## Supervised Machine Learning: Classification

Classification means using data and a set of distinguishing characteristics to recognize different objects and assign each one to a category. The basic classification robots use to differentiate cats and dogs is done with a neural network.

What is a Neural Network?

• A method that teaches computers to process data in a way inspired by the neurons of the human brain
• A type of machine learning that uses interconnected nodes, or “neurons”, in a layered structure to transform inputs into desired outputs; networks with many layers are referred to as deep learning

What is a Neural Network “Neuron” / Node?

• Mathematically, a neural network “neuron” accepts some inputs, sums them, and feeds the sum to an activation function, which maps it to a new, possibly nonlinear, output
• Weights scale the inputs
• Bias offsets the summation
• Activation functions help encode checks for desired dimensions/traits of inputs
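
The description above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the sigmoid activation is one common choice assumed here, and the input, weight, and bias values are hypothetical numbers picked only for the example.

```python
import math

def sigmoid(z):
    """Activation function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Scale each input by its weight, add the bias, then apply the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical values chosen only for illustration.
inputs = [0.5, 0.25]
weights = [2.0, -4.0]
bias = 1.0

output = neuron(inputs, weights, bias)
print(output)  # a value between 0 and 1
```

A larger weight makes its input matter more, while the bias shifts how easily the neuron "fires" regardless of the inputs.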

Classifying tree vs balloon example

• We classify objects by introducing “dimensions” (variables quantifying one specific feature of the object)
• With enough dimensions, we can accurately distinguish one object from another
• Each dimension could be checked in the hidden layers, the activation of which helps classify objects
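
As a rough sketch of the idea, the snippet below describes each object with three dimensions and combines them into a single score. The dimensions (height, leaves, floating) and the hand-picked weights are assumptions made for illustration; a trained network would learn its own weights from labeled examples.

```python
# Each object is described by three hypothetical "dimensions":
# (height in meters, has_leaves as 0/1, floats_in_air as 0/1)

def classify(features):
    height_m, has_leaves, floats_in_air = features
    # Hand-picked weights for illustration only; training would set these.
    tree_score = 0.2 * height_m + 2.0 * has_leaves - 2.0 * floats_in_air
    return "tree" if tree_score > 0 else "balloon"

oak = (15.0, 1, 0)            # tall, has leaves, does not float
party_balloon = (0.3, 0, 1)   # small, no leaves, floats

print(classify(oak))
print(classify(party_balloon))
```

No single dimension settles the question by itself, but together they separate the two objects.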

What are neural network “activation” functions?

• When a model is trained, the weights and biases are adjusted to make the output as correct as possible, as measured by an objective function
• The outputs of the output layer encode the “answer” to a question, such as the classification of an object (yes/no for each output node option)
• Note that only the outputs of the output layer can be used to answer questions; the outputs of hidden layers are intermediate values
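
To make the training step concrete, here is a minimal sketch that trains a single sigmoid neuron with gradient descent. The toy data, learning rate, and number of steps are assumptions chosen for illustration, and the objective is written as a loss to minimize (cross-entropy), which is the flip side of maximizing correctness.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy labeled data (hypothetical): one input feature per example, label 0 or 1.
data = [(0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)]

def loss(w, b):
    """Mean cross-entropy: lower means the outputs match the labels better."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

w, b = 0.0, 0.0        # untrained weight and bias
learning_rate = 0.5
initial_loss = loss(w, b)

for _ in range(1000):
    grad_w = grad_b = 0.0
    for x, y in data:
        error = sigmoid(w * x + b) - y   # how far the output is from the label
        grad_w += error * x
        grad_b += error
    # Nudge the weight and bias in the direction that reduces the loss.
    w -= learning_rate * grad_w / len(data)
    b -= learning_rate * grad_b / len(data)

final_loss = loss(w, b)
predictions = [round(sigmoid(w * x + b)) for x, _ in data]
print(initial_loss, final_loss, predictions)
```

Real networks repeat the same idea across many neurons and layers, using backpropagation to compute the adjustments.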

Neural network behavior

• The visual characteristics we use to differentiate cats from dogs can be translated into properties that the neurons check for
• Examples: size, shape, tail
• Neurons activate only if their features are present, leading to the desired output (e.g., dog)

An image is fed in as input, features are extracted by the neurons that activate, and the combination of those activations classifies the image as a dog or a cat. Recall from "How robots see" that images are just pixel values, which are numbers:

• Colors in a digital image are formed from red/green/blue (RGB) values
• Images can be represented numerically as matrices
• This is the basis of computer vision
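
For example, a tiny 2×2 image can be written out directly as numbers. The sketch below uses plain Python lists rather than an image library, and the grayscale conversion by simple averaging is a simplifying assumption (real pipelines usually weight the channels differently).

```python
# A tiny 2x2 image: each pixel is a (red, green, blue) tuple, 0-255 per channel.
image = [
    [(255, 0, 0), (0, 255, 0)],       # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],   # blue pixel, white pixel
]

def to_grayscale(img):
    """Average the three channels to get one brightness number per pixel."""
    return [[sum(pixel) // 3 for pixel in row] for row in img]

gray = to_grayscale(image)
print(gray)  # [[85, 85], [85, 255]]
```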

Putting it all together: Neural Network Based Dog Classifier

• Images can be represented numerically (pixel values)
• Numbers can be processed mathematically with functions to identify edges in an image
• Combinations of edges make up features (e.g., legs, tail, snout, ears)
• Combinations of features can classify the animal
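
The edge-finding step can be illustrated with a very simple function. This sketch assumes a hand-written filter that compares each pixel with its right-hand neighbor; neural network classifiers typically learn their own filters during training rather than using a fixed one like this.

```python
# Grayscale image: a dark region (0) on the left, a bright region (255) on the right.
gray = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

def horizontal_edges(img):
    """Absolute difference between each pixel and its right-hand neighbor."""
    return [
        [abs(row[c + 1] - row[c]) for c in range(len(row) - 1)]
        for row in img
    ]

edges = horizontal_edges(gray)
for row in edges:
    print(row)  # [0, 255, 0]
```

The large value in the middle column marks where dark meets bright, which is exactly where the vertical edge sits in the image.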

The same neural network principle can be used to classify and read numbers and letters! Robots can read by using text classifiers that identify characters and digits based on the properties unique to each.