CVB Polimago vs Deep Learning

Machine learning systems help solve problems autonomously, using analytical, problem-solving intelligence to predict outcomes. But do you know what they need in order to operate, and which solution would work best for your application? Before we get into the differences, let’s cover what deep learning is, why you would want to use it, and what you would typically need to get going with it.

Deep learning is a subset of Artificial Intelligence and Machine Learning.


Artificial Intelligence is the analytical, problem-solving intelligence demonstrated by machines. A system’s ability to learn from the data fed to it in order to predict outcomes, without being explicitly programmed, is Machine Learning. Deep learning further improves on this intelligence by interpreting data sets and the relationships within them using neural networks.

“Deep learning is part of a broader family of machine learning methods based on artificial neural networks. Learning can be supervised, semi-supervised or unsupervised.”

How does deep learning work with machine vision?

To answer that, we need to consider the ways decisions are made by machine vision systems and how they could be improved by introducing automated learning.
If you wanted to train a “typical” vision system to recognise a vehicle type, you might start with a set of explicit rules like the sketch below.
Settings like these would work well for saloons, but the parameters would need to be adjusted to encompass all the other types of cars available.
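A minimal sketch, in Python, of what those explicit rules might look like. Every attribute name and threshold here is hypothetical, invented purely to illustrate how each decision has to be programmed by hand:

```python
from dataclasses import dataclass

@dataclass
class Blob:
    """Hypothetical measurements extracted from a segmented object."""
    width: float        # bounding-box width in pixels
    height: float       # bounding-box height in pixels
    circle_count: int   # wheel-like circles found in the region
    window_count: int   # window-like regions found

def looks_like_saloon(blob: Blob) -> bool:
    """Hand-written rules for a saloon seen in profile view."""
    aspect = blob.width / blob.height
    return (blob.circle_count == 2            # two wheels visible in profile
            and 2.5 < aspect < 3.5            # long, low body shape
            and 2 <= blob.window_count <= 3)  # typical saloon glazing

print(looks_like_saloon(Blob(300, 100, 2, 3)))  # True for a "textbook" saloon
```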



Once cars were covered, you would have to add even more exceptions for all the other types of vehicle.


However, this would only cover the profile view, at this scale. You would need to include all the other possible viewing angles and scales at which a vehicle could be presented.

And could you be sure that you had covered every possible variation in the training set? Would the system recognise a vehicle it had never seen before?

Learning by example

Compare the traditional machine vision training to how a child would learn what a car is.
Every time they were presented with a new instance of a potential “car”, they would compare it with what they already knew from the ones they had seen before.
They don’t think “it has to have X number of wheels and Y windows for it to be a car”. It could be a silhouette of a car with no wheels and the child would still know it was a car.
500 million years of evolution created this amazing harmony between our eyes and the primary visual cortex in our brain. We don't even notice the complex connections between neurons and what they do to help us remember and recognise the things we see.
Machine learning tries to replicate this, learning by example instead of relying on explicit instructions from an operator. Machine learning algorithms use “training sets”, which are simply data gathered from the real world.
With machine vision, these training sets would involve lots of examples of good and bad images. This data is used to create models of real-world scenarios that can classify new data to determine an output.


How would it create these models? Neural networks...

Neural networks


Because there isn’t a linear relationship between the inputs and outputs in these applications, complex decision-making has to happen between the inputs and the outputs.
This decision-making is carried out in a neural network, a software/hardware model of the brain, in which simple decision-making or logic units are connected through their inputs, outputs and decisions to form a large, complex decision-making system.


These logic units, the neurons (also known as perceptrons), work together to form the neural network (the brain).
Perceptrons (neurons) were originally developed in the late 1950s by Frank Rosenblatt, an American psychologist recognised for his work in the field of Artificial Intelligence.


Neurons (perceptrons) determine a single output from several inputs. Each input has an associated weight (i.e. how important it is). The weighted sum of the inputs, compared against a threshold value, determines the result.
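As a sketch, here is that computation in Python; the input values, weights and threshold are arbitrary example numbers:

```python
import numpy as np

def perceptron(inputs: np.ndarray, weights: np.ndarray, threshold: float) -> int:
    """Fire (output 1) if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = np.dot(inputs, weights)
    return 1 if weighted_sum > threshold else 0

x = np.array([1.0, 0.0, 1.0])  # three binary inputs
w = np.array([0.6, 0.4, 0.3])  # how important each input is
print(perceptron(x, w, threshold=0.8))  # 0.6 + 0.3 = 0.9 > 0.8, so it fires: 1
```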
The interconnectivity of neurons is what makes up a neural network: layers of decision-making that increase in complexity as data progresses through the structure to produce an output. Neural networks are typically made up of three types of layers: an input layer that receives the raw data, one or more hidden layers where decisions are combined, and an output layer that delivers the result.
Without the intermediate hidden layers, the system can manage only relatively simple (but possibly parallelised) problems. The number of hidden layers of neurons is a critical part of the structure, as data can be combined in these layers to allow complex decision-making.
Although it has the ability to make complex decisions, it doesn't know what it needs to head towards. An untrained neural network ‘knows’ nothing and gives random or chaotic results until the neurons and the network have been ‘trained’ to give the desired output.
This is where a neuron’s ability to change the weighting of its inputs and the threshold level of its output comes into play to give it its learning capability. It is able to “tune” the system to achieve the desired result.
However, to be able to do this, it requires some sort of feedback from the result so that adjustments can be made to fine-tune it towards the desired output. This requires data, a lot of it.
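A toy sketch of that feedback loop, using the classic perceptron learning rule to learn an AND gate; the training data and learning rate are arbitrary illustrative choices:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([0, 0, 0, 1])          # desired outputs (a logical AND)

w = np.zeros(2)   # an untrained network: the weights 'know' nothing yet
bias = 0.0        # acts as a movable threshold
lr = 0.1          # how strongly each error adjusts the weights

for _ in range(20):                  # repeated passes over the training set
    for inputs, target in zip(X, y):
        output = int(np.dot(inputs, w) + bias > 0)
        error = target - output      # the feedback signal
        w += lr * error * inputs     # tune the weights towards the target
        bias += lr * error           # tune the threshold as well

print(w, bias)  # weights and threshold that now implement AND
```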

Convolutional Neural Networks


A convolutional neural network (CNN) is a sub-class of neural networks, containing at least one layer that is made up of convolutional units that extract data from the image.
This constrained architecture reduces the complexity of the connections to subsequent layers, as the most useful information is captured locally from the previous layer. The weights of each convolutional unit are also shared across the whole image; this weight sharing is what creates the convolution process.
CNNs work well for image recognition because the important information can appear anywhere in the image. The pattern is passed through and interpreted by the network regardless of its location in the image.
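A sketch of a single convolutional unit in Python: one small filter whose weights are reused at every position in the image, so the same pattern triggers the same response wherever it appears. The image and filter are toy values:

```python
import numpy as np

def convolve(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide one shared set of weights (the kernel) across the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same weights are applied at every location: weight sharing.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[1:4, 1:4] = np.eye(3)   # a diagonal pattern placed in the top-left
kernel = np.eye(3)            # a filter that responds to diagonal patterns
response = convolve(image, kernel)
print(np.unravel_index(response.argmax(), response.shape))  # (1, 1): found it
```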
Revisiting what we touched on in “learning by example”, you need to be able to feed this network data for it to be able to train itself towards the desired outcome - this may include good and bad examples, or simply classification data.

Supervised learning?


Neural networks are not the only learning algorithms; while they are the most popular, they are just one of many approaches to machine learning. Supervised learning methods use labelled input images, which the specific algorithm uses to approximate a function that correctly classifies them. What none of these methods can tell you is the correct inputs or variables to use in designing the algorithm!
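As a sketch of supervised learning with a non-neural method, here is a linear classifier trained on labelled example images using scikit-learn; its built-in digits data set stands in for an industrial image set:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)   # 8x8 images (flattened) with labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)  # approximates the labelling function
clf.fit(X_train, y_train)                # learn from the labelled examples
print(clf.score(X_test, y_test))         # accuracy on images it has never seen
```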

"An important feature of a learning machine is that its teacher will often be very largely ignorant of quite what is going on inside, although he may still be able to some extent to predict his pupil's behaviour" (Alan Turing, 1950 in "Computing Machinery and Intelligence". Mind, 59, 433-460)

Data, data, even more data...is it needed?


For a machine learning tool to work, it needs feedback about right and wrong decisions. The labelled training data is all that the machine learning tool knows, which means the variation in the data tells the tool how a class can look and where its limits lie. Training on edge cases is an important part of the process.
The classifier that is produced is likely to have a complex relationship with the input data (otherwise a classical tool would be fine). The relationship is not only complex, but difficult (or impossible) to probe, making the system a 'black box'.
Because of this, it is critical that the training data is representative of the types of variation that the system will see at runtime. This might mean using a large amount of data so that outliers are not too influential.
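A small sketch of the kind of sanity check this implies, counting how well each class is represented before training; the label names are hypothetical:

```python
from collections import Counter

labels = ["good", "good", "scratch", "good", "dent", "good", "scratch"]

counts = Counter(labels)
total = sum(counts.values())
for cls, n in counts.most_common():
    print(f"{cls}: {n} images ({n / total:.0%})")
# A sparsely represented class ("dent" here) warns that a few outliers could
# dominate the model's idea of that class; gather more examples first.
```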

CNNs & CVB Polimago: Pros and Cons


The possible applications overlap massively, but some applications are currently only served by CNNs: unsupervised learning (training on unlabelled data) and the segmentation of an image (for example by textural features). By contrast, CVB Polimago has a much smaller requirement for training data, typically 50 images per class instead of 500 to 1000 for CNNs. The smaller training sets make CVB Polimago applicable to many industrial applications, such as variable defect detection, where there simply may not be thousands of images of defects.
The large training sets of CNNs require considerable processing. The highly-parallel nature of GPUs helps to accelerate CNN training (by a factor of 10x to 100x). CVB Polimago sees no advantage in GPU processing, but the smaller training sets mean that training is still quicker.
The assumptions CVB Polimago makes about its data bring a further advantage: far fewer training parameters than CNNs, meaning that new users may find CVB Polimago easier to get started with than a CNN.

Head over to our Common Vision Blox website to find out more about CVB Polimago and how it works.

CVB Polimago Application Examples

Chicken Cut Identification

Difficult OCR

Selective Harvesting

Vehicle type classification

Sandwich Identification

Fish Identification