How Machine Learning is Surprisingly Similar to Traditional Vision Software Development

I think we can safely say that this past year has been the cultural moment when artificial intelligence entered public awareness. We have witnessed the rise of ChatGPT and public access to AI image generation. But very few people understand the underlying principles of what the software is doing, and many don’t know the difference between deep learning and machine learning. I think most people would be surprised to learn that the underlying mechanisms of how these technologies work are very similar to the workflow that vision system engineers have been following for decades.

To illustrate the relationship and differences between machine learning, deep learning, and machine vision, I will use the example of a candy company that wants a machine that visually identifies all the individual pieces of Halloween candy that are in a bag. In this scenario, an operator will remove one bag from the manufacturing line every hour and will run it through the vision system to confirm that there is an acceptable assortment of candy.

A vision system can automatically identify and count different types of candy

A traditional machine vision engineer would tackle this problem by writing software that automatically measures specific visual features of each piece of candy in the image. Perhaps the software would measure the length and width of each piece. The engineer might also use a color histogram to measure the number of pixels in the candy that fall within certain color bands. A piece with a lot of orange pixels is probably a Reese’s Peanut Butter Cup, while a piece with a lot of red pixels is probably a Kit-Kat.
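To make this concrete, here is a minimal sketch of that kind of hand-written feature extraction. Everything here is illustrative: the image is assumed to arrive as a plain list of (r, g, b) pixel tuples for one candy piece, and the color-band thresholds are invented, not calibrated values.

```python
def color_band_fractions(pixels):
    """Return the fraction of pixels falling in rough 'orange' and 'red' bands.

    The thresholds below are illustrative guesses, not calibrated values.
    """
    orange = red = 0
    for r, g, b in pixels:
        if r > 200 and 80 < g < 180 and b < 100:   # orange-ish pixel
            orange += 1
        elif r > 180 and g < 80 and b < 80:        # red-ish pixel
            red += 1
    n = len(pixels)
    return orange / n, red / n

# Mostly orange pixels -> likely a Reese's cup under this toy rule
sample = [(230, 120, 40)] * 8 + [(200, 50, 50)] * 2
print(color_band_fractions(sample))  # (0.8, 0.2)
```

A real system would compute these fractions from a camera image after segmenting each piece, but the histogram idea is the same: count pixels per color band and normalize.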

Unfortunately, the hard part is yet to come. The engineer must then determine which parameters are good indicators that a candy belongs to a certain category. Often, this is the most difficult part of the process. How can the developer know where to set the numeric thresholds that divide a Snickers from a Milky Way, which are almost the same size and shape? Worst of all, the complexity of the problem grows exponentially as the number of candy categories increases. If the vision system needs to work with thousands of different kinds of candy, the engineer might spend years fine-tuning the classification algorithms.
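That thresholding dilemma can be sketched as a toy hand-tuned classifier. The feature names and every numeric cutoff below are invented for illustration; the point is how brittle these rules become as the number of categories grows.

```python
def classify(length_mm, width_mm, orange_fraction):
    """A toy hand-tuned rule set; every threshold here is an invented guess."""
    if orange_fraction > 0.5:
        return "Reese's"
    if length_mm > 45 and width_mm > 18:
        return "Snickers"              # barely distinguishable from Milky Way
    if length_mm > 45:
        return "Milky Way"
    return "unknown"

print(classify(50, 20, 0.1))  # Snickers
print(classify(50, 16, 0.1))  # Milky Way -- a 2 mm width difference decides it
```

Two near-identical bars are separated by a single hand-picked number, and every new candy category means revisiting all the existing rules.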

Machine learning is the technology that can solve this problem. Just as in the traditional vision system design, the engineer still needs to write code that extracts measurements from the images. However, the engineer can use machine learning to classify the different types of candy quickly and accurately. The engineer feeds the data into a neural network, which really is nothing more than a system of math problems that attempts to find out which of the input parameters are most important for determining the class to which a candy sample belongs. In fact, a neural network looks not only at individual parameters, but also at which combinations of parameters, or even combinations of combinations, are correlated with each class! Each of these parameters or combinations is called a node, and they can be depicted in a neural network diagram.

A neural network determines which nodes in the diagram are most important through a trial-and-error approach, similar to the way a human would, but more methodical. Each node is assigned a number called a weight that represents its relative importance to the classification algorithm. The neural network makes an initial, default guess about what the weights should be, then multiplies each node’s inputs by its weight and adds up the results. The output is a score for each of the candy categories. Ideally, only one category should have a high score, indicating that the object has been identified as that type of candy.
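That weighted-sum scoring step might look something like this in code. This is a sketch of a single-layer network; the feature values, weights, and the three-class setup are all made up for illustration, not a trained model.

```python
def scores(features, weights, biases):
    """Multiply each feature by its weight, sum per class, and add a bias."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]

features = [0.8, 0.1, 0.3]        # e.g. orange fraction, red fraction, aspect ratio
weights = [[2.0, -1.0, 0.0],      # one row of weights per candy class
           [-1.0, 2.0, 0.0],
           [0.0, 0.0, 2.0]]
biases = [0.0, 0.0, 0.0]

out = scores(features, weights, biases)
print(out)                        # the class with the highest score wins
```

With these invented weights, the orange-heavy feature vector scores highest for the first class, which is exactly the “only one category should have a high score” behavior described above.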

An example of a neural network that might be used to classify three different types of candy. Input data enters on the left; scores for the three classifications come out on the right. Each circle represents a node with a numeric weight. The network learns the weights by iteratively testing different values until it converges on a network that accurately classifies the test images.

The network runs these calculations for each of the images in the training data set, then compares its results with the correct answers. It then adjusts the weights and repeats the calculations, often hundreds of times, until it converges on a final set of nodal weights. At this point, the neural network has “learned” the classification. And, unlike a vision system engineer who can spend weeks or months on the problem, a neural network can be trained in milliseconds.
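The adjust-and-repeat loop can be sketched as follows, assuming a single weight and a tiny invented data set of (feature, correct answer) pairs. A real network adjusts thousands of weights using backpropagation, but the structure of the loop is the same: predict, compare with the answer, nudge the weights, repeat.

```python
data = [(0.9, 1.0), (0.8, 1.0), (0.1, 0.0), (0.2, 0.0)]  # (feature, correct answer)

w = 0.0    # the network's initial, default guess at the weight
lr = 0.1   # how far to nudge the weight on each error

for _ in range(200):                  # repeat the calculations many times
    for x, y in data:
        pred = w * x                  # multiply input by weight to get a score
        w += lr * (y - pred) * x      # nudge the weight toward the correct answer

print(round(w, 2))                    # converges near the best-fit weight (about 1.13)
```

After the loop, high-feature samples score near 1 and low-feature samples near 0, i.e. the network has “learned” the classification from the labeled examples.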

Deep learning is like machine learning, except that it works at a much larger scale and under more variable conditions. The number of potential categories is so great that it becomes impossible for an engineer to know what visual information to extract from the images. Instead, a complex system of software automatically selects thousands of visual measurements from the images without human intervention. The deep learning software then takes those extracted visual features and feeds them into a neural network with thousands of nodes. Typically, a developer wouldn’t create a deep learning model just to classify candy. The developer would create a model that can identify any type of object, then allow other developers to use that model for their specific applications.
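That reuse-a-general-model workflow might be sketched like this. The `pretrained_features` function below is a stand-in stub for a deep model’s learned feature extractor; in practice it would be a large trained network, and the per-class prototype vectors here are invented for illustration.

```python
def pretrained_features(image):
    """Stand-in stub for a deep model's learned feature extractor."""
    return [sum(image) / len(image), max(image) - min(image)]

def nearest_class(features, prototypes):
    """Classify by distance to a prototype feature vector per class."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda c: dist(features, prototypes[c]))

# Invented prototype feature vectors for two candy classes
prototypes = {"Reese's": [0.8, 0.3], "Kit-Kat": [0.2, 0.6]}

print(nearest_class(pretrained_features([0.7, 0.9, 0.8]), prototypes))  # Reese's
```

The key point is the division of labor: the general model supplies the features, and the application developer only has to define what the features look like for their own categories.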

If deep learning is more powerful than machine learning, then why would anyone use machine learning? Well, the truth is that, in many ways, machine learning is the more powerful option. If the number of potential classifications is small and the images are captured in a controlled setting, machine learning will outperform deep learning. In factory automation, this is almost always the case. The problem with deep learning is that it can use a lot of unnecessary parameters, which increases the chance that a node accidentally receives a higher weight than it deserves, decreasing accuracy. Deep learning is also very computationally expensive, requiring costly hardware and very large training data sets.

For visual inspection in factory automation, machine learning is usually a better approach than deep learning and it will likely be that way for decades until AI robots take over the Earth. In the meantime, machine vision engineers can increase their output and performance by using machine learning techniques in their projects.
