HBO’s Silicon Valley chronicles the (often futile) exploits a group of startup founders undertake as they try to adequately address the above issue. A popular example of such a project is Not Hotdog, an app that determines whether objects are hotdogs or not. Since its creation, the app has effected a profound change in perspectives regarding food, creating a rigid dichotomy between foods that are hotdogs and foods that aren’t.
You might wonder: what’s the driving force behind Not Hotdog’s success? Just as cannonballs must have physical weight to be effective, applications must have technological heft to be impactful. Not Hotdog’s heftiness lies in its use of machine learning, a mysterious subfield of computer science and statistics that has matured rapidly in recent years. Despite the recent abundance of machine learning applications, solutions that incorporate neural networks are often heavy-handed, lacking the aforementioned elegance of execution. In this post, I demonstrate how to integrate a neural network into a computer vision app. I hope that, through this example, you’ll develop a better understanding of when and how to apply machine learning.
Before we begin, here’s a link to my own Not Hotdog codebase.
Loading ML Models into OpenCV
Since OpenCV allows developers to load deep neural networks from popular frameworks (like Caffe, TensorFlow, and Torch) through its dnn module, we can load a pre-trained image classification model from a framework of choice. TensorFlow’s Inception model is a particularly good fit: it classifies images into roughly 1,000 classes, and it is quite fast.
To download the version of Inception that we’re using, click here. Unzip the “inception5h.zip” file and put its contents directly into your project in the directory of your choice.
Here’s the code for loading a neural net in OpenCV:
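import os
import cv2

def load_inception(inception_path):  # function name is illustrative
    # Paths to the label list and the frozen graph from inception5h.zip
    class_names_path = os.path.join(inception_path, 'imagenet_comp_graph_label_strings.txt')
    model_path = os.path.join(inception_path, 'tensorflow_inception_graph.pb')
    # The label file holds one class name per line
    with open(class_names_path, 'r') as class_names_file:
        class_names = class_names_file.read().strip().split('\n')
    inception_net = cv2.dnn.readNetFromTensorflow(model_path)
    return inception_net, class_names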
Preprocessing For Inception
Because TensorFlow’s Inception only accepts images formatted in a very specific manner, we need to preprocess our images using more traditional OpenCV functionality.
1. Resize the Image: Many modern neural networks consume images of exactly 224x224 pixels, a characteristic shared by Inception. We need to resize the image as close as possible to 224x224, then pad any remaining space with white. For example, an image may end up looking something like this:
2. Blob It: BLObs, or Binary Large Objects, are the input format that Inception expects. In OpenCV’s dnn module, a blob is a 4-dimensional array (batch, channels, height, width). We can convert an image to a BLOb using the following OpenCV command, where “resized” is the image after processing:
blob = cv2.dnn.blobFromImage(resized, 1, (224, 224), (0,0,0))
Usually, image classifiers operate only on images whose pixel values have been normalized in a particular manner. To satisfy this constraint, most people who use classifiers have to normalize the colors in whatever dataset they’re classifying. Since Inception normalizes its own input, we won’t be modifying any images this way.
Time to Classify
Since we’ve set up Inception and formatted its input correctly, it’s time to classify! Normally, we’d have to train the model ourselves. However, Inception is already fully trained, and one forward pass through the network produces a confidence score for every class. By filtering these scores, it’s possible to determine how confident the network is that the image is or is not a hot dog. The histogram below demonstrates this principle: the neural network produces a probability distribution in which the tallest bar corresponds to the class the image most likely belongs to.
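Here is a rough sketch of the classification step, under a few assumptions. The function names are hypothetical, the label string "hotdog" and the 0.5 threshold are placeholders (check the label file for the exact spelling), and the code assumes the network output is a single row of per-class scores. Only net.setInput and net.forward are OpenCV calls.

```python
import numpy as np

def top_prediction(scores, class_names):
    """Return (label, confidence) for the highest-scoring class."""
    scores = np.asarray(scores).flatten()
    # Ignore any scores beyond the end of the label list.
    usable = min(len(scores), len(class_names))
    best = int(np.argmax(scores[:usable]))
    return class_names[best], float(scores[best])

def is_hotdog(scores, class_names, hotdog_label="hotdog", threshold=0.5):
    """True if the top class is the hotdog label with enough confidence.

    `hotdog_label` and `threshold` are assumed placeholder values.
    """
    label, confidence = top_prediction(scores, class_names)
    return label == hotdog_label and confidence >= threshold

def classify(net, blob, class_names):
    """Run one forward pass on a cv2.dnn network and pick the top class."""
    net.setInput(blob)
    scores = net.forward()
    return top_prediction(scores, class_names)
```

Keeping the score handling separate from the forward pass makes it easy to experiment with the threshold without reloading the model.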
Bells and Whistles
Finally, we have to produce an output image. Using OpenCV’s drawing functions, it’s possible to produce an image that looks like this:
I hope that in the process of building this classifier, you’ve developed a better understanding of how to design machine learning applications. If you have questions or comments, feel free to leave a comment here. You can also find me on LinkedIn here.
Ben is a Computer Science Student at The University of Waterloo. He has been working as an Agile Software Engineer (Co-op) at TribalScale, where he has built projects written for Android and Node.js. Occasionally, he can be found on desktop screensavers.