It would have been easy for a show like HBO’s Silicon Valley to fake a machine learning app that classifies hotdogs — drop in some static screenshots with clever post-processing and call it a day. If the team wanted a real app for branding purposes, they could have strung together a few APIs in hackathon fashion and moved on to the next silly gag. But to their credit, Tim Anglade, the engineer behind the viral spoof app Not Hotdog, probably put more thought into his AI than at least one “AI” startup pitching on Sand Hill Road this week.
Rather than lean on something like Google’s Cloud Vision API, Anglade actually got into the weeds, experimenting with TensorFlow and Keras. Because Not Hotdog had to run locally on mobile devices, he faced a slew of challenges that any machine learning developer exploring mobile applications could relate to.
In a Medium post, Anglade describes how he initially retrained the Inception architecture via transfer learning on a few thousand images of hotdogs, using an eGPU attached to his laptop. Even so, his model was too bloated to run reliably on mobile devices.
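The core of that transfer-learning setup is simple: keep a pretrained feature extractor frozen and train only a small classifier head on top. The toy NumPy sketch below illustrates the idea; the frozen "backbone" here is just a random projection standing in for Inception's convolutional layers, and every name and shape is illustrative rather than Anglade's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained backbone (Inception minus its top layer):
# a fixed random projection followed by a ReLU. Never updated during training.
W_frozen = rng.normal(size=(64, 32))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)

# Toy binary dataset ("hotdog" = 1, "not hotdog" = 0), separable in feature space.
X = rng.normal(size=(200, 64))
F = extract_features(X)
w_secret = rng.normal(size=(32,))
y = (F @ w_secret > np.median(F @ w_secret)).astype(float)

# Trainable head: a single logistic-regression layer on top of the frozen features.
w, b, lr = np.zeros(32), 0.0, 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # sigmoid
    grad = p - y                            # dLoss/dlogit for cross-entropy
    w -= lr * (F.T @ grad) / len(y)         # only the head is updated
    b -= lr * grad.mean()

acc = (((F @ w + b) > 0) == y.astype(bool)).mean()
print(f"training accuracy of the head: {acc:.2f}")
```

Because only the small head is trained, a few thousand labeled hotdog images can go a long way, which is exactly why transfer learning was a sensible starting point.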
So he tried SqueezeNet, a leaner network that requires far less memory to run. Unfortunately, despite its compact size, its performance was hampered by both over- and underfitting.
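SqueezeNet's compactness comes largely from its "fire" modules, which squeeze the input through a small 1×1 layer before expanding it with parallel 1×1 and 3×3 filters. A quick parameter count makes the savings concrete; the numbers use the fire2 configuration reported in the SqueezeNet paper, with bias terms ignored.

```python
def conv3x3_params(in_ch, out_ch):
    # A standard 3x3 convolution: 3*3 weights per (input, output) channel pair.
    return 3 * 3 * in_ch * out_ch

def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    # Squeeze (1x1), then expand with parallel 1x1 and 3x3 branches.
    return (in_ch * squeeze                  # 1x1 squeeze layer
            + squeeze * expand1x1            # 1x1 expand branch
            + 3 * 3 * squeeze * expand3x3)   # 3x3 expand branch

# fire2 in SqueezeNet: 96 input channels -> squeeze to 16 -> expand to 64 + 64.
plain = conv3x3_params(96, 128)   # a plain 3x3 conv with the same 128-channel output
fire = fire_params(96, 16, 64, 64)
print(plain, fire, round(plain / fire, 1))  # 110592 11776 9.4
```

Roughly 9× fewer parameters per module, which explains the small memory footprint — but, as Anglade found, that capacity cut comes at a cost in what the network can learn.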
Even when given a large dataset of hotdog and not-hotdog training images, the model wasn’t quite able to grasp the abstract concept of what generally constitutes a hotdog, and instead leaned on bad heuristics as a crutch (red sauce = hotdog).
Fortunately, Google had just published its MobileNets paper, putting forth a novel way to run neural networks on mobile devices. The approach offered a middle ground between the bloated Inception and the frail SqueezeNet. More importantly, it allowed Anglade to easily tune the network to balance accuracy against available compute.
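MobileNets' central trick is the depthwise separable convolution — a per-channel 3×3 depthwise filter followed by a 1×1 pointwise mix — plus a width multiplier α that uniformly thins every layer, which is the tuning knob mentioned above. The parameter arithmetic below (bias terms ignored, α behavior as described in the MobileNets paper; the layer sizes are arbitrary examples) shows both levers:

```python
def standard_conv_params(k, in_ch, out_ch):
    # Dense k x k convolution: every output channel sees every input channel.
    return k * k * in_ch * out_ch

def separable_conv_params(k, in_ch, out_ch):
    # Depthwise: one k x k filter per input channel; pointwise: 1x1 channel mix.
    return k * k * in_ch + in_ch * out_ch

def with_width_multiplier(alpha, k, in_ch, out_ch):
    # The width multiplier alpha thins the channel counts of every layer.
    return separable_conv_params(k, int(alpha * in_ch), int(alpha * out_ch))

std = standard_conv_params(3, 256, 256)         # 589824
sep = separable_conv_params(3, 256, 256)        # 67840 (~8.7x fewer)
half = with_width_multiplier(0.5, 3, 256, 256)  # 17536 (alpha = 0.5)
print(std, sep, half)
```

Dropping α trades accuracy for a smaller, faster model, which is precisely the knob a developer targeting phones wants to turn.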
Anglade used an open source Keras implementation from GitHub as a jumping-off point. He then made a number of changes to streamline the model and optimize it for a single specialized use case.
The final model was trained on a dataset of 150,000 images: 147,000 not-hotdogs and just 3,000 hotdogs. The skew was intentional, reflecting the fact that most objects in the world are not hotdogs.
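A 49:1 split like this can teach a naively trained model to simply always answer "not hotdog." One standard remedy — a common technique, not necessarily the exact mechanism used in the app — is to weight each class's loss inversely to its frequency:

```python
# Image counts from the article: 147,000 "not hotdog" vs 3,000 "hotdog".
counts = {"not_hotdog": 147_000, "hotdog": 3_000}
total = sum(counts.values())

# "Balanced" class weights: total / (n_classes * count_c), so that each
# class contributes equally to the loss despite the 49:1 imbalance.
weights = {c: total / (len(counts) * n) for c, n in counts.items()}
print(weights)  # {'not_hotdog': 0.510..., 'hotdog': 25.0}
```

In Keras, a dictionary like this can be passed as the `class_weight` argument to `Model.fit`, so each rare hotdog example counts roughly 49 times as much as a not-hotdog.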
You can check out the rest of the story here, where Anglade discusses his approach in full detail. He goes on to explain a fun technique for using CodePush to live-inject updates to his neural net after submitting the app to the App Store. And while the app was created as a complete joke, Anglade saves time at the end for an insightful discussion of the importance of UX/UI and the biases he had to account for during the training process.