Currently on Kickstarter, these OpenCV kits bring computer vision to a Raspberry Pi-like machine
When the small, single-board Raspberry Pi first hit the scene in 2012, lots of people were excited. What started as a project meant to teach children and young adults about computers grew into something much bigger.
Programmers, hobbyists, and businesses started figuring out ways the onboard technology could be put to a nearly endless number of uses.
Currently on Kickstarter, the OpenCV AI kit has two different models, one with embedded AI and computer vision (OAK-1) and one with Spatial AI and computer vision (OAK-D). They both come with 4K 12MP cameras and everything you'll need directly on the small board.
From facial and vehicle recognition to object identification, users can draw on a variety of free neural nets to create applications, or, if they are more comfortable with the process, create their own to work with the kits.
I had the chance to speak with Brandon Gilles of Luxonis to learn more about the framework that powers the OpenCV kits, as well as the inspiration behind the company. Check out the full interview below.
Care to introduce yourself and your role with Luxonis?
My name is Brandon Gilles and I'm the Founder and CEO of Luxonis.
What inspired the creation of the company?
So actually, a mentor of mine leaving the company I was working for is what planted the seed for starting the company. We both loved our jobs and times were going great... our products were becoming increasingly popular and we were beating competitor companies that were 10x our size. So it was a total shock that he was leaving. When I asked him why, he explained to me that artificial intelligence, machine learning, deep learning, etc. were about to change every industry.
He literally said it was the biggest opportunity of his career. And this was a guy who had accomplished things I'd dream to accomplish... started several $100M companies from scratch, mentored the person who would go on to be the youngest billionaire in the world. So for him to see this as the biggest opportunity in HIS career blew my mind.
At the time, I knew nothing about AI besides a project a roommate in college did... remembering conversations with that roommate and him describing the state of AI then (in 2004) and how it was 'useless'. So up until my mentor leaving in 2016, that was my mental model.
After that conversation, it was like opening Pandora's box, I started digging in and discovered the crazy capabilities that were now possible with machine learning, which apparently had started circa 2012 but I had completely missed. So I spent the next year coming up to speed on all that I had missed, and the crazy new innovations and breakthroughs that were occurring monthly.
Fast-forward about a year into 2017, and I finally took the plunge to do what I had always wanted to do: start my own business. And having learned all the crazy things AI enabled, and having a specialty in embedded systems, I was determined to start a business around embedded AI and CV. I had ideas, but they were nebulous... I more or less leapt before I looked and figured I'd discover what I should do by doing.
So while doing this 'learning by doing' discovery process, something I never could have anticipated happened:
What felt like everyone I knew was getting hit by distracted drivers while riding their bikes to and from work. One was killed and three were critically injured (including broken backs, broken femurs, and shattered hips), one of whom also suffered a traumatic brain injury.
So this had a huge impact on me. And given that I had spent over a year learning the state of the art on all that AI could do - including reading medical charts better than doctors, picking out cancer from medical imagery better than the best doctors, and surpassing people in every metric for the perception of the world - I thought there had to be a way to leverage this power to keep people safe.
So this focused me on solving this problem. It served multiple purposes... it was a problem worth solving, I felt passionate about it, and it also was extremely challenging and so it seemed like solving it would produce technology that could solve all sorts of other problems.
So is this like a Raspberry Pi but for computer vision?
Yes. For computer vision and spatial AI. Where the spatial AI part is key. Humans have two eyes so we can perceive the world in real-time. Our intelligence tells us what objects are. Our two eyes give us location information.
This is the first platform that combines these two things into a small, embeddable system. So for industry-specific applications, embedded systems can now approach human-like perception. An example of which is picking bad onions off of a conveyor belt... this has been a classically impossible problem for computer vision, but humans can do it easily. Now, this sort of previously-impossible problem is solvable using this spatial AI platform.
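To make the "two eyes give us location" idea concrete, here is a minimal sketch of the textbook pinhole-stereo relation, where depth equals focal length times baseline divided by disparity. It is not taken from Luxonis's code; the function name and the focal length, baseline, and disparity values are all hypothetical, chosen only for illustration.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Textbook stereo relation: depth Z = f * B / d.

    focal_length_px: camera focal length in pixels (assumed value)
    baseline_m:      distance between the two cameras in metres (assumed value)
    disparity_px:    horizontal shift of the same point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 800 px focal length, cameras 7.5 cm apart.
FOCAL_LENGTH_PX = 800.0
BASELINE_M = 0.075

for disparity in (60.0, 30.0, 15.0):
    depth = depth_from_disparity(FOCAL_LENGTH_PX, BASELINE_M, disparity)
    print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```

The smaller the shift between the two views, the farther away the object is, which is why a second camera adds location information that a single camera lacks.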
Is OpenCV (OAK) meant for hobbyists (like Raspberry Pi) or is it intended for professionals in AI/machine learning/etc?
Our goal is to make it so easy to use that an artist who has no programming experience can make an interactive sculpture with it. For example, a sculpture in a public place that mimics people in a playful way. We want to make it democratized to that level of ease of use. And in fact, we do have a customer who did this, and we got a note back about how they had never programmed and they were able to make this happen. Which was extremely valuable.
But this is designed for productization. Our mental model on it is discoverability: Make the system just work, so that the crazy power and flexibility doesn't get in the way of it 'just working' right away, but is an inch under the surface once it's needed for customization for application-specific needs.
So the system just boots up and works with no user intervention, and with simple instructions the user can run any of the modern popular neural models and hardware-accelerated computer vision functionalities. But just below this surface is an extensive and extremely adaptable API in C++ and Python that works on nearly anything.
So what this does is allow a company, or even an individual who works for that company, to buy it on a Friday, prototype a solution over a weekend, and demo it to her or his boss on Monday. And then once this demo is working with very little work, and the concept is validated, the user can discover the modular nature, the flexibility, etc. that allows them to productize.
So this is 100% to enable people to build products - and everything was architected to enable that. But a key part of modern product design is fast iteration - so it's also so easy that a hobbyist can get it up and running in under a minute.
What are some real-world use cases of these systems?
So effectively every industry will be changed by these sorts of systems in the next decade. Prior to the Kickstarter, we had the following industries building solutions off of OAK-D:
- Health and Safety
- Food processing
- Mining equipment
- Human-Machine Interaction
- Remote Monitoring
- Automatic (Sports) Filming and Analytics
So each one of these categories has a bunch of sub-categories, each of which has a ton of detail. So let's just pick the last one, automatic sports filming and analytics.
So with OAK-1 for example, the system can combine AI and traditional computer vision to automatically and losslessly zoom at up to 12x to follow the action in, say, a football game. So for filming youth sports we've gotten probably 20-30 customers who've reached out about this application.
It's super valuable as this allows you to place one OAK-D and some storage media, and you get up to 12x zoomed-in HD footage of just the action as an output. Great for highlights, reviews, etc. and a million times better than HD content where the action is in some corner of the footage and you can hardly even make out what's happening or where the ball is (as is the case with just a normal, fixed-install HD camera).
And this is what I mean by being able to mimic the capability of a human in myopic ways. In this case, it's mimicking the capability of a human to know where the action is in a sport, and follow it, zoom in, track it across the field etc. - to produce good footage.
Now, will this be better than a pro cameraperson? Not even close. But it will be way better than the average Dad holding an iPhone trying to film a game. And that's really valuable for parents, coaches, students, and even athletic programs for funding their programs by selling highlights.
And across effectively every industry you find this sort of thing... being able to do with an embedded system what used to take a full person. And in MOST applications the problem wasn't solvable with a human, as you couldn't cram a human into a 2" x 2" box in some super-hazardous conditions... so it allows solving problems which just weren't solvable before.
So these sorts of systems are going to transform every industry... it's an industrial revolution again, enabling versatile automation as never before possible.
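The zoom-and-follow behavior described in the sports filming example can be pictured as cutting an HD-sized window out of a much larger frame, so no pixels are upscaled. The sketch below only illustrates that idea and is not Luxonis's implementation: the `crop_window` helper, the 3840 x 2160 frame size, the 1280 x 720 output size, and the action coordinates are all assumptions.

```python
def crop_window(frame_w, frame_h, out_w, out_h, cx, cy):
    """Return (x0, y0, x1, y1) for an out_w x out_h window centred as close
    as possible to the point (cx, cy) while staying inside the frame."""
    if out_w > frame_w or out_h > frame_h:
        raise ValueError("output window is larger than the frame")
    # Centre the window on the action, then clamp it to the frame edges.
    x0 = min(max(cx - out_w // 2, 0), frame_w - out_w)
    y0 = min(max(cy - out_h // 2, 0), frame_h - out_h)
    return x0, y0, x0 + out_w, y0 + out_h


# Assumed sizes: a 4K frame in, a 720p window out.
FRAME_W, FRAME_H = 3840, 2160
OUT_W, OUT_H = 1280, 720

# Hypothetical positions of "the action", e.g. from an object detector.
for cx, cy in [(1920, 1080), (100, 50), (3800, 2100)]:
    print((cx, cy), "->", crop_window(FRAME_W, FRAME_H, OUT_W, OUT_H, cx, cy))
```

Because the window is copied pixel for pixel from the full-resolution frame, the output stays sharp wherever the action moves, which is the difference from a fixed HD camera where the action occupies a small corner of the image.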
There are two different versions, correct? What are the differences between the two?
Yes, OAK-1 and OAK-D. So OAK-D has been by far the most popular, which is validating to us as we made all of this to enable Spatial AI, and it's what allows it.
Not all problems require spatial awareness, so we made OAK-1 to address those needs. But the spatial AI part is what was most direly missing from the market, and we've seen that reflected in the orders.
So in short:
- OAK-1: Embedded AI and CV
- OAK-D: Embedded Spatial AI and CV
So all of the machine learning and AI is onboard, correct? Was that a challenge?
Great question. So yes, that was nearly 100% of the work here. So existing solutions used the same chipset (Myriad X) but forced the computer connected to it to do a ton of the work. So this made it so you needed a super-fast computer for the overall system to perform well at all. With both OAK-1 and OAK-D all the processing is onboard, so it enables doing all this AI, depth sensing, and 4K video recording on something small and low power like a Raspberry Pi Zero.
It was a huge challenge and required a lot of time brainstorming and inventing techniques for re-using internal high-speed caches, inventing inter-frame caching techniques, and excruciating timing exercises. We're talking about 10 Gbps of data flowing into 128 KB of RAM and a slew of computer vision being performed on it, all while meeting real-time timing constraints. Figuring out how to fit all of this running in parallel was hugely difficult and relied heavily on a team with extensive experience building products off of the Movidius platform. But even given that, this is the most they'd ever stretched the architecture and their own know-how.
Many with experience with the platform thought what we were doing was intractable... and only believed it when they saw it running.
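A quick back-of-the-envelope calculation shows why those two figures make timing so tight. The sketch below only works through the arithmetic on the 10 Gbps and 128 KB numbers quoted above; it assumes 1 KB = 1024 bytes and 1 Gbps = 10^9 bits per second, and it says nothing about how the chip actually schedules its memory.

```python
def fill_time_seconds(buffer_bytes, rate_bits_per_s):
    """Time for a data stream at the given rate to fill a buffer once."""
    return buffer_bytes * 8 / rate_bits_per_s


BUFFER_BYTES = 128 * 1024   # 128 KB, assuming 1 KB = 1024 bytes
RATE_BITS_PER_S = 10e9      # 10 Gbps, assuming 1 Gbps = 1e9 bits/s

t = fill_time_seconds(BUFFER_BYTES, RATE_BITS_PER_S)
print(f"Buffer fills in about {t * 1e6:.1f} microseconds")
print(f"That is roughly {1 / t:,.0f} complete refills per second")
```

Under these assumptions the buffer turns over in roughly a tenth of a millisecond, so any processing on that data has to finish, or hand the data off, within that window.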
Anything you'd like to close with?
We've received a TON of benefits from the open-source community in innumerable small ways which added up to a huge impact throughout our trajectories, both personally and professionally. So we're excited to finally produce an open-source product that we feel materially gives back.
If you'd like to learn more about the OpenCV kits, make sure to check out the Kickstarter here.