How machine learning unlocks the value of video

Sandeep Casi · May 10, 2017

New technology allows for deeper insights into video, driving viewers from inspiration to action

Digital video consumption has surged, igniting new monetization opportunities for modern distribution outlets and content makers alike, a point I explored in a previous essay. The challenge I didn’t touch on, however, is how to make those monetization events as efficient as possible. Machine learning is at the root of the answer.

Machine learning is when computers learn from and analyze new data without being explicitly programmed for each task: software that can recognize patterns and draw conclusions. Object recognition applies the same principles, learning and recognition, to video and images.
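To make "learning patterns from data rather than being explicitly programmed" concrete, here is a minimal sketch of a nearest-centroid classifier in Python. The feature vectors and labels are invented purely for illustration; real object recognition in video would use pixel-derived features and vastly larger training sets.

```python
# Minimal "learning from examples" sketch: a nearest-centroid classifier.
# Features and labels are invented for illustration; real object recognition
# in video would use pixel-derived features and far larger training sets.

def train(examples):
    """Compute one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Invented 2-D features, e.g. (avg_brightness, edge_density):
training = [((0.9, 0.1), "sky"), ((0.8, 0.2), "sky"),
            ((0.2, 0.9), "crowd"), ((0.3, 0.8), "crowd")]
model = train(training)
print(predict(model, (0.85, 0.15)))  # prints "sky"
```

The program was never told what "sky" looks like; it inferred the pattern from the labeled examples, which is the essential idea the rest of this piece builds on.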

The machine learning zeitgeist might have us believe that machine learning on video is already far advanced. After all, self-driving cars rely on video to navigate. But what about in the context of advertisements and retail?

Here’s a look at how advanced today’s software is at understanding what’s happening in video, and, importantly, how that translates into driving more value out of video and moving a viewer from inspiration to action.

Relevance to the viewer

Perfecting object recognition in video is the Holy Grail for advertisers as they attempt to match their ads with relevant content so their brand’s value is appreciated. It’s why you’ll never see advertisements for diapers when you’re watching a football game and why you’ll likely see ads for Viagra when you’re watching evening dramas or sitcoms. The viewers who aren’t fast-forwarding through the ads or ignoring them may actually be paying attention.

Yet with the massive amount of video produced daily, including 300 hours uploaded to YouTube every minute, matching ads with content at that scale has become nearly impossible.

To wit, Google found itself in hot water for allowing ads from consumer brands, such as Johnson & Johnson and McDonald’s, to be shown on YouTube videos with terrorist content. This costly gaffe clearly underscores the importance of having better tools to intelligently recognize, at scale, what’s in a video.

If billions of videos require processing in seconds, advancements in machine learning and artificial intelligence are critical to avoid these types of embarrassing, high-priced incidents.

“Advancements in machine learning around video should start with the goal of understanding what is relevant to a viewer, which is a multi-faceted challenge,” said Kieran Farr, Senior Director of Business Development at Brightcove.

“Today, I’m seeing the current crop of solutions provide a machine-based transcription of the audio, or machine-based tagging or identification of objects in the video,” he said, suggesting this information in and of itself isn’t as valuable as when it’s combined with other inputs.

“The theory is that if I have a better understanding of this content and then marry that with data using social tools, I now have a better sense of who my viewer is,” he explained, adding, “It’s a bit of an art form and involves more than combining two things together and hoping they work.”

Beyond the object, knowing the narrative

Combining data from multiple sources makes for a holistic and relatively accurate profile of a viewer, but there’s still the challenge of fully understanding context, or knowing the narrative in the video.   

Images can be identified thanks to the metadata around them. But tags are just pieces of information, and they don’t always convey the full context of the story.

For example, machine learning might recognize a cruise ship in a video, and consider it appropriate for a vacation tour ad. The problem is the software can’t tell if that video is about a family having a nice vacation, or if it’s about everyone on the ship getting food poisoning. If it’s the latter, that’s not the kind of content an advertiser would want to be associated with. It’s the Google problem all over again.
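The gap between object recognition and narrative understanding can be illustrated with a toy brand-safety check: the object tags say "cruise ship" either way, and only the surrounding transcript reveals whether the video suits a vacation ad. The tags, keyword list, and transcripts below are all invented for illustration.

```python
# Toy brand-safety check: object tags alone are not enough; the
# transcript's narrative decides whether a vacation ad is appropriate.
# All tags, keywords, and transcripts here are invented examples.

UNSAFE_KEYWORDS = {"food poisoning", "outbreak", "sick", "evacuated"}

def suitable_for_vacation_ad(object_tags, transcript):
    if "cruise ship" not in object_tags:
        return False  # object recognition step: no relevant object found
    text = transcript.lower()
    # narrative step: any unsafe phrase disqualifies the ad placement
    return not any(keyword in text for keyword in UNSAFE_KEYWORDS)

happy = "The family spent a wonderful week aboard, swimming and dining."
grim = "Hundreds of passengers fell sick after a food poisoning outbreak."

print(suitable_for_vacation_ad({"cruise ship", "ocean"}, happy))  # True
print(suitable_for_vacation_ad({"cruise ship", "ocean"}, grim))   # False
```

A keyword list is of course far cruder than what real systems would need; the point is only that the second signal, the narrative, is what separates the two videos, and that is the signal today's object recognizers lack.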

To solve this issue, the software needs a lot of data before it can go deeper than identifying a particular object. Companies like Hearst are working on machine learning algorithms that tap into this wealth of data, according to Allen Duan, SVP of Corporate Technology at Hearst Corporation.

Machine learning will need to recognize context as well. That is coming, albeit slowly, thanks to better algorithms. Machine learning models are taught from a reference data set: as information is fed back to them, they classify more effectively, which brings better outcomes.
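That feedback loop, where labeled reference data corrects the model until its classifications improve, can be sketched with the classic perceptron update rule. The feature vectors and labels below are invented for illustration only.

```python
# Sketch of learning from a labeled reference data set: a perceptron
# that adjusts its weights whenever feedback says it classified wrongly.
# The feature vectors and labels are invented for illustration.

def train_perceptron(data, epochs=20, lr=0.1):
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:  # label is +1 or -1
            score = sum(w * x for w, x in zip(weights, features)) + bias
            predicted = 1 if score >= 0 else -1
            if predicted != label:  # feedback: wrong answer, so adjust
                weights = [w + lr * label * x
                           for w, x in zip(weights, features)]
                bias += lr * label
    return weights, bias

def classify(model, features):
    weights, bias = model
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1

# Invented reference set: +1 = "relevant to viewer", -1 = "not relevant"
reference = [((1.0, 0.2), 1), ((0.9, 0.1), 1),
             ((0.1, 0.9), -1), ((0.2, 1.0), -1)]
model = train_perceptron(reference)
print(classify(model, (0.95, 0.15)))  # prints 1
```

Each wrong prediction nudges the weights toward the correct answer, which is the "information fed back to them" described above, just at a toy scale.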

Once machine learning can recognize objects and put them into context, there will be another issue to contend with before advertisers get what they are really after: the actual sale of the product they’re marketing.

While the technology for identifying objects and products is becoming increasingly good, the problem of recommending products from a video remains unsolved.

No one has yet found a balanced, truly compelling solution, one that doesn’t disrupt the customer’s experience and flow.

How video is changing retail

Consummating an online sale has a lot to do with technologies outside of machine learning, but that doesn’t mean advances in understanding video aren’t changing our offline retail experience.

In-store cameras can also recognize objects and use that information to help convert sales. This, of course, comes with its own unique set of issues.

There’s a range of ways that video can enhance the offline purchasing experience. On the low end, a customer might stand in front of a screen with a sensor-equipped display unit. Once the sensor recognizes the person in front of it, it can play a video about an article of clothing it thinks they would want to buy.

“This is a very simplistic approach to using video for generic product ads - it is pragmatic and economical, but does it really translate into a sale, or yield a higher conversion? A more intelligent approach would apply ML and link to a customer’s app profile, to offer the customer help, trigger a ‘personalized’ product ad based on location or amount of time spent in a specific spot, or even link to a past purchase history and provide adjacent upsell opportunities of product groupings and recommendations, as well as alert the retailer, through ML predictive models, to future demand based on sales conversion,” said Ali Dalloul, General Manager of retail intelligence at Microsoft.

On the higher end, that same sensor can track where the person is looking, known as “gaze tracking,” something that Dalloul said is much more complex and effective.

“To me, video is a piece of the bigger puzzle. It’s part of a computer vision problem, and it’s also part of an image analysis problem, so when you talk about machine learning, one of the things you have to have in any kind of machine learning model, especially when you start applying Deep Learning neural nets to the image analysis problem, is a lot of data, something which retailers have plenty of, but is not being used effectively,” he said.

“Over time, the machine will learn in the same way a human learns, in an ‘unsupervised’ manner. Right now, most of the ML trained models are through ‘supervised’ data labeling approaches, where human intervention is still needed (to label the data). The promise of deep learning (a subset of ML) is that you can mimic the human biological brain, by building a deep neural network of multiple layers and connections where data flows. This requires ‘massive’ datasets, and huge computing power; and while we don’t really know why neural nets behave the way they do, they do produce a level of intelligence and insights that comes close to or sometimes surpasses human abilities.”
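The supervised/unsupervised distinction Dalloul draws can be illustrated with a tiny unsupervised example: a one-dimensional k-means that groups data points into clusters with no labels supplied at all. The data points are invented for illustration.

```python
# Toy unsupervised learning: 1-D k-means clustering. No labels are
# provided; the algorithm discovers the two groups on its own.
# The data points are invented for illustration.

def kmeans_1d(points, k=2, iterations=10):
    # start the two centers at the extremes of the data
    centers = [min(points), max(points)][:k]
    for _ in range(iterations):
        # assign every point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two obvious groups, around 1.0 and around 10.0:
data = [0.9, 1.0, 1.1, 9.8, 10.0, 10.2]
print(kmeans_1d(data))  # centers settle near [1.0, 10.0]
```

Contrast this with the perceptron-style supervised training described earlier: here nobody tells the algorithm which group a point belongs to, yet the structure emerges anyway, which is the promise, at vastly larger scale, that Dalloul describes.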

The ultimate goal is to start deriving meaningful insights from that raw data. Machine learning will recognize a set of patterns about a specific consumer, really learning their preferences, so when that person looks at the screen, it can make the right recommendation.

“We are heading in that direction. It’s a no-brainer that’s where we’re going,” said Dalloul.

Personalized shopping experiences

We’ve only just begun to see the real power of video, and how it can inspire us and call us to action.

Machine learning is driving those innovations, and those advancements will play a major role in helping to make video more effective -- by knowing the content and context as well as the person watching it.

Video will know who we are, what we like and how to best sell to us.

(Steve Loeb, Keith McCurdy and Bambi Roizen contributed to this piece)  


Sandeep Casi

Founder & CEO Videogram; Virtual Reality at General Motors, Systems Lead at Industrial Light & Magic(Lucasfilm), Research Scientist at Fuji Xerox Palo Alto lab, Digital Cinema at Fujifilm
