Thought Leadership


How machine learning unlocks the value of video

New technology allows for deeper insights into video, driving viewers from inspiration to action

Innovation series by Sandeep Casi
May 10, 2017

Digital video consumption has surged, igniting new monetization opportunities for distribution outlets and content makers, a point I explored in a previous essay. The challenge I didn’t touch on, however, is how to optimize these monetization events efficiently. Machine learning is at the root of the answer.

Machine learning is when computers learn from and analyze new data without being explicitly programmed for each task. It’s when software can recognize patterns and draw conclusions. Object recognition applies the same principles of learning and recognition to video and images.

The machine learning zeitgeist might have us think that machine understanding of video is far advanced. After all, self-driving cars use video to know how to drive. But what about in the context of advertisements and retail?

Here’s a look at how advanced today’s software is in understanding what’s happening in video and, importantly, how this translates to driving more value out of video and moving a viewer from inspiration to action.

Relevance to the viewer

Perfecting object recognition in video is the Holy Grail for advertisers as they attempt to match their ads with relevant content so their brand’s value is appreciated. It’s why you’ll never see advertisements for diapers when you’re watching a football game and why you’ll likely see ads for Viagra when you’re watching evening dramas or sitcoms. The viewers who aren’t fast-forwarding through the ads or ignoring them are possibly perking up to them.

Yet with the massive amounts of video produced daily, including 300 hours of video uploaded every minute on YouTube, matching ads with content at scale has become a nearly impossible hurdle.

To wit, Google found itself in hot water for allowing ads from consumer brands, such as Johnson & Johnson and McDonald’s, to be shown on YouTube videos with terrorist content. This costly gaffe clearly underscores the importance of having better tools to intelligently recognize what’s in a video at scale.

If billions of videos require processing in seconds, advancements in machine learning and artificial intelligence are critical to avoid these types of embarrassing, high-priced incidents.

“Advancements in machine learning around video should start with the goal of understanding what is relevant to a viewer, which is a multi-faceted challenge,” said Kieran Farr, Senior Director of Business Development at Brightcove.

“Today, I’m seeing the current crop of solutions provide a machine-based transcription of the audio, or machine-based tagging or identification of objects in the video,” he said, suggesting this information in and of itself isn’t as valuable as when it’s combined with other inputs.

“The theory is that if I have a better understanding of this content and then marry that with data using social tools, I now have a better sense of who my viewer is,” he explained, adding, “It’s a bit of an art form and involves more than combining two things together and hoping they work.”

Beyond the object, knowing the narrative

Combining data from multiple sources makes for a holistic and relatively accurate profile of a viewer, but there’s still the challenge of fully understanding context, or knowing the narrative in the video.   

Objects in images can be identified, thanks to the metadata around them. But those are just pieces of information, which don’t always give the full context of the story around them.

For example, machine learning might recognize a cruise ship in a video, and consider it appropriate for a vacation tour ad. The problem is the software can’t tell if that video is about a family having a nice vacation, or if it’s about everyone on the ship getting food poisoning. If it’s the latter, that’s not the kind of content an advertiser would want to be associated with. It’s the Google problem all over again.
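To make the cruise ship problem concrete, here is a minimal sketch of a check that looks at more than the recognized object. It assumes the video already comes with a set of object tags and a transcript; the word list, function names and sample sentences are all hypothetical and do not describe how any real ad platform works.

```python
# Hypothetical sketch: none of these names come from a real ad platform.
# Words that suggest the story around the object is a negative one.
NEGATIVE_CONTEXT = {"poisoning", "outbreak", "sick", "evacuated", "lawsuit"}


def words(text):
    """Lowercase the transcript and split it into bare words."""
    return {w.strip(".,!?\"'").lower() for w in text.split()}


def is_safe_for_ad(object_tags, transcript, required_tag):
    """Return True only if the object is present and the transcript
    contains none of the negative-context words."""
    if required_tag not in object_tags:
        return False
    return not (words(transcript) & NEGATIVE_CONTEXT)


vacation = "We had a wonderful week at sea with the kids."
outbreak = "Hundreds of passengers fell sick after a food poisoning outbreak."
tags = {"cruise ship", "ocean"}

print(is_safe_for_ad(tags, vacation, "cruise ship"))
print(is_safe_for_ad(tags, outbreak, "cruise ship"))
```

Both videos carry the same “cruise ship” tag, yet only the first passes. A fixed word list is a crude stand-in for real narrative understanding, which is exactly the gap described above.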

To solve this issue, the software needs a lot of data before it can eventually go deeper than just identifying a certain object. Companies like Hearst are working on machine learning algorithms that tap into this wealth of data, according to Allen Duan, SVP of Corporate Technology at Hearst Corporation.

Machine learning will need to recognize context as well. That is coming, albeit slowly, thanks to better algorithms. Machine learning models are taught by a reference data set. They get information fed back to them, and that allows them to classify more effectively, which brings better outcomes.
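The feedback loop described above can be sketched in a few lines. This is a toy illustration only, assuming a tiny hand-labeled reference set of video descriptions; the class name, labels and sentences are invented for the example, and the technique (a simple perceptron over words) is my stand-in, not what Hearst or anyone else quoted here uses.

```python
class FeedbackClassifier:
    """Toy perceptron over words: +1 = brand-safe, -1 = not brand-safe."""

    def __init__(self):
        self.weights = {}

    def score(self, text):
        # Sum the learned weight of every word; unseen words count as 0.
        return sum(self.weights.get(w, 0.0) for w in text.lower().split())

    def predict(self, text):
        return 1 if self.score(text) >= 0 else -1

    def feedback(self, text, label):
        """Adjust word weights only when the prediction was wrong."""
        if self.predict(text) != label:
            for w in text.lower().split():
                self.weights[w] = self.weights.get(w, 0.0) + label


# Hypothetical reference data set: (description, label).
reference_set = [
    ("family enjoys sunny cruise vacation", 1),
    ("passengers sick on cruise after food poisoning", -1),
    ("kids love the cruise pool", 1),
    ("cruise outbreak leaves passengers sick", -1),
]

model = FeedbackClassifier()
for _ in range(5):  # a few passes over the reference set
    for text, label in reference_set:
        model.feedback(text, label)

print(model.predict("passengers sick after poisoning"))
print(model.predict("kids love the pool"))
```

Each wrong guess nudges the weights of the words involved, so the word “cruise” ends up neutral while the words around it carry the signal. Real systems work with far more data and far richer features, but the correct-and-adjust cycle is the same idea.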

Once machine learning gets to the point where it can recognize objects, and put them into context, there will be another issue to contend with to really satisfy what advertisers are going for: the actual sale of the product they’re marketing.

While the technology for identifying objects and products keeps improving, the problem of recommending products from a video remains unsolved.

No one has found a good balance and a solution that’s really compelling, one that doesn’t disrupt the customer experience and flow.

How video is changing retail

Consummating an online sale has a lot to do with technologies outside of machine learning, but that doesn’t mean advances in understanding video aren’t changing our offline retail experience.

In-store cameras can also recognize objects and use that information to help convert sales. This, of course, comes with its own unique set of issues.

There’s a range of ways that video can enhance the offline purchasing experience. On the low end, a customer might stand in front of a display screen fitted with a scanner. Once the scanner recognizes the person in front of it, it can show a video about an article of clothing it thinks they would want to buy.

“This is a very simplistic approach to using video for generic product ads - it is pragmatic and economical, but does it really translate into a sale, or yield to a higher conversion? A more intelligent approach would apply ML and link to a customer’s app profile, to offer the customer help, trigger a ‘personalized’ product ad based on location or amount of time spent in a specific spot, or even link to a past purchase history and provide adjacent upsell opportunities of product groupings and recommendations, as well as alert the retailer, through ML predictive models, to future demand based on sales conversion,” said Ali Dalloul, General Manager of retail intelligence at Microsoft.
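As a rough illustration of the kind of logic Dalloul describes, the sketch below picks an ad from a shopper’s purchase history and the time spent in one spot. Everything in it is an assumption made for the example: the product pairings, the 30-second threshold and the names are hypothetical, and hand-written rules stand in for what a trained model would learn.

```python
from dataclasses import dataclass, field

# Hypothetical "adjacent product" pairings; a real system would learn these.
ADJACENT = {"running shoes": "running socks", "tent": "sleeping bag"}


@dataclass
class Shopper:
    name: str
    past_purchases: list = field(default_factory=list)


def choose_ad(shopper, zone, seconds_in_zone, dwell_threshold=30):
    """Pick an ad: upsell from purchase history first, then a
    dwell-time trigger for the current zone, else a generic ad."""
    for item in shopper.past_purchases:
        if item in ADJACENT:
            return f"upsell: {ADJACENT[item]}"
    if seconds_in_zone >= dwell_threshold:
        return f"featured: {zone}"
    return "generic"


print(choose_ad(Shopper("Ana", ["tent"]), "footwear", 5))
print(choose_ad(Shopper("Ben"), "footwear", 45))
print(choose_ad(Shopper("Cy"), "footwear", 5))
```

The ordering is a design choice: purchase history is treated as a stronger signal than lingering in one place, and the generic ad is only the fallback.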

On the higher end, that same sensor can see where the person is looking, known as “gaze tracking,” something that Dalloul said is much more complex and effective.

“To me, video is a piece of the bigger puzzle. It’s part of a computer vision problem, and it’s also part of an image analysis problem, so when you talk about machine learning, one of the things you have to have in any kind of machine learning model, especially when you start applying Deep Learning neural nets to the image analysis problem, is a lot of data, something which retailers have plenty of, but is not being used effectively,” he said.

“Over time, the machine will learn in the same way a human learns, in an ‘unsupervised’ manner. Right now, most of the ML trained models are through ‘supervised’ data labeling approaches, where human intervention is still needed (to label the data). The promise of deep learning (a subset of ML) is that you can mimic the human biological brain, by building a deep neural network of multiple layers and connections where data flows. This requires ‘massive’ datasets, and huge computing power; and while we don’t really know why neural nets behave the way they do, they do produce a level of intelligence and insights that comes close to or sometimes surpasses human abilities.”

The ultimate goal is to start drawing meaningful insights from that raw data. Machine learning will recognize a set of patterns about a specific consumer, really knowing what their preferences are, so when that person looks at the screen, it can make the right recommendation.

“We are heading in that direction. It’s a no-brainer that’s where we’re going,” said Dalloul.

Personalized shopping experiences

We’ve only just begun to see the real power of video, and how it can inspire us and call us to action.

Machine learning is driving those innovations, and those advancements will play a major role in helping to make video more effective by knowing the content and context as well as the person watching it.

Video will know who we are, what we like and how to best sell to us.

(Steve Loeb, Keith McCurdy and Bambi Roizen contributed to this piece)  


