Web-reading robot Diffbot raises $2M from tech titans

Faith Merino · May 31, 2012 · Short URL: https://vator.tv/n/271e

The visual intelligence technology reads Web pages the same way humans do

All I want is a robot that can read my emails to me while I drive.  Is that so much to ask?  Actually, I’m sure there’s already an app for that and I’m exposing myself for the lazy technology journalist I really am.

Visual artificial intelligence startup Diffbot announced Thursday that it has raised $2 million in a round of funding led by Earthlink founder Sky Dayton, Sun Microsystems co-founder Andy Bechtolsheim, MIT Media Lab director Joi Ito, YouSendIt CEO Brad Garlinghouse, and a number of executives from Facebook, Twitter, and Yahoo, as well as Matrix Partners.

The company, which launched out of beta last August, has developed a visual intelligence robot that can essentially read Web pages the way humans do.  The technology can look at Web pages written in any language and identify whatever an app developer wants it to.

For example, one app developer at Hackathon used the technology to create an app that could read online content aloud for blind users.  Existing apps for the blind simply read from top to bottom and don’t differentiate between the content and ads, copyright credits, and so on.  But Diffbot’s technology was used to create a content reading app that can identify the relevant content (e.g., the headline, author, and body of the text).

And all this time I’ve been reading my own online content like a chump.

That’s just one of the many uses to which Diffbot can be applied.  The technology is the brainchild of Stanford Ph.D. student Mike Tung, who developed the robot to monitor his classes’ Web pages.  When a Web page would be updated with a new assignment, for example, Mike’s cell phone would buzz to alert him.  When he opened up the technology to his friends, they used it to monitor other websites, like job sites.
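For the curious, the page-watching idea can be sketched in a few lines of Python. This is only a minimal illustration of change detection by content hashing, not Diffbot's actual method, which the company did not describe in that detail. The names `fingerprint` and `PageMonitor` are hypothetical, and the sketch assumes the page text has already been fetched by some other means.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a stable hash of the page text, ignoring whitespace-only changes."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class PageMonitor:
    """Hypothetical helper: remembers the last fingerprint seen for each URL."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def check(self, url: str, page_text: str) -> bool:
        """Record the page and return True only if it changed since the last check."""
        new = fingerprint(page_text)
        old = self._seen.get(url)
        self._seen[url] = new
        # The first sighting of a URL is not reported as a change.
        return old is not None and old != new


monitor = PageMonitor()
monitor.check("https://example.edu/cs101", "Assignment 1 due Friday")
if monitor.check("https://example.edu/cs101", "Assignment 1 due Friday. Assignment 2 posted."):
    print("Page changed: send the alert")
```

In a real monitor, the `print` call is where a text message or push notification would go, and the check would run on a timer.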

Explaining how the robot differentiates between relevant content and the junk content, Mike Tung told me that most Web pages follow similar styles in terms of layout.  “Diffbot looks at the height and width of the page.  Ads usually have a common format and are usually positioned in certain ways across the page,” he said.  Diffbot is designed to identify these visual features and discriminate between relevant content and all else.
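To make the layout idea concrete, here is a toy Python sketch of geometry-based filtering. It is an illustration under stated assumptions, not Diffbot's algorithm: the `Block` type, the scoring rule, and the pixel tolerance are all invented for this example, and the ad dimensions are a handful of widely used banner sizes chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# A few widely used banner sizes as (width, height) in pixels.
# Illustrative list only; not taken from Diffbot.
COMMON_AD_SIZES = {(728, 90), (300, 250), (160, 600), (320, 50)}


@dataclass
class Block:
    """Hypothetical rendered page region: position, size, and amount of text."""
    x: int
    y: int
    width: int
    height: int
    text_chars: int


def looks_like_ad(block: Block, tolerance: int = 5) -> bool:
    """Flag a block whose dimensions are within `tolerance` pixels of a common ad size."""
    return any(
        abs(block.width - w) <= tolerance and abs(block.height - h) <= tolerance
        for w, h in COMMON_AD_SIZES
    )


def pick_main_content(blocks: list[Block], page_width: int) -> Optional[Block]:
    """Pick the non-ad block with the most text, weighted toward the page center."""
    candidates = [b for b in blocks if not looks_like_ad(b)]
    if not candidates:
        return None

    def score(b: Block) -> float:
        center = b.x + b.width / 2
        centrality = 1 - abs(center - page_width / 2) / (page_width / 2)
        return b.text_chars * max(centrality, 0.0)

    return max(candidates, key=score)


page = [
    Block(0, 0, 728, 90, 40),          # banner-sized strip at the top
    Block(100, 120, 600, 1500, 4000),  # tall, text-heavy column
    Block(760, 120, 300, 250, 30),     # rectangle in the sidebar
    Block(760, 400, 200, 800, 300),    # sidebar links
]
main = pick_main_content(page, page_width=1100)
print(main)
```

A hand-written rule like this breaks easily on unusual layouts, which is presumably why a production system would learn such visual cues from many pages rather than hard-code them.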

When Diffbot launched in August, it could identify two types of Web pages: article pages and front pages.  But Tung says that the entire Web can be effectively broken down into 18 types of Web pages, and the new funding from this round will be used to scale the technology out to identify those pages—such as video pages, people pages, reviews, products, photo galleries, and more.

The company also announced that it's now processing 100 million API calls per month.

The San Francisco-based company currently has five full-time employees.

“We chose investors who were experienced with scaling out big Internet-sized companies,” said Tung.
