Google acquires reCAPTCHA for book scans

Google has acquired reCAPTCHA, the company most users know as the provider of those (slightly annoying) tests in which you have to type out squiggly, distorted words before you can sign in to a site. The idea is to prevent bots from buying up all the tickets for a show in the first 10 seconds of a sale, or from signing up for every available email address.

Google says reCAPTCHA currently guards over 100,000 Web sites from such spam attacks.

The service has much broader applications, though.

reCAPTCHA is aiding the massive task of digitizing books, newspapers and old-time radio shows. For physical books, it’s a two-step process: scan a page, then transform it into text using “Optical Character Recognition” (OCR).

Unfortunately, even the most sophisticated OCR program cannot reliably transcribe every scanned page. In some older books, for example, time has taken its toll on the paper and ink, or the font is just plain weird. Humans, however, can usually figure out what the words say.

According to reCAPTCHA’s Web site:

About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day.
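Taking the quoted figures at face value, the aggregate is easy to check. This back-of-the-envelope sketch just multiplies the two round numbers from the quote (200 million CAPTCHAs, about ten seconds each); they are the site's estimates, not measured data.

```python
# Back-of-the-envelope check of the figures quoted from reCAPTCHA's site.
CAPTCHAS_PER_DAY = 200_000_000  # "about 200 million" solved each day
SECONDS_PER_CAPTCHA = 10        # "roughly ten seconds" of human time each

def daily_hours(captchas: int, seconds_each: float) -> float:
    """Total human time spent solving CAPTCHAs per day, in hours."""
    return captchas * seconds_each / 3600

hours = daily_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # 555,556 hours per day
```

At roughly 555,000 hours a day, the estimate sits comfortably above the quoted floor of “more than 150,000 hours.”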

reCAPTCHA gives users two words. The first is a word reCAPTCHA knows. The second is the word from that ancient or damaged text that the computer is trying to transcribe. If a user gets the first word right, then reCAPTCHA assumes it’s dealing with a human, and accepts the user’s input for the second word. After many run-throughs with many different users, reCAPTCHA pools all the inputs for the second word and assumes the majority answer is probably what the word actually is.
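The two-word scheme described above can be sketched in a few lines. This is a hypothetical, simplified model, not reCAPTCHA’s actual implementation: the class and method names, the case-insensitive comparison, the one-vote-per-user rule and the `min_votes` threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

class TwoWordCaptcha:
    """Hypothetical model of the known-word / unknown-word scheme.

    Assumptions (not reCAPTCHA's real rules): answers are compared
    case-insensitively, each user who passes the known word casts one vote,
    and a word counts as transcribed once it has at least `min_votes`
    votes and one answer holds a strict majority.
    """

    def __init__(self, min_votes: int = 3):
        self.min_votes = min_votes
        self.votes: dict[str, Counter] = defaultdict(Counter)

    def submit(self, known_answer: str, known_truth: str,
               unknown_id: str, unknown_answer: str) -> bool:
        # Step 1: the known word decides whether this looks like a human.
        if known_answer.strip().lower() != known_truth.strip().lower():
            return False  # possible bot: reject and ignore the second word
        # Step 2: a passing user's guess for the unknown word is one vote.
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def transcription(self, unknown_id: str):
        """Return the majority answer once enough votes exist, else None."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        total = sum(counts.values())
        if total < self.min_votes:
            return None
        word, n = counts.most_common(1)[0]
        return word if n * 2 > total else None

# Usage: three users pass the known word; one fails and is ignored.
captcha = TwoWordCaptcha(min_votes=3)
captcha.submit("morning", "morning", "scan-42", "overlooked")
captcha.submit("morning", "morning", "scan-42", "overlooked")
captcha.submit("mornin", "morning", "scan-42", "overhooked")  # rejected
captcha.submit("morning", "morning", "scan-42", "overhooked")
print(captcha.transcription("scan-42"))  # overlooked
```

Requiring a strict majority, rather than simply taking the most common answer, keeps a tie between two readings from being recorded as a transcription; the word just waits for more votes.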

In this way, reCAPTCHA can continually utilize the crowd to correct and improve its OCR.

Google’s acquisition of the company makes a lot of sense, considering that they are currently invested in two large-scale digitization projects: Google Books and the Google News Archive.

Ronny Kerr