The Spam Diaries

News and musings about the fight against spam.
 by Edward Falk

Tuesday, October 02, 2007

CAPTCHAs used to digitize old books: killing two birds with one stone

Here's something wonderful from Carnegie Mellon: scans from old books that prove too difficult for OCR software to decode are used as CAPTCHAs. Users solve the CAPTCHA to prove they're human and help digitize old books at the same time.


Anonymous said...

Sounds good in theory, but don't CAPTCHAs need to be solved before they can be used as CAPTCHAs? I'm sure they thought of that already... I can see the point of using the page scans after a human has solved them, but using the CAPTCHA process itself to digitize old books doesn't seem possible... imo...

10:36 AM  
Spam Diaries said...

Yes, that was one of my first thoughts too: what good is a CAPTCHA if you don't know whether the user provided the right answer?

However, the details are in the article. The user is presented with two CAPTCHAs: one is a scan from a book, and the other is generated from a known word. The user has to solve both. If the user gets the known one right, it's assumed they answered the book scan honestly.

Finally, each book scan is presented to multiple people before an answer is accepted as correct.
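
For the curious, here is a rough Python sketch of the scheme as described above. It is only an illustration, not the actual Carnegie Mellon code: the class name, the method names, and the agreement threshold of 3 are all made up for the example (the article only says "multiple people").

```python
from collections import Counter

# Assumption: number of matching answers needed before a scanned word is
# accepted. The source only says "multiple people"; 3 is a placeholder.
AGREEMENT_THRESHOLD = 3


def normalize(answer):
    """Compare answers case-insensitively, ignoring surrounding spaces."""
    return answer.strip().lower()


class ScanTranscriber:
    """Hypothetical collector of human answers for unreadable book scans."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        self.votes = {}      # scan_id -> Counter of normalized answers
        self.accepted = {}   # scan_id -> accepted transcription

    def submit(self, known_word, known_answer, scan_id, scan_answer):
        """Return True if the user passes the CAPTCHA.

        The user passes only by getting the known (control) word right.
        The answer for the book scan is recorded as a vote only when the
        control word was answered correctly.
        """
        if normalize(known_answer) != normalize(known_word):
            return False  # failed the control word; ignore the scan answer

        if scan_id not in self.accepted:
            votes = self.votes.setdefault(scan_id, Counter())
            votes[normalize(scan_answer)] += 1
            text, count = votes.most_common(1)[0]
            if count >= self.threshold:
                # Enough independent users agree, so accept the reading.
                self.accepted[scan_id] = text
        return True
```

Each call to `submit` represents one user solving one pair of words. A scan's text only shows up in `accepted` once enough users who passed the control word have typed the same thing for it.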

3:04 PM  
