Google Docs' OCR Quality Tested

timothy posted more than 3 years ago | from the weighed-in-the-balance-and-found-wanting dept.

Android 99

orenh writes "Google has released a Google Docs application for Android, which includes the ability to create documents by OCR-ing photos. I tested the application's OCR quality and found that it's mediocre under the best conditions and poor under real-world conditions. However, I believe that this poor performance is caused in part by an intentional decision by Google."

99 comments

/b/ (1, Interesting)

stonewallred (1465497) | more than 3 years ago | (#35969648)

Since the standard practice on 4chan is to use the word niggers for any word in a recaptcha that has a punctuation mark, I question just how good the OCR is.

Re:/b/ (0, Troll)

stonewallred (1465497) | more than 3 years ago | (#35970180)

Lmao, flamebait? So speaking the truth is flamebait?

Nice to know that an easily verifiable fact is modded in such a fashion.

Re:/b/ (1)

Anonymous Coward | more than 3 years ago | (#35970668)

You said nigger, you referenced to 4chan and b/. Most mod don't take time to understand what it been told, they only lookup the keyword. Michael Kristopeit is right, slashdot is stagnated... LOL i am better post that as A. Coward.

Re:/b/ (1, Insightful)

Super Dave Osbourne (688888) | more than 3 years ago | (#35970762)

Slashdot has become formulaic and boring, and did so quite a long time ago. This is verifiable, and not meant as flamebait. If the mods would stop acting like scripts without some AI built in for content, /. would once again be a viable, worthwhile place to contribute on a regular basis, rather than a place for drive-by train-wreck contributions.

Re:/b/ (1)

stonewallred (1465497) | more than 3 years ago | (#35970818)

I usually keep 5 or 15 points for modding.

And I mod according to what they say the guidelines are.

But then again, I cruise here on raw and uncut simply because I want to see it all, and many good posts are hidden if you use filtering.

Guess I need to become a karma whore.

Re:/b/ (-1)

Anonymous Coward | more than 3 years ago | (#35971944)

Acting like 4chan is important is definitely flamebait. Modded accurately. Also modded troll accurately for arguing the point.

Re:/b/ (1)

Snaller (147050) | more than 3 years ago | (#35971092)

Trump? Is that you?

Re:/b/ (1)

zill (1690130) | more than 3 years ago | (#35971140)

You realize that recaptcha knows exactly which site the captchas come from, right? It would only take a single line of code to filter out all the noise from 4chan.

Re:/b/ (0)

Anonymous Coward | more than 3 years ago | (#35972346)

Your argument is invalid... /b/tards use the same trick on almost every site that throws them a captcha

Re:/b/ (0)

Anonymous Coward | more than 3 years ago | (#35973472)

read teh rulez o' da intarwez!

Google's OCR (1)

machinelou (1119861) | more than 3 years ago | (#35969710)

I've played around with Google's OCR framework (tesseract) and it is far from perfect. So, this isn't really a surprise.

Re:Google's OCR (1)

icebike (68054) | more than 3 years ago | (#35969800)

It's also far from new. Didn't they get that from some long dead Open Source project?

Re:Google's OCR (1)

RobertM1968 (951074) | more than 3 years ago | (#35971796)

I've played around with Google's OCR framework (tesseract) and it is far from perfect. So, this isn't really a surprise.

It's also far from new. Didn't they get that from some long dead Open Source project?

Answered in the order you mentioned each:

Yes, far from new (project started 26 years ago).

No, not long dead. Just "recently" (roughly 6 years ago, give or take) open sourced and ported/compiled for Linux, OS/2 (and other platforms I am sure).

Yes (open source project), and I think it was called... Tesseract. Kinda like the poster you responded to mentioned. ;-)

To save you the work, it was an HP/UNLV project, started in 1985, that was open-sourced in 2005. It is still available on SourceForge [sourceforge.net] .

Re:Google's OCR (1)

owlstead (636356) | more than 3 years ago | (#35969854)

Yeah, I was looking for an android OCR library, and that one was the only one that came up. Although there are a few other Linux options, none of those seemed to be right on the money either. This article is strengthening the already published reports on open source OCR software: basically, it's not performing all that well. I wish it was.

Re:Google's OCR (1)

ozmanjusri (601766) | more than 3 years ago | (#35972102)

This article is strengthening the already published reports on open source OCR software: basically, it's not performing all that well. I wish it was.

Now that it's getting some exposure, I'd say it'll be performing a lot better soon.

Nothing like being in the public eye for attracting clever people's attention.

Re:Google's OCR (2)

camperslo (704715) | more than 3 years ago | (#35969982)

I guess it'll be a little while before we'll see an app I'd wondered about. I thought it would be useful to be able to take snapshots of things like news reports (streamed on the web, El Gato Eye-TV domestic or satellite t.v., YouTube etc.) and do OCR on them, AND get an English translation of it. With the events so far this year, support for Japanese and Arabic languages would have been a good start.

Re:Google's OCR (1)

somersault (912633) | more than 3 years ago | (#35970078)

Definitely, weirdly I was wondering this afternoon if Goggles can already do OCR and translation on full pages of text.. I have a French book that I'd love to read, but I have basically no French!

Re:Google's OCR (1)

owlstead (636356) | more than 3 years ago | (#35970186)

With the top-notch translators that are around today, you may be able to get the gist of the book. But the chance that the translation of the book will be a joy to read is about zero, zip, nada, nothing. You'd better buy a good translation or, if that's not available, try and learn French (with the book itself as source material maybe).

Re:Google's OCR (1)

retchdog (1319261) | more than 3 years ago | (#35970442)

there was a paper about combining a (crappy) machine translation with low-skilled workers, who natively understand the target language, to patch up the glaring flaws. the idea is that _most_ of the errors made by the machine don't require understanding of the source language to detect. of course you lose out on anything 'deep' or artistic in the source language, and i would be hesitant to trust it for scientific papers or legal documents, but it's an interesting idea.

Re:Google's OCR (1)

ggeens (53767) | more than 3 years ago | (#35972514)

there was a paper about combining a (crappy) machine translation with low-skilled workers, who natively understand the target language, to patch up the glaring flaws.

I'm working on a project where the translations are handled like that. We send all texts to an external company, and a few hours later, they send back the translation. This seems to work relatively well.

The next phase involves immediate translation without human intervention. I'm curious as to how that will work out.

Re:Google's OCR (1)

ozmanjusri (601766) | more than 3 years ago | (#35972524)

there was a paper about combining a (crappy) machine translation with low-skilled workers

Even better, Distributed Proofreaders [wikipedia.org] is Project Gutenberg's version of just that. They've probably passed 20,000 books OCR'd and proofed by now.

Re:Google's OCR (1)

somersault (912633) | more than 3 years ago | (#35970840)

There are no translations available or I'd buy them. It's a book about Parkour by David Belle.. I'm just interested in basic history and his opinions rather than flowery language or whatever. If there is much discussion of technique it might be really hard to understand though - I auto-translated a French tutorial on rolling before, and it would just read as gibberish to someone who didn't already have a good idea of the technique.

Re:Google's OCR (1)

lxs (131946) | more than 3 years ago | (#35972582)

Or you could learn French if much of the literature in your field of interest is in that language. It isn't that hard if you're not interested in fluency. You also have gained a valuable skill.

Hey, it's more useful than either Elvish or Klingon.

Re:Google's OCR (0)

Anonymous Coward | more than 3 years ago | (#35974622)

Can you link to the book, maybe if it's on Amazon.fr or another site? I know high-school level French, so could likely get most of it. :)

Re:Google's OCR (1)

somersault (912633) | more than 3 years ago | (#35976686)

Here you go [amazon.co.uk]

It appears that there is a Facebook group where people are putting up translations of small parts of it now.

Better to scan to PDF (3, Interesting)

icebike (68054) | more than 3 years ago | (#35969858)

There are a number of scanner apps in the market that do a much better job in the first step of this process, which is taking the picture. They then concentrate their efforts on producing a clean usable PDF of the document. I tested one of these and found that the PDF rendered by it was much better than the PDF produced by Google. [android.com]
Everything is crisp and readable.

If the first fails, it's no wonder the second OCR step fails.

Re:Better to scan to PDF (1)

X0563511 (793323) | more than 3 years ago | (#35970002)

And how do you plan on searching, indexing, or otherwise having a computer operate on the contents of that document?

Re:Better to scan to PDF (2)

icebike (68054) | more than 3 years ago | (#35970086)

Just how many of such documents do you expect to have to index taken with a cellphone? Seriously, this is a toy. Don't go all corporate archives on me here.

Re:Better to scan to PDF (1)

sortius_nod (1080919) | more than 3 years ago | (#35970200)

Even then, I have yet to work for a company that has a searchable PDF archive. Even when I worked for Fairfax (media company here in AU that publishes national & local newspapers), the PDF archive that came straight out of the publishing app wasn't searchable. Hell, it only had 3 months of the paper on servers, the rest were on archive DVDs.

The whole idea of searchable PDFs died a long time ago; this is why businesses use purpose-built products.

Also, the OP stated that it was the original PDF that was generated better, the next step is to run OCR on the PDF. I have no idea what GP was on about, seems like they just wanted to post on this topic.

Re:Better to scan to PDF (0)

Anonymous Coward | more than 3 years ago | (#35970684)

The whole idea of searchable PDFs died a long time ago; this is why businesses use purpose-built products.

I think it's hilarious that a business in the business of words can't search their own content. Lemme guess, the PDFs that came out of your publishing app were essentially TIFF images like you'd get from a fax receiver?

Searchable PDFs work just fine, thank you very much. Our mining business can mine our PDFs even easier than the resources in the ground.

Re:Better to scan to PDF (1)

afidel (530433) | more than 3 years ago | (#35971412)

We OCR everything that's scanned into our document management system, search would be basically impossible without it since relying on users to accurately enter metadata is suicidal if you want useful data.

I work on this (1)

KingAlanI (1270538) | more than 3 years ago | (#35972404)

My job entails working with our office's document management system to manually enter metadata.
In part, I essentially end up parsing the data which users entered in various formats.
However, since the original form is entered electronically to begin with, I figure this could be a lot more automated. (The people in my office definitely have a clue; however, fat chance moving this up through the bureaucracy.)

Re:Better to scan to PDF (1)

pruss (246395) | more than 3 years ago | (#35971786)

Searchable pdfs are not dead. For instance, jstor.org's large repository of scholarly journals is searchable pdfs. jstor is very heavily used in my field. Not perfect, but pretty good.

Re:Better to scan to PDF (1)

X0563511 (793323) | more than 3 years ago | (#35970354)

Just how many of such documents do you expect to have to index taken with a cellphone? Seriously, this is a toy. Don't go all corporate archives on me here.

Well, that's the whole point to OCR. If you're just scanning, then you're just scanning. OCR'ing lets you do all kinds of text processing, analysis, format shifting etc. A scan is... just a picture of a document. Makes me think of microfiche.

Re:Better to scan to PDF (2)

icebike (68054) | more than 3 years ago | (#35970554)

True, but again, this is a cell phone app. You don't expect document management system level capabilities, especially not in release 1.0.
If you want that level of quality you bring something more than a cell phone to the task. Maybe a flatbed or something.

My point here is this: I've had much better luck going direct to PDF on the phone than via Google Docs.

Try this test if you have a Google Docs account, (even a free one):

Upload some PDF, even one created using something on your phone like CamScanner [appbrain.com].
Then, once you have a document in Google Docs, select it and from the menu choose Make a Google Docs Copy. It will OCR it for you.

Now if you uploaded a quality PDF (say something scanned to pdf directly from your scanner) the OCR will be close to flawless.
But even those shot with the camera and cleaned up by CamScanner will be better than the ones created directly in Google Docs on the android, probably for some of the reasons mentioned in TFA.

Re:Better to scan to PDF (0)

X0563511 (793323) | more than 3 years ago | (#35971290)

I don't touch Google anything, except for email. I'd much rather use -real- solutions, with my nice flatbed etc :)

It is odd that your phone does that better than Google...

Re:Better to scan to PDF (1)

AmiMoJo (196126) | more than 3 years ago | (#36021802)

This is a Google product. They like to release early and do public betas lasting years, so expect rapid improvements.

There seems little point in reviewing a new Google product until it has matured somewhat because the first version is always half done sort-of-works quality code. The first version of Android typed everything entered into the phone into a hidden root shell for crying out loud. About the only area they seem to hold off in is the front page of their search engine.

Re:Better to scan to PDF (0)

Anonymous Coward | more than 3 years ago | (#35970504)

You missed his point (although he muddied it a lot by talking about PDFs specifically.) A low-res picture is hard to OCR. The computer doesn't have as much detail to draw conclusions with. He will of course OCR the high-res scan in order to make it indexable. But it will have a much better chance of actually recognizing text rather than reading "01~`........!!4g"

No good free solutions (2)

CajunArson (465943) | more than 3 years ago | (#35969864)

The end of the article is pretty telling. Basically any professional OCR software from the mid 1990's and normal consumer grade commercial software from today is lightyears ahead of open source solutions. Which is kind of sad, but the problem is that there really isn't a huge market for OCR in the way that there is for web browsers and other more successful projects, coupled with the inherent difficulty in doing good OCR.

CAPTCHA Breakers (3, Interesting)

MoonBuggy (611105) | more than 3 years ago | (#35970038)

If the increasing absurdity of the CAPTCHAs I tend to see is anything to go by, there are programs out there that'll read normal printed text from even the crappiest photo without missing a beat. The question is, are the spammers using standard commercial solutions, or have they got some useful tech of their own that we might be able to get our hands on (seize it as part of a settlement and make it public domain, for instance).

Re:CAPTCHA Breakers (3, Insightful)

jewelises (739285) | more than 3 years ago | (#35970116)

I don't think that spammers have any amazing tech, they just have different requirements. They can still send spam with a 1% success rate whereas with OCR you'd want a 99% success rate.

99% success rate is crappy ... (3, Insightful)

perpenso (1613749) | more than 3 years ago | (#35970312)

I don't think that spammers have any amazing tech, they just have different requirements. They can still send spam with a 1% success rate whereas with OCR you'd want a 99% success rate.

I once worked on an OCR project. The client specified a 99% success rate and we strained to restrain our grins. 99% is about one error every one or two lines of text. We got 99.6% in our first implementation before we even began to work on accuracy. Admittedly we had excellent image quality. This was a custom solution that had its own optics.
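A rough check of the "one error every one or two lines" figure, as a minimal sketch. The line widths of 60 and 80 characters are assumptions for illustration, not numbers from the project described above, and the function names are made up for this example.

```python
def errors_per_line(char_accuracy, chars_per_line):
    """Expected number of misread characters on a single line of text."""
    return (1.0 - char_accuracy) * chars_per_line


def lines_per_error(char_accuracy, chars_per_line):
    """Average number of lines you read before hitting one misread character."""
    return 1.0 / errors_per_line(char_accuracy, chars_per_line)


if __name__ == "__main__":
    # Compare the 99% requirement with the 99.6% first implementation.
    for accuracy in (0.99, 0.996):
        for width in (60, 80):  # assumed characters per line
            print(accuracy, width, round(lines_per_error(accuracy, width), 2))
```

Under those assumed widths, 99% per-character accuracy works out to roughly one error every 1.25 to 1.7 lines, which is in line with the estimate in the comment.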

Re:99% success rate is crappy ... (3, Interesting)

martin-boundary (547041) | more than 3 years ago | (#35970850)

Heh, it's always fun to reinterpret requirements to make them easier to implement :)

A 99% success rate could also mean 99 pages with zero errors out of 100 pages attempted. With 250 words per page that would represent a mandated success rate of 99.995%
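The per-page reading can be turned into a per-word figure with a short calculation. This is a sketch that assumes word errors are independent, so a page is error-free with probability p ** words_per_page; the function name is hypothetical.

```python
def per_word_accuracy(page_success_rate, words_per_page):
    """Per-word accuracy p needed so that p ** words_per_page equals the page rate.

    Assumes every word is recognised independently of the others.
    """
    return page_success_rate ** (1.0 / words_per_page)


if __name__ == "__main__":
    p = per_word_accuracy(0.99, 250)
    print(f"required per-word accuracy: {p:.6%}")
```

With 250 words per page the result lands just above 99.995%, so the figure quoted in the comment is the right order of magnitude.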

Re:99% success rate is crappy ... (1)

thegarbz (1787294) | more than 3 years ago | (#35971124)

QUICK A LAWYER, LET'S GET HIM!

As an aside. Stupid slashdot filter is telling me using caps is like yelling. Well I AM yelling.

Re:99% success rate is crappy ... (1)

perpenso (1613749) | more than 3 years ago | (#35971394)

Heh, it's always fun to reinterpret requirements to make them easier to implement :)

A 99% success rate could also mean 99 pages with zero errors out of 100 pages attempted. With 250 words per page that would represent a mandated success rate of 99.995%

Thankfully the client specified 99% with respect to character recognition not correct pages. If they were specifying pages we would have been straining to suppress pissing our pants rather than suppressing grins. :-)

Re:99% success rate is crappy ... (1)

tompaulco (629533) | more than 3 years ago | (#35974790)

Heh, it's always fun to reinterpret requirements to make them easier to implement

That's what our customers do to us. We promised 95% accuracy rate on OCR per CHARACTER, but they generate their numbers off of how many fields of data had a wrong character in them.
Of course, we also specified that based on clean images scanned at 300 DPI, and they give us crap images scanned at 200 DPI with fold lines, highlighter and pen scribble, and apparently their mailing machine sprays some kind of serial number on every single page that runs right over what we need to read.

Re:99% success rate is crappy ... (1)

SpinningCone (1278698) | more than 3 years ago | (#35974872)

obligatory XKCD [xkcd.com] (alt text is relevant)

Re:99% success rate is crappy ... (1)

AmiMoJo (196126) | more than 3 years ago | (#36021858)

Google's approach to accuracy appears to be somewhat novel. Most OCR software uses spelling correction and grammar rules to improve accuracy but Google use data derived from the contents of pages they index. They use it for translation too which, when it works, gives their output a more natural quality compared to previous efforts. I find that Chinese to English works particularly well.

Doubling OCR accuracy is exponentially harder. Unlike a human that can easily pick up on what type of document it is (letter, technical manual, novel, newspaper article) and make informed mental corrections based on its expectation of the language used machines have to come at documents more or less blind. Even document structure can be hard to figure out, e.g. the way a story flows over multiple columns in a newspaper.

Re:99% success rate is crappy ... (1)

perpenso (1613749) | more than 3 years ago | (#36022056)

... Google use data derived from the contents of pages they index ...

Interesting. I guess that adapts for common usage deviating from proper spelling and grammar.

... pick up on what type of document it is (letter, technical manual, novel, newspaper article) and make informed mental corrections ...

Machines will do this to a degree, for example favoring lowercase L when the surrounding characters are alphabetic and favoring one when the surrounding characters are numeric. But yeah, context rules, the preceding works well enough in prose but often fails in source code.
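The neighbour rule described above can be sketched in a few lines. This is only an illustration of the idea, not how any particular OCR engine does it; the function name and the tie handling are assumptions.

```python
AMBIGUOUS = {"l", "1"}  # glyphs that look alike in many fonts


def disambiguate(text):
    """Re-decide each 'l' or '1' from its unambiguous immediate neighbours."""
    out = list(text)
    for i, ch in enumerate(text):
        if ch not in AMBIGUOUS:
            continue
        # Look only at the characters directly left and right that are not
        # themselves ambiguous.
        neighbours = [
            text[j]
            for j in (i - 1, i + 1)
            if 0 <= j < len(text) and text[j] not in AMBIGUOUS
        ]
        digits = sum(c.isdigit() for c in neighbours)
        letters = sum(c.isalpha() for c in neighbours)
        if digits > letters:
            out[i] = "1"
        elif letters > digits:
            out[i] = "l"
        # On a tie the original guess is kept.
    return "".join(out)


if __name__ == "__main__":
    print(disambiguate("he11o"))    # prose-like input
    print(disambiguate("20l1"))     # numeric input
    print(disambiguate("x1 = l0"))  # source-code-like input
```

The last call shows the failure mode mentioned above: in something like source code, an identifier such as x1 sits next to a letter, so the rule wrongly turns the digit into a lowercase L.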

Re:CAPTCHA Breakers (1)

Hal_Porter (817932) | more than 3 years ago | (#35971384)

Don't tell him this. It's funnier to let him keep PH3AR1NG TEH 3L33T HAXORZ.

Re:No good free solutions (0)

Anonymous Coward | more than 3 years ago | (#35970114)

We use multiple free software OCR solutions as part of our spam filtering... works good for us.

No good solutions anywhere (1)

dbIII (701233) | more than 3 years ago | (#35972586)

People expect OCR to be magic so are always disappointed when they first run the stuff. They do not understand that one or two uncertainties per page is a pretty spectacularly good result until you've been able to train the thing with identically laid out documents on the same paper etc for a long time. Feeding in stuff printed on a dot matrix makes the secretaries cry ten minutes after they greeted the arrival of the OCR software with joy. Of course it works a bit better on later pages after tweaking or training - but it all looks like crap to start with.
With specific jobs the stuff can work out of the box with hardly any errors but you have to be lucky.

Re:No good solutions anywhere (1)

tompaulco (629533) | more than 3 years ago | (#35974888)

Well, that is where the commercial software has open source beat. They have already trained their OCR on millions of characters. But then, there is no retraining most of them, other than upgrading to the next version when it comes out. Tesseract you can train, but it starts out pretty crappy. Whether Tesseract is of any use to you depends on what your needs are. If you are going to be OCRing something that has a fairly narrow range of image quality and font, then you can train Tesseract to pick it up very specifically and it will probably outperform commercial vendors. If, on the other hand, you need to pick up OCR off of any old crap that someone ran through a scanner, then you will probably immediately see decent results out of the commercial package, and no amount of training in Tesseract will ever improve it much.

Nexus S has no flash? (1)

versificator (2031720) | more than 3 years ago | (#35969868)

According to the article, it doesn't have a flash, which is completely incorrect. I thought maybe the Docs application doesn't use the flash when taking pictures, but again...this is incorrect.

Re:Nexus S has no flash? (1)

icebike (68054) | more than 3 years ago | (#35969950)

Google DOCs will use the flash or not, based on user settings, so, yeah, he just missed that.

But in my tests with Nexus One (not Nexus S), using the flash at the range needed to see the picture just puts a white blob in the center of the shot and is actually worse than using bright room lights.

Re:Nexus S has no flash? (2)

Idbar (1034346) | more than 3 years ago | (#35969968)

What article? The link seems to be pointing to a 403 Error page. At least to me.

Re:Nexus S has no flash? (1)

ThatsMyNick (2004126) | more than 3 years ago | (#35970356)

Google Cache of TFA [googleusercontent.com]

Re:Nexus S has no flash? (1)

N Monkey (313423) | more than 3 years ago | (#35972244)

What article? The link seems to be pointing to a 403 Error page. At least to me.

Maybe it was just the OCR'ed output of a scan of "Loser roar"

( Ok, I couldn't come up with anything better)

OCR-B character recognition (question) (1)

owlstead (636356) | more than 3 years ago | (#35969916)

I'm in the market for a good way of recognizing OCR-B based characters on an android device (mostly uppercase characters and digits). I know the location (on a flat 2D plane in a 3D space) of the characters, but they do not form sentences or even words. Does anyone have a good algorithm to do this kind of low-level character recognition? A library would be even better of course, especially if it is open source. I'm personally thinking of comparing bitmaps or vectors.

As a hint to other devs, many commercial barcode packages contain OCR character recognition, which could be used for purposes where you can specify the conditions (fonts, lighting conditions etc).
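The "comparing bitmaps" idea from the question can be sketched as nearest-template matching with a Hamming distance. The 3x5 glyphs below are made-up placeholders, not real OCR-B shapes, and a real implementation would first have to crop, deskew, scale, and binarize each character cell.

```python
# Hypothetical 3x5 reference bitmaps; '#' is ink, '.' is background.
TEMPLATES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def classify(glyph):
    """Return the label of the template with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda label: hamming(glyph, TEMPLATES[label]))


if __name__ == "__main__":
    # A "7" with one stray ink pixel in the fourth row.
    noisy_seven = ("###", "..#", ".#.", "##.", ".#.")
    print(classify(noisy_seven))
```

Since the font and the character positions are known in advance, a fixed template set like this is at least plausible; the distance to the best template can also serve as a crude confidence value for rejecting bad reads.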

Re:OCR-B character recognition (question) (1)

coredog64 (1001648) | more than 3 years ago | (#35970300)

What about OpenCV?

http://blog.damiles.com/?p=292 [damiles.com]

Re:OCR-B character recognition (question) (1)

owlstead (636356) | more than 3 years ago | (#35970474)

Looks promising, many thanks! License plates are not that far off from the intended purpose.

Um... (4, Insightful)

Shadow Wrought (586631) | more than 3 years ago | (#35969952)

He uploaded the 120 dpi image instead of the 300 dpi image and is surprised the OCR sucks. Really? Lossy isn't the concern when you're OCR'ing black text on a white background. Seriously. Think about what the image is actually going to be used for, then make your decision.

And, seriously, how effective of OCR'ing are you really imagining you're going to get off of a camera phone pic, anyway?

Re:Um... (2)

ortholattice (175065) | more than 3 years ago | (#35970250)

It seems TFA is giving 403 errors, but Google's 300 DPI PDFs that you can download for public domain books often have incredibly poor quality, much poorer than you get with 300 DPI on a cheap home scanner. While they might be marginally acceptable for novels, for the old math books I'm interested in, the Google PDFs are mostly useless. Often you can't disambiguate small blurry subscripts by eye, never mind OCR. On the other hand, I have never had a problem reading 300DPI subscripts on scans I make at home, and they usually will OCR fine, too, unless they are tiny subscripts of subscripts. I wrote about this here. [slashdot.org]

Re:Um... (1)

sootman (158191) | more than 3 years ago | (#35975216)

> And, seriously, how effective of OCR'ing are you really imagining
> you're going to get off of a camera phone pic, anyway?

Camera phones are getting quite good. An iPhone 4 takes 5MP images and there are many others out now that are as good or better.

Specifically, the images are 2592x1936 pixels which equates to 225 dpi at 8.5" x 11". That's plenty to OCR a typical page--say, 8.5x11 with clean 12-point type. I've carefully taken photos of documents with my phone and printed them and they're indistinguishable from a photocopy.
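The resolution figure can be checked with a quick calculation. This sketch assumes the page exactly fills the frame with no margin, which a hand-held shot will not manage, so real numbers would be somewhat lower; the function names are made up for the example.

```python
def effective_dpi(pixels_long, pixels_short, page_long_in, page_short_in):
    """Dots per inch when a page exactly fills the frame; the tighter axis wins."""
    return min(pixels_long / page_long_in, pixels_short / page_short_in)


def glyph_height_px(dpi, point_size):
    """Nominal height in pixels of type at the given point size (72 points per inch)."""
    return dpi * point_size / 72.0


if __name__ == "__main__":
    dpi = effective_dpi(2592, 1936, 11.0, 8.5)
    print(round(dpi, 1), round(glyph_height_px(dpi, 12), 1))
```

For a 2592x1936 image of a letter-size page this gives a little under 228 dpi, close to the 225 dpi quoted, and 12-point type ends up nominally about 38 pixels tall.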

What a dumb fuck (1)

binford2k (142561) | more than 3 years ago | (#35969998)

I suppose this retard thinks he's clever.

Bad Kitty!

Verily, you may not link directly to images. Link to their containing web page instead.

You tried to access: /blog/

From: http://hurvitz.org/ [hurvitz.org]

I have spoken!

They Why (3, Informative)

RileyCR (672169) | more than 3 years ago | (#35970054)

Google took the Tesseract OCR engine, one of the first engines, and wrapped document analysis and some high level improvements on it. In the current OCR market landscape there are only 4 commercial engines, and two that make up 98% of the market. Compared to those two OCROpus is not even close because of the legacy engine. So the real reason is it's old technology, very old. Unless Google licenses ABBYY or Nuance they will not get any better. The reality is OCR takes 50 man-years to develop to compete with these top two engines, and it's just not practical for even Google to go out and start from scratch.

Re:They Why (1)

camperslo (704715) | more than 3 years ago | (#35970124)

Does that mean it couldn't be a viable candidate for some Summer of Code work then?

More like Masters/PhD Thesis than Summer of Code (3, Interesting)

perpenso (1613749) | more than 3 years ago | (#35970370)

Does that mean it couldn't be a viable candidate for some Summer of Code work then?

More like a bunch of masters/phd thesis to get started.

OCR is an area of AI research under the topic of Computer Vision. It is yet another area that seems simple in concept but turns out to be incredibly difficult in practice.

Re:More like Masters/PhD Thesis than Summer of Cod (2)

Lehk228 (705449) | more than 3 years ago | (#35970452)

seems to me that OCR would be an area that would be easy to build a framework for genetic algorithms, using a huge collection of solved OCR pages to evaluate. with each generation being tested on a random subset of pages so they do not learn to cheat instead of learn to solve.

only problem is sometimes GA make a solution that makes no sense and should not work but somehow does http://www.damninteresting.com/on-the-origin-of-circuits [damninteresting.com]

Re:More like Masters/PhD Thesis than Summer of Cod (1)

perpenso (1613749) | more than 3 years ago | (#35970790)

seems to me that OCR would be an area that would be easy to build a framework for genetic algorithms, using a huge collection of solved OCR pages to evaluate. with each generation being tested on a random subset of pages so they do not learn to cheat instead of learn to solve.

Sounds like a great thesis project. :-)

Re:More like Masters/PhD Thesis than Summer of Cod (1)

koxkoxkox (879667) | more than 3 years ago | (#35971536)

Genetic algorithms are an optimisation algorithm, but what do you want to optimise exactly ? What are your individuals here ?

The idea of using a large collection of solved problem to check and improve the accuracy of the method looks more like neural network to me. Indeed, this seems to be a common method for OCR. For example : http://www.codeproject.com/KB/dotnet/simple_ocr.aspx [codeproject.com]

Re:More like Masters/PhD Thesis than Summer of Cod (1)

Tacvek (948259) | more than 3 years ago | (#35971902)

While neural networks are a good solution, genetic algorithms can still be used in conjunction with them.

One possible training method for neural networks happens to be genetic algorithms. The genes being the link strengths, and the fitness function being say the percentage of correct results. (If you reach a sufficiently high level, you might want to change to minimizing uncertainty, with a fitness dropping exponentially if the correct percentage drops too low.)

In the alternative genetic algorithms can be used with other neural network training techniques, with the genetic algorithm selecting the number and arrangement of nodes, with fitness being related to the quality of the network after training.

A hybrid of both the above can also be used. I believe that is the approach critterding (an artificial life simulator) uses for the neural networks representing the creature's brains which are evolved much like the rest of the creatures body.
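The first approach above (genes as link strengths, fitness as the fraction of correct results) can be sketched on a toy problem. Everything here is an assumption for illustration: a single threshold unit stands in for the network, a four-row truth table stands in for labelled glyphs, and the population size, mutation scale, and seed are arbitrary.

```python
import random

# Toy stand-in for labelled glyph features: two binary inputs and a label.
SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def predict(genes, inputs):
    """A single threshold unit; the genes are its two weights and its bias."""
    w1, w2, bias = genes
    return 1 if w1 * inputs[0] + w2 * inputs[1] + bias > 0 else 0


def fitness(genes):
    """Fraction of samples classified correctly."""
    return sum(predict(genes, x) == y for x, y in SAMPLES) / len(SAMPLES)


def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # the top half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by a small Gaussian mutation.
            children.append([rng.choice(pair) + rng.gauss(0, 0.1) for pair in zip(a, b)])
        population = parents + children
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print(fitness(best), history[0], history[-1])
```

Because the top half of each generation is carried over unchanged, the best fitness never drops from one generation to the next. Scoring each generation on a random subset of a large labelled set, as suggested earlier in the thread, would replace the fixed SAMPLES list.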

Re:More like Masters/PhD Thesis than Summer of Cod (1)

allo (1728082) | more than 3 years ago | (#35973562)

hm, no genetic programming, train a neural net with input / correct output for single letters and then for whole words and see what you can get there.

Re:They Why (0)

Anonymous Coward | more than 3 years ago | (#35970204)

You're saying Google couldn't have 50 persons working for a year on a critical component of their Google Books efforts that also has relevance to web search and Google Docs?
I can see why it wouldn't be cost effective compared to licensing an existing engine, but it's hardly infeasible.

Re:They Why (1)

spinkham (56603) | more than 3 years ago | (#35975232)

Sure, and with 9 women you can make a baby in a month.

10 experts and 5 years would be more feasible. 5 experts and 10 years even more so.
Scaling is hard.

See also http://en.wikipedia.org/wiki/The_Mythical_Man-Month [wikipedia.org]
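
One rough way to put numbers on "scaling is hard" is the pairwise-communication argument from that book: if everyone has to coordinate with everyone else, the number of channels grows as n(n-1)/2. The snippet below is only a sketch of that one formula; the team sizes are the ones mentioned in this thread, and it does not model actual cost or schedule.

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct pairs in a team of the given size: n(n-1)/2."""
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(f"{n} people -> {communication_channels(n)} pairwise channels")
```

Five people have 10 channels, ten have 45, and fifty have 1225, which is why the 50-person team is not simply ten times faster than the 5-person one.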

Re:They Why (1)

Super Dave Osbourne (688888) | more than 3 years ago | (#35970784)

Until the day you can hold up a document in front of your iBhone camera, have it snap and convert that document with 99%+ accuracy, and have spelling and grammar checking fix 99% of the remaining 1% (meaning 99.99% of the conversion is done properly, in any language), end users won't tolerate the technology. As you pointed out so well, that will take more than Tesseract. Google should stop whoring themselves as OpenSource focused and just do the right thing by purchasing the tech that exists outright and pushing it to the open market. Then others will come in and do better, and the model of improving software continues.

Re:They Why (1)

Exceptica (2022320) | more than 3 years ago | (#35972504)

You haven't done the math. 1% is not nearly good enough. This reply is 300 chars long, and getting 3 wrong is annoying enough. The error rate should go down to .000001% for OCR to be a commodity, and that's with good 600dpi originals. Factor in crappy scans and poor resolution/contrast, and you are in for a pretty tough ride.
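
The arithmetic above can be checked with a few lines. This is a sketch that assumes character errors are independent and equally likely, which real OCR errors are not; the function names are made up for illustration.

```python
def expected_errors(n_chars: int, error_rate: float) -> float:
    """Expected number of wrong characters, given a per-character error rate."""
    return n_chars * error_rate


def prob_error_free(n_chars: int, error_rate: float) -> float:
    """Chance that every character is right, assuming independent errors."""
    return (1.0 - error_rate) ** n_chars


if __name__ == "__main__":
    n = 300  # length of the reply above
    # Rates as fractions: 1%, 0.01% (i.e. 99.99% accuracy), and .000001%.
    for rate in (1e-2, 1e-4, 1e-8):
        print(f"rate={rate:g}: expected errors={expected_errors(n, rate):g}, "
              f"P(no errors)={prob_error_free(n, rate):.6f}")
```

At a 1% error rate, a 300-character text has 3 expected errors and only about a 5% chance of coming through clean, which matches the complaint.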

Re:They Why (1)

Vegemeister (1259976) | more than 3 years ago | (#35972986)

Just put the OCR'd text in a side channel with the image, as PDF does. Then you get a searchable, copyable document, preserve the original formatting, and avoid the need for an extremely low error rate.

Re:They Why (2)

afidel (530433) | more than 3 years ago | (#35971688)

Hmm, of the four engines we use, you mentioned two. Abbyy has by far the worst recognition rate (but is the most flexible for scan setup, so we use it for arbitrary documents rather than the forms-based stuff going into our document management system). We also use Nuance through Adlib. The other two we use are Kofax AIP and DokuStar.

Re:They Why (1)

tompaulco (629533) | more than 3 years ago | (#35975566)

My shop uses Nuance through two different products, and we are looking into directly interfacing with Abbyy. The results we have seen from Abbyy have been much better than what we have seen through Nuance. I guess mileage varies.

403 (1)

Nick Ives (317) | more than 3 years ago | (#35970072)

Did anyone else mirror this? I'm just getting a 403.

Re:403 (1)

master_kaos (1027308) | more than 3 years ago | (#35970098)

same...

Re:403 (0)

Anonymous Coward | more than 3 years ago | (#35970148)

Same here, seems like it doesn't even try to connect to the page.

Way to go (0)

rudy_wayne (414635) | more than 3 years ago | (#35970112)

Forbidden

You don't have permission to access /blog/2011/04/ocr-quality-of-google-docs on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

Nice link, asshole.

Oh WTG (1)

Anonymous Coward | more than 3 years ago | (#35970146)

Self-promote to /. and host on a box that can't handle the limited traffic of a 25-comment popularity story?

GOOD WORK SON

Due to an "intentional decision" (0)

Anonymous Coward | more than 3 years ago | (#35970258)

Much better than an accidental decision, I guess.

Use in combination with CamScanner (1)

Clifton Beach (809210) | more than 3 years ago | (#35970342)

You can get better results by using CamScanner [appbrain.com] to capture the image, then upload the JPG to Google Docs. I found that uploading the JPG works better than uploading the PDF.

403 Forbidden (0)

Anonymous Coward | more than 3 years ago | (#35970378)

I got the "403 Forbidden" message!

Crippleware? (1)

mr100percent (57156) | more than 3 years ago | (#35970808)

Google prides itself on having supposedly the best quality apps and features, which is why they take years to leave Beta. Why would they intentionally release a crippled version of their app? That will be the worst thing since Google Books with the missing pages.

Re:Crippleware? (0)

Anonymous Coward | more than 3 years ago | (#35973130)

Google has only one idea - put stuff into their search engine to make money from advertisers. Everything else is padding.

Re:Crippleware? (0)

Anonymous Coward | more than 3 years ago | (#35976034)

That's not the case at all. Google usually does a passable release first. If it's a core product, they will improve on it. If it's not, it's just a placeholder to let others pick it up. I've never heard Google claim they have the best quality apps. The reason for so many betas is more likely that when people complain, they can say it's beta.

Re:Crippleware? (0)

Anonymous Coward | more than 3 years ago | (#35992356)

It's not crippleware; just go RTFA and apply some logic if you want to know why.

Slashdotted (0)

Anonymous Coward | more than 3 years ago | (#35970810)

Slashdotted

In case I'm not the only one. (1)

scumfuker (882056) | more than 3 years ago | (#35971202)

Wikipedia says:

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.

http://en.wikipedia.org/wiki/Optical_character_recognition [wikipedia.org]

Obligatory (1)

katsuo11 (1791512) | more than 3 years ago | (#35972660)

"You're holding it wrong."

I've just tried it... (2)

Simon Brooke (45012) | more than 3 years ago | (#35973092)

I think the quality is tolerable. I photographed a document lying on my desk, without doing anything special to make it smooth or adjust lighting. This is a good simulation of a real-world situation where you can photograph a piece of text. There were errors in the transcription but it was readable, and with a very little editing would have been perfect. What surprised me was that apparently the whole image was uploaded from my phone to Google Docs, and then downloaded again, which is a little bit inefficient; I think that the OCR process runs server side.

I see this as very useful. This afternoon I'm going in to the local planning office to look at some planning applications; I won't be able to take them away, and I doubt I'll be allowed to use a photocopier, but I will have my phone. That's a real world application. I can think of hundreds more.

Re:I've just tried it... (0)

Anonymous Coward | more than 3 years ago | (#35975308)

What surprised me was that apparently the whole image was uploaded from my phone to Google Docs, and then downloaded again, which is a little bit inefficient; I think that the OCR process runs server side.

Yes. This is how most of the Google services work. The voice search on Android uploads the sound it captures to Google, and then downloads the "translation". Remember, Google's goal is to be endpoint/OS agnostic. Easier to port such a system to different platforms.

Typical Google (1)

ChrisMaple (607946) | more than 3 years ago | (#35977880)

I've tried a couple of the free applications that Google has made available, and they've been really inferior products. It's no surprise that they've put out yet another amateurish effort.

Google Pushing The Edge (0)

Anonymous Coward | more than 3 years ago | (#35993404)

People jump on Google, apparently the iPhone toadies who need to diss the opposition, but Google has been pushing the edge of what good services they can provide for users in return for their consumer behavior. I know that almost anything I type can be tracked by Google, but their Gmail, their Search, and their innovation have provided the very essence of the new model of giving something to the ordinary person in return for their American consumer behavior. What have the TV networks ever given you in return for their insipid, insulting, and intellectually degrading commercial breaks? At least Google gives me a good, efficient, speedy, reliable, and stable email and file storage service with ads I sometimes look at and sometimes ignore. I'm sure the OCR will be refined; nobody else has the *&*() to even try such an advancement by providing a new, useful service in return for one's marketing behavior. Nothing's free, and I have never had a problem with Google's info on me. Hope they kick MS and Oracle and iPhone butt...
