Large-Scale Paper-To-Digital Conversion?

timothy posted more than 9 years ago | from the that's-asking-a-lot-professor dept.

Education 459

An anonymous reader writes "I've just been asked to digitize several dozen sets of lecture outlines at the university where I work. Basically, professors want to hand me a big (often 100+ page) stack of their handwritten lecture notes (with messy text, equations, and diagrams; sometimes double-sided) and expect me to post a PDF-or-something-similar to their course's web page. However, every desktop scanner I've ever used takes 1-2 minutes of user-attention per page and the resulting files end up Huge, impossible-to-read, or both. All I have at my disposal is my PowerBook, Acrobat, a couple hundred dollars of department funds for a new scanner (this maybe?), and, if I ask nicely, overnight use of the secretary's Win2k box. Any ideas? Sheet-fed scanner recommendations? Better file formats than PDF (or better PDF settings)? Do any of you students have usability advice?"

Get stuffed (4, Insightful)

October_30th (531777) | more than 9 years ago | (#9231472)

Uh. How about telling your prof. to get stuffed and get a real secretary.

Re:Get stuffed (5, Interesting)

Amiga Lover (708890) | more than 9 years ago | (#9231552)

I think you're right on the money. May be well worth taking the job to an outside agency. There are many print shops using Xerox Docutechs, which scan in many hundreds of sheets at once to print copies of documents. The scanning takes barely a second a page, and it wouldn't surprise me if the document format being stored inside the Docutech is something that can be used for this purpose.

I've had a similar job, where our school's lecturers wanted their notes in the same style so one of my jobs as admin assistant was retyping chapters from textbooks & inserting the original illustrations. That didn't start out too bad until lecturers started basing course notes on entire quarters of books, expecting them to be retyped completely in their own style. Give an inch they'll try to take a mile - use the few hundred $$ to get it professionally scanned.

Re:Get stuffed (2, Interesting)

SoSueMe (263478) | more than 9 years ago | (#9231609)

...retyping chapters from textbooks & inserting the original illustrations. That didn't start out too bad until lecturers started basing course notes on entire quarters of books...

Isn't that copyright infringement?
Unless, of course, they wrote the textbooks.

Re:Get stuffed (0)

Anonymous Coward | more than 9 years ago | (#9231559)

WOW! That's *so* helpful! Just refuse to do the job your employer is paying you to do... DAMN... why didn't I think of that?

Re:Get stuffed (3, Insightful)

October_30th (531777) | more than 9 years ago | (#9231625)

WOW! That's *so* helpful! Just refuse to do the job your employer is paying you to do... DAMN... why didn't I think of that?

How do you know he's getting paid to do it? Some professors have a nasty habit of getting all their nasty, menial and boring stuff done by their students who are already working on their degree projects 12 hours a day, six days a week.

Ok, so for some reason I assumed that the poster is a student so my initial reaction was probably off. I would never assign such a menial, dead-end task to my postgrad students, nor would I have accepted such a task without objections when I was still a student.

Re:Get stuffed (5, Insightful)

djplurvert (737910) | more than 9 years ago | (#9231694)

In addition to the points already made it is not unreasonable to simply tell the prof that his/her expectations are unreasonable. Perhaps "get stuffed" is a bit over the top but I've found that employers (even professors) will listen to reasonable explanations.

I used to have a boss that would say things like "this should only take you about five minutes". I finally told him, "nothing takes just five minutes, if I have to stop what I'm doing there is a startup/teardown cost for every task." I convinced him that there was a granularity of 1/2 hour for every random task he wanted done. The discussion was fruitful for both of us, he was more reasonable about his expectations and put a bit more thought into what he wanted to distract me from my primary task to do.

Now, the original idea is a reasonable proposition; however, it isn't really the sort of thing that should be done for just one prof. Perhaps several departments can combine their resources to set up something that will allow this type of thing to be done in a reasonable time frame.


Re:Get stuffed (3, Insightful)

Walt Dismal (534799) | more than 9 years ago | (#9231632)

No, seriously, this request shows utter lack of concern by someone who may be a professor, but is also a bad manager and possibly an idiot. Your response perhaps should be to scope out the project and toss the estimate and the funding issue back into his lap. But do not let yourself be used as slave labor.

Bring down emissions (1, Offtopic)

skidrowe (688747) | more than 9 years ago | (#9231474)

Hopefully this will help reduce bad emissions from the production of paper...I've always heard they use some nasty chemicals...

Re:Bring down emissions (-1, Troll)

Anonymous Coward | more than 9 years ago | (#9231628)

go hug some trees, you god damn hippie!

Kinkos? (5, Informative)

axonal (732578) | more than 9 years ago | (#9231489)

Some Kinkos have those big goliath Xerox scanners which act just like copiers. Load up a stack of papers, and it will scan the pages and load them up. Not sure about PDF export/etc though.

Re:Kinkos? (1)

anglete (782289) | more than 9 years ago | (#9231548)

I agree, it will be cheaper for Kinkos to do it with their machine than for you to do it and waste a couple days figuring it out.

Re:Kinkos? (5, Informative)

zenquest (315406) | more than 9 years ago | (#9231572)

We have a Xerox WorkCentre Pro 65 at my school. It can scan at around 50-60 pages per minute, and will do double-sided. It will do PDF output, too. (and email it or FTP it to you, if so configured)

Our teachers use them for exactly the purpose described. If you don't have one of these type machines around anywhere, then definitely give Kinkos or some similar establishment a try.

Re:Kinkos? (1, Informative)

Anonymous Coward | more than 9 years ago | (#9231590)

The current price for Scan to PDF at my Kinko's branch is a $25 setup and $.25 per page. Since there are hundreds of pages, you can probably get them to waive the setup, since it's really just there to gouge folks who want a couple pages scanned.
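
To put numbers on that, here is a rough sketch of the arithmetic. The $25 setup and $.25 per page are the figures quoted above for one branch; the function name and the waiver flag are made up for illustration, and your local shop's pricing may differ.

```python
def scan_cost(pages, per_page=0.25, setup=25.00, waive_setup=False):
    """Estimated copy-shop bill in dollars for scanning `pages` pages to PDF.

    Defaults are the prices quoted in the comment above; they are an
    assumption about any other branch or shop.
    """
    fee = 0.0 if waive_setup else setup
    return fee + pages * per_page


if __name__ == "__main__":
    # Compare the bill with and without the setup fee for a few stack sizes.
    for pages in (100, 500, 1000):
        print(pages, scan_cost(pages), scan_cost(pages, waive_setup=True))
```

At those prices a single 100-page stack costs about as much in setup as in scanning, so the waiver matters most for small jobs.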

Knee to the grindstone... (1, Funny)

Faust7 (314817) | more than 9 years ago | (#9231490)

Flex your fingers, crack your knuckles, and get some eyedrops... because you're going to be doing a lot of typing.

Re:Knee to the grindstone... (4, Insightful)

Exocet (3998) | more than 9 years ago | (#9231560)

"Ummm yeahhhh... if you could just do that..."

Faust7 is right about this one. Frankly, OCR is ok, but not great - on nice text on book-or-better paper. Handwritten notes? With equations? No. Not unless your profs have some damn fine handwriting and we all know that that is absolutely not the case.

My advice is the same as Faust7's with these additions: spend some of that money on a really nice keyboard, wrist-rest and/or maybe a nice monitor. You are going to be needing all three. If there are any left over funds, get some really nice tea. I suggest Twinings English Breakfast or Prince of Wales, if you're going to go bagged.

Re:Knee to the grindstone... (1)

comm3c (670264) | more than 9 years ago | (#9231574)

or instead of buying a scanner, buy some people to type it with you and have fun.

well... (5, Funny)

Anonymous Coward | more than 9 years ago | (#9231498)

if I ask nicely, overnight use of the secretary's Win2k box

Plus, if you're lucky, you could also get other after-hours favors from the secretary as well ;-)

Re:well... (1, Funny)

Anonymous Coward | more than 9 years ago | (#9231615)

Plus, if you're lucky, you could also get other after-hours favors from the secretary as well

No no no... he asked only for the secretary's Win2k box. A mistake, if you ask me.

Eww. (0)

Anonymous Coward | more than 9 years ago | (#9231673)

Chances are she's a plump, old, matronly, bespectacled hausfrau.

High Speed Scanner (2, Informative)

Anonymous Coward | more than 9 years ago | (#9231501)

You need a high speed scanner. Fujitsu makes a nice one that works pretty well.

Re:High Speed Scanner (1)

Judg3 (88435) | more than 9 years ago | (#9231642)

Indeed, the AC is right - Fujitsu (and a few others) is the way to go. Back when I used to work for a stock brokerage, all of the overwhelming amount of paper that customers had to fill out would be scanned in with a few high-speed Fujitsus into Hyland's OnBase [hyland.com] document management system.

Sadly, this approach is way out of league for the small budget the poster has.
I'd have to wonder if a consumer scanner, even a nice one like that HP, can keep up with the constant use required of it.
Much like laser printers, the Fujitsu scanners have complete rebuild kits that you can use to bring them back to like-new state, which I don't think the consumer based HP scanners would have. But then again, if you get a good year or so out of a $300 scanner before needing a new one, that's a lot better than buying a high speed scanner (they easily run $3000 used).

Simple. (5, Funny)

jebell (567579) | more than 9 years ago | (#9231504)

Outsource the job to India.

Gotta be careful though. (5, Funny)

Faust7 (314817) | more than 9 years ago | (#9231630)

Outsource the job to India

"No, no, not my entire job, just this one part. No, I can do the rest. No, really. No! No... please..."

Re:Simple. (4, Insightful)

GothChip (123005) | more than 9 years ago | (#9231685)

I know the parent post was funny but he's thinking along the right lines.

Take the few hundred you have to spend on equipment and spend it hiring a few temps.

A good typist should be able to type up hand written notes faster than scanning them all in and manually fixing all the mistakes.

Re:Simple. (2, Insightful)

pendragn (107545) | more than 9 years ago | (#9231704)

Outsource the job to India.

Not as bad an idea as it sounds. My advice is to not waste the department's money, and your time, buying, installing, and using a sheet feed scanner. Somebody in your local area assuredly has one already that they either rent out to people in your situation, or that they use to do the work you need done.

Use the funds that the department gave you to have your local copy shop do the work. They will almost certainly do it faster than you could, and the end product will most certainly be better than what you could provide. This is the kind of thing that the people who work at copy shops do for a living.

Also PDF is a great format for this, highly portable, and so far fairly version proof. You don't have to worry about the PDF being obsolete before the professor decides to change the structure of his class.

HP Copiers (2, Informative)

kevin@ank.com (87560) | more than 9 years ago | (#9231506)

The large multi-function HP Printer/Copiers will scan and e-mail a PDF of an entire stack of papers just as you would use a normal copier. I'm sure that the other manufacturers have similar features, but it is the HP equipment that we use at work.

Re:HP Copiers (2, Insightful)

XaXXon (202882) | more than 9 years ago | (#9231566)

Will you please tell both of us where we can get one for a few hundred dollars, as specified in the question?

I think the real answer is that this guy is S.O.L. .. he's just going to have to spend some good quality time getting to know a consumer-level scanner, and let the professor know to do his notes in software initially.

Re:HP Copiers (1)

plankers (27660) | more than 9 years ago | (#9231646)

You might not be able to get a copier that does this for a couple hundred bucks, but if a place on campus has a copier you can use, either for free or cheap (since scanning doesn't use toner or paper, after all), you win.

Re:HP Copiers (2, Informative)

kevin@ank.com (87560) | more than 9 years ago | (#9231671)

The big copiers run a couple of thousand dollars, but the multi-function fax/scanner/printers from HP are in the approximate price range and are all able to scan stacks of paper rather than individual sheets. The easiest way to get one of the large printers for less than a few hundred dollars is to start calling alumni who work for HP and ask them if they'll make an equipment donation.

Re:HP Copiers (3, Informative)

plankers (27660) | more than 9 years ago | (#9231586)

The Konica ones where I work do a similar thing -- they can email you a TIFF or a PDF of a huge stack of paper. Ours are only black & white, and will only do a fixed resolution, but a newer color copier would fix all those shortcomings. Many universities and colleges have print centers that have this type of equipment if your department doesn't.

Worst case, you can get an HP scanner and the automatic document feeder for it. If this is going to happen a lot it should be pretty easy to justify the $500 or so for the scanner, ADF, and a copy of Acrobat.

I'd go for the (0, Funny)

Anonymous Coward | more than 9 years ago | (#9231509)

overnight use of the secretary's box ...

Hello (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#9231511)

How does this relate to Michael Moore? I'm just a little confused.

HP Digital Sender (4, Informative)

Guanix (16477) | more than 9 years ago | (#9231512)

The HP Digital Sender [hp.com] series are really great for this stuff. You feed it a stack of paper and it scans it, 15 pages per minute, and can store the PDF on a file server or you can send an email with the PDF attached directly from the network sender! It's a bit expensive, but try to look around for one, maybe the local copyshop? Guan

Re:HP Digital Sender (1)

HBI (604924) | more than 9 years ago | (#9231580)

There used to be a smaller model for about $1500, the 8100C. We have one of these and it's quite useful.

Not as fast as they claim though. Take the speed with a grain of salt, assume half.
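
A quick sketch of what "assume half" does to a job estimate. The 15 pages per minute is the figure from the parent post and only a stand-in here (the 8100C's own rated speed may differ); the function name and the 0.5 derating default are just this rule of thumb written down.

```python
def scan_minutes(pages, rated_ppm, derate=0.5):
    """Minutes to feed `pages` pages through a scanner.

    rated_ppm: the manufacturer's claimed pages per minute.
    derate:    fraction of the rated speed you expect in practice
               (0.5 is the "assume half" rule of thumb, not a measurement).
    """
    effective_ppm = rated_ppm * derate
    return pages / effective_ppm


if __name__ == "__main__":
    # A 100-page stack at the claimed speed versus half of it.
    print(round(scan_minutes(100, 15, derate=1.0), 1), "minutes at rated speed")
    print(round(scan_minutes(100, 15), 1), "minutes at half speed")
```

Even at half speed that is far below the 1-2 minutes per page the submitter reports for desktop scanners.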

Re:HP Digital Sender (3, Informative)

W2k (540424) | more than 9 years ago | (#9231608)

Great product. Unfortunately, its price is listed at about 10x the "few hundred dollars" the original submitter specified in his posting.

I've found the Canon Canoscan flatbeds do a good job of automatically scanning straight to PDF; only minimal user intervention (hit "enter") is required. There's a special mode for scanning text which enhances contrast, so messy notes and diagrams should be fine, too. The resulting PDFs are also remarkably small in size for what is essentially a huge bitmap. I've a Canon Canoscan 8000F myself, it's very fast and can do higher DPIs than most people need, and although it might be a bit out of his price range, I'm sure the cheaper models can do the same job nearly as well.

Re:HP Digital Sender (1)

Florian Weimer (88405) | more than 9 years ago | (#9231659)

Most digital copiers can do similar things nowadays. Typically, you rent such machines, and it's not too expensive in this case, especially if the device is also used as a true copier.

Format (2, Interesting)

bobthemuse (574400) | more than 9 years ago | (#9231513)

While PDFs are pretty well supported, you'll still be storing it as raster data, so there won't be any size decrease over using an image format, such as PNG.

Are there any web-based packages for searching documents, based on OCR-extracted keywords? Obviously with messy hand-written notes, formulas, etc, OCR won't work reliably. For a similar project, I'd like to OCR the files and use the text data solely for keyword searching. Obviously not perfect, but better than just images.

PNG is your friend....

Ounce of prevention... (1)

drsmack1 (698392) | more than 9 years ago | (#9231514)

This whole problem could be eliminated if these papers were put into PDF as soon as they are created. That said; I would explore solutions from the legal profession - they have a lot of things that do this.

If you're being 'asked' (4, Insightful)

Space cowboy (13680) | more than 9 years ago | (#9231517)

Just say 'No'. (If you're being told, it's a different matter, of course).

It sounds to me like a damned hard job to automate (which is the only way it's not going to be a constant drain on your time), and you're being given next-to-no resources to even come up with a creative solution. Sometimes the best answer is in fact 'No' - it forces people to re-evaluate what they're asking. It comes with the danger of being sacked if it's you that's being unreasonable, of course....


Re:If you're being 'asked' (4, Insightful)

malia8888 (646496) | more than 9 years ago | (#9231681)

I really agree with Space cowboy. My former husband was a college professor. He was very brilliant in his field, but anything out side of his narrow realm daunted him. He wanted to put pennies in our fusebox when the lights went out. He stared at a breaker box in the condo like it was the control panel of an alien spacecraft.

Explain the enormity of this scratched note-to-finished Pdf to this educator. Use crayons, mirrors, yarn and tape if necessary to get your point across. Just be diplomatic :P

The most important thing (5, Funny)

Timesprout (579035) | more than 9 years ago | (#9231518)

Is to first make an exact copy (by hand) of all the existing documents. It's vital to have a full backup; in case anything goes wrong with the scanning process you can always restore the manila folders to their original filled state.

HP Digital Sender (1)

wik (10258) | more than 9 years ago | (#9231519)

See if the department can afford an HP Digital Sender. While they're quite pricy, they'll feed, scan, and email you a PDF.

http://h10010.www1.hp.com/wwpc/us/en/sm/WF05a/15179-64175-64404-12126-64404-25324.html

ADF Scanners (5, Informative)

Loiosh-de-Taltos (247549) | more than 9 years ago | (#9231521)

What I suggest and use is the HP 4C scanner. It's a SCSI-II only scanner that can usually be found on eBay for under $10. There is also an automatic document feeder option for it that can be found on eBay. This scanner was originally designed for both Windows and Apple compatibility as well. It cannot handle 2-sided sheets.

The scanner has four different pieces of software you can choose to use, I'd suggest Precision Scan Pro as that makes multi-document scanning easier.

LaTeX is your friend. (0)

Anonymous Coward | more than 9 years ago | (#9231522)

Just hand back half of the stack, then do the half you kept up in LaTeX.

Change your major? (0)

Anonymous Coward | more than 9 years ago | (#9231527)

Change your major?

Hey, it's a thought.

Long time scanning per page (1)

dicepackage (526497) | more than 9 years ago | (#9231529)

I had to do something similar with about a thousand or so pages, except they were all separate files. I would concentrate on doing everything one step at a time. What I mean by that is scan all the pages into your computer and then begin making them into PDF files or whatever format you prefer. On my scanner it took about a minute per page, so my main problem was just not having anything to do during the time while it was scanning. Don't worry about this; use the time to do something else, such as reading a book, or have another computer next to you to surf the web or play games on.

vi, LaTeX, 10 coffeepots and reduced sleep (0)

Anonymous Coward | more than 9 years ago | (#9231537)

I am digitizing my lecture notes with LaTeX. Takes some time, but results in perfect output quality and small file sizes. Needless to say I am not using any wimpy WYSIWYG stuff to produce the graphics; that's what the picture environment was made for.

HP Scanjet 5550c is not what you want (4, Informative)

GraZZ (9716) | more than 9 years ago | (#9231538)

Definitely keep clear of the Scanjet 5550c; there's a reason why it's the cheapest feed scanner out there. It will frequently jam if you a) load more than 5 sheets into the feeder or b) use any sort of paper that has been handled by human beings.

Our Engineering Society was trying to put up an exam archive with one of them and quickly gave up and started scanning with the flatbed.

Also, the scanner has no SANE support (one of the few HP scanners that doesn't).

DjVu (3, Informative)

alienw (585907) | more than 9 years ago | (#9231541)

Acrobat sucks ass for bitmap images. It doesn't display them very well, they don't print out well, and the files are huge. DjVu [djvuzone.org] is a new image format that compresses extremely well (a few kilobytes a page -- actually comparable to ASCII text). It's somewhat proprietary, but it's probably the best solution here. There are free web-based services that can compress your images. You can try some of them and see for yourself.

Re:DjVu (4, Informative)

Ed Avis (5917) | more than 9 years ago | (#9231576)

For scanned documents, tic98 [waikato.ac.nz] compresses even better than DjVu. It's free software and you can even read the author's PhD thesis about it.

Re:DjVu (2, Informative)

mystik (38627) | more than 9 years ago | (#9231660)

I haven't tried tic98 (mentioned lower in this thread) but I can vouch for DjVu. I routinely scan notices, bills and whatnot mailed to me, then destroy them (rather than maintain a large paper file)

300DPI Black & White scans take about 19kb. They are quite readable, and with 300DPI information, make pretty good printouts.
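
For a sense of scale, here is a back-of-the-envelope sketch of how much compression that figure implies. It assumes a US Letter page (8.5 x 11 in), 1 bit per pixel, and reads "19kb" as 19 kilobytes (19 * 1024 bytes); none of those are stated in the post, and the function names are made up.

```python
def raw_bilevel_bytes(width_in=8.5, height_in=11.0, dpi=300):
    """Uncompressed size in bytes of a 1-bit-per-pixel page scan.

    Each row is padded up to a whole number of bytes, as in a simple
    packed bitmap. Page size defaults to US Letter (an assumption).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


def compression_ratio(compressed_bytes, **page):
    """How many times smaller the compressed file is than the raw bitmap."""
    return raw_bilevel_bytes(**page) / compressed_bytes


if __name__ == "__main__":
    print(raw_bilevel_bytes(), "bytes uncompressed")
    print(round(compression_ratio(19 * 1024)), "to 1 at 19 KB per page")
```

Under those assumptions a raw page is roughly 1 MB, so 19 KB works out to something on the order of 50:1.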

Do what I do (0)

Anonymous Coward | more than 9 years ago | (#9231542)

Wank off for a bit

How about a tackling it differently (0)

Anonymous Coward | more than 9 years ago | (#9231544)

At my uni the course-capture guys started with the scanning approach. Of course it didn't work out since it is impossible.

Eventually they rigged up a system of sticking a cheap video camera in the class, and giving the prof a chalkboard capable of printing whatever he wrote on it. That would just get converted to a PDF (no conversion or OCR), and the taped course converted to an MPEG1.

If you can't handle the load, look at alternative solutions like the above. YMMV.

Mongo Fax (1)

onyx pi (689409) | more than 9 years ago | (#9231553)

Mongofax it to yourself. Will come to an inbox near you as an email with pdf attachment. No need for a scanner. Works as fast as your fax can chew through your docs.

color simplification? (0)

Anonymous Coward | more than 9 years ago | (#9231555)

the scanner might be acting like it's scanning a multi-colored photograph when reading in a hand-drawn lecture frame. try to see if the scanner can simplify colors or if your PDF maker could do it. Put another way, instead of inputting 64000 possible resultant colors, use 16 or some other low count number, as typical lecture slides (and pens) only use a small number of colors (typically black, red, blue, green, and orange)
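
a rough sketch of the idea, in case it helps. The palette RGB values, the added white for the paper, and both function names are made up for illustration; a real scanner driver or PDF tool would do this internally and probably smarter.

```python
# Hypothetical pen palette plus white for the paper (RGB values are guesses).
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "blue": (0, 0, 255),
    "green": (0, 128, 0),
    "orange": (255, 165, 0),
}


def bits_per_pixel(n_colors):
    """Smallest whole number of bits that can index n_colors distinct colors."""
    return max(1, (n_colors - 1).bit_length())


def nearest_color(rgb, palette=PALETTE):
    """Name of the palette entry closest to rgb (squared RGB distance)."""
    return min(
        palette,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, palette[name])),
    )


if __name__ == "__main__":
    # 64000 colors need 16 bits per pixel; 16 colors need only 4.
    print(bits_per_pixel(64000), bits_per_pixel(16))
    # A slightly faded red pen stroke snaps to plain "red".
    print(nearest_color((200, 30, 20)))
```

so before any compression at all, dropping from 64000 colors to 16 cuts the raw pixel data to a quarter, and snapping near-identical shades to one pen color tends to make the result compress better on top of that.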

Recruit the community (5, Interesting)

SoSueMe (263478) | more than 9 years ago | (#9231556)

Do it the open source way.

Get several (dozen) other students to use their own equipment and time in exchange for a copy/copies of the completed work.

I would hazard a guess that there are more than a few people who would like to have a copy of the complete series of the lecture outlines.

Easy (5, Interesting)

JensR (12975) | more than 9 years ago | (#9231557)

Get some students of the professor's course to type them into LaTeX. Give them some points they'd otherwise get for homework.
a) Publication quality DVI/PS/PDF files
b) The student can deepen their knowledge of the topic
Everyone happy. Used to work like this at the university I went to. And you may be even lucky that some student typed these notes in for himself.

DjVu format is pretty good for scanned docs. (3, Interesting)

artemb (2016) | more than 9 years ago | (#9231558)

I found that the DjVu format produces a substantially smaller file than PDF for the same scanned image.

There is an open-source project http://djvu.sourceforge.net/ that provides code for reading DjVu docs, but I have no idea where to get a DjVu encoder.

froogle says... (1)

ZiggyM (238243) | more than 9 years ago | (#9231561)

I put "continuous feed scanner" in froogle, sorted by price, and found one for around $400. You can do it 25 pages at a time with this. (Microtek X12USL 2400x1200dpi 42bit).

Re:froogle says... (0)

Anonymous Coward | more than 9 years ago | (#9231599)

thanks for the link! I appreciate it!


PDF is good (1)

Datasage (214357) | more than 9 years ago | (#9231562)

Where I used to work, we digitized 4-5 million documents per month. But these were mostly printed copies.

We had a set of high-speed sheet-fed scanners; each document would then be checked and linked to a database. The documents in most cases were shipped to a vault.

Outsource it (0, Troll)

bshroyer (21524) | more than 9 years ago | (#9231564)

This looks like a job for cheap manual labor. Try India. Or an unpaid intern.

Don't you dare moderate this as a troll. You know as well as I do that this is probably the only viable solution.


If you had a budget.... (1)

nurb432 (527695) | more than 9 years ago | (#9231567)

Get one of those Canon scanner/copier/printer thingies..

They can scan direct to PDF at an amazing rate of feed using the standard sheet feed.

Since it has dual purposes, you might con them into one, shared among a couple of departments...

Xerox DocumentCentre (0)

Anonymous Coward | more than 9 years ago | (#9231571)

We have a 332 ST at work and recently added the scanning software to it; it can export PDFs (image PDFs, no OCR stuff) straight to an FTP site. Pretty nice. Of course this seems like you'd already have to have a DocumentCentre to begin with.

God be with you (0)

Anonymous Coward | more than 9 years ago | (#9231573)

I'm sorry to hear of your trouble. I offer prayers for you and your professors.

where to look (2, Insightful)

bcrowell (177657) | more than 9 years ago | (#9231575)

Have a look at the archives of this [upenn.edu] mailing list, which is mainly populated by Project Gutenberg folks.

But the broader question is whether this is really a good idea. The result is going to be huge files, which will be messy, hard to read, and will lack an index or table of contents. Seems like a case of profs with too much ego and not enough willingness to put their own work into more useful form.

fax (1, Informative)

Anonymous Coward | more than 9 years ago | (#9231577)

Many people seem to forget that the cheapest, most common, and most reliable sheet-feed scanner is the old-fashioned fax.

Use the department funds to sign up for an account at interpage.net [interpage.net], which will allow you to fax stuff off to yourself and receive it as an email attachment. Then use the fax machine in the office to run everything through.

That takes care of the scanning part; cataloging, organizing, and etc will take a lot more time.

You may be able to persuade some professors to fax you the stuff themselves, saving you a bit of time.

Scanner recommendation (1)

seanmcelroy (207852) | more than 9 years ago | (#9231581)

We use a Canon DR-3050 at work to do about 5,000 pages/week. It scans at 20 PPM, and you can put in a batch of about 75-100 pages and say 'go' and not worry about it. It's a $4,000 scanner, but it works really well for continuous processing.

As for formats, if it has handwritten stuff on it, you probably won't be able to OCR it and just store that. PDF image files are a pain, but so are lots of individual TIFs. Your students probably won't have a smart image viewer that can thumb through multiple pages of a multi-image TIF file, but if the profs can mandate they download a free one somewhere, that'd probably be the way to go... even less proprietary than Adobe's PDF.

Not Uncommon (1)

kannibal_klown (531544) | more than 9 years ago | (#9231585)

It's not as bad as it seems.

At work, we have several multifunction printers / copiers / faxes / scanners. These things are huge, can take reams of paper at a time for input, and don't take too long. Besides, it's completely automated (you might just have to import the resulting images into PDF, which can be done easily). I've used one in the past to scan in my notes and the worksheets the professors handed out. It makes storage a lot easier.

Someone already suggested Kinko's. Yes, they might have it. Also, I've seen some smaller copying places in Newark have similar devices. So it's common enough that you can find it easily.

If a friend or contact doesn't have access to such a device, then I'd suggest paying a copy shop to do it for you. I doubt it would be that expensive, and you can bill the school for it.

The problem isn't that hard to solve (unless you want to try to do it in your apartment). But it's a good thing to bring up on slashdot, as many people might learn about this in case they need to do it in the future.

Xerox DocuShare (1)

Paladin814 (518257) | more than 9 years ago | (#9231593)

The company I work for has looked for a corporate solution to this very problem. After much research, we have decided to use Xerox's Docushare solution [xerox.com] with flowport. [xerox.com]

Basically you walk over to a Xerox copier with a sheet feeder attached and using a cover sheet created in flowport, scan in your documents into Docushare. They are stored as fairly high quality PDFs. The Docushare software also does an OCR on the files and then makes them text searchable.

Although not perfect, it is by far the best solution I have seen. It sounds like you do not have the funds to implement this at your school (the price of the Xerox copier and dedicated docushare server) but if you only have a limited number of these documents, then you would not need to have the infrastructure and perhaps Xerox would do this for you. Xerox has many offices in major cities.

Xerox's Flowport (1)

daigu (111684) | more than 9 years ago | (#9231594)

One option would be to use Xerox's Flowport. You would have to check what is available to you locally - but I can tell you that making a PDF with a Xerox copier and Flowport of 100 pages is a few minutes of work.

Also, try looking at what others are doing [unisa.edu.au] in the university setting.

HPs are cheap on Ebay (1)

fille (575662) | more than 9 years ago | (#9231601)

I just bought a HP ScanJet 6250C with ADF on ebay for 100 euros. I have not tried it yet but it scans all pages in the feeder (25?) after a press on the button. Some multifunctionals (fax, printer and scanner in one thing) have a feeder too and are much cheaper than a scanner with an ADF.

Try making GIFs (2, Informative)

PapayaSF (721268) | more than 9 years ago | (#9231610)

GIFs compress very well, especially with source material that's in limited colors. Try making a page into an 8-color or even 4-color GIF at about 150 dpi. The handwriting should be about as readable as the original.

Also, if you're scanning material with copy on both sides, you might get some visible bleed-through. Try scanning such pages with a sheet of black paper between the page and the lid of the scanner, then adjust contrast to ensure white whites and black blacks.

Volunteers? (1, Interesting)

Anonymous Coward | more than 9 years ago | (#9231611)

After you get all of it scanned in and put through OCR, there will still be a ton of mistakes you'll need to correct.

Now, at this point, you'll likely start wishing that you lived in Canada (if you don't already).

The key is in volunteers, to bastardize "1984". Get a number of fairly intelligent high school kids that haven't done their 40 hours of community service (a graduation requirement).

Now, make them look at the originals, the scanned, and correct all the discrepancies

bonus: if the kids are the nerdy types, tell them that they're learning university material for free.

they could start paying you!

Handwritten!! (1)

sciop101 (583286) | more than 9 years ago | (#9231612)

The only handwritten stuff I saw professors use was in math/statistics classes and math-heavy engineering classes. Survey class professors lecture and test the same stuff every year. Go with October_30th's advice.

Digicam (0)

Anonymous Coward | more than 9 years ago | (#9231617)

Use a digital camera and save as jpg. It's a lot faster than scanning, and the quality is just as good.

Re:Digicam (0)

Anonymous Coward | more than 9 years ago | (#9231649)

I would hate to be a student who had to download the notes that way.

i like my fujitsu scanner... (1)

bbdd (733681) | more than 9 years ago | (#9231618)

i have a fujitsu scanpartner fi-4120c [fcpa.com] desktop scanner. only offers a page feeder, though, no scan bed, so you will need everything to be loose pages.

very fast, and will do both sides in one pass, if you are working with double-sided pages. at 200x200 resolution (you might need higher, ymmv) and scanning double sided pages, i get something like 3 seconds per page (counting one double-sided page as two pages). for software i am just using the included scanner driver and twain software and adobe acrobat.

cdw has it here [cdw.com] , i'm sure it can be had for cheaper. i got mine for $800 i think. a little more expensive, but the speed is well worth it in time savings.
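To put the 3 seconds per page in perspective, here is a rough time estimate. The workload of 36 sets of 100 pages is only an assumed stand-in for the "several dozen sets" of "100+ page" notes in the question, and 90 seconds is the midpoint of the 1-2 minutes per page the submitter quoted for a desktop scanner.

```python
def scanning_hours(pages, seconds_per_page):
    """Total scanning time in hours at a given per-page rate."""
    return pages * seconds_per_page / 3600.0


# Assumed workload: 36 sets of 100 pages each (illustrative only).
total_pages = 36 * 100

flatbed = scanning_hours(total_pages, 90)    # midpoint of 1-2 minutes per page
duplex_adf = scanning_hours(total_pages, 3)  # the rate reported above

print("flatbed: %.1f hours" % flatbed)
print("duplex sheet-fed: %.1f hours" % duplex_adf)
```

Under those assumptions the difference is roughly two working weeks versus one afternoon, which is the argument for the extra cost.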

No good answers AFAIK (4, Informative)

John Miles (108215) | more than 9 years ago | (#9231622)

I've run into a similar problem, and have no good solutions in the general case. I'm on a mailing list [yahoo.com] for users and collectors of Tektronix test equipment (oscilloscopes, logic and spectrum analyzers, and so forth). Last year, Tektronix's legal department issued a copyright release that permits the reproduction and distribution of documentation for test equipment that they (Tek) no longer support. This was of great interest to the people on the TekScopes list, because it gave a green light to scanning and trading/selling copies of manuals. I've scanned in a few manuals for some equipment I own, and it's a huge pain in the butt any way you look at it.

Electronic test-equipment manuals are pretty much worst-case candidates for scanning. In Tek's case, the schematic volumes often consist of hundreds of double-sided, nonstandard-sized foldout sheets (11x23" for example) with lots of fine detail that must be reproduced clearly. You can either scan the pages in segments and leave it to the reader to reassemble them, or you can take the manuals to Kinko's and have the foldout pages shrunk to 11x17" or 8.5x11" for scanning. Either way, it's a real hassle, and highlights a clear need for a "prosumer" duplex sheet-feed scanner solution.

A few years ago you could buy scanners like this one [ebay.com] that could handle arbitrary sheet sizes, but I haven't seen them in stores lately. These may be easier to use than flatbed scanners, assuming the precision they offer is sufficient for your application. I don't know how well they'd work on densely-printed schematics.

Other than bitching about the state of the scanner marketplace, I don't have much to suggest. There are a few hints that will improve the quality and usability of your final document:
  • There are other formats, like DjVu [lizardtech.com] , that have certain advantages over .PDF, but think carefully before using them. Will you be able to read your files 10, 20 years from now? In .PDF's case, the answer is an unequivocal 'yes' because of widespread government, military, and commercial standardization around it. I hate to see people spend hours scanning manuals in DjVu or another nonstandard format, because I'm 95% sure I won't be able to read them years down the road on a completely different platform.
  • To make the document searchable, use an OCR package like FineReader if possible... but expect to spend even more time babysitting the process.
  • Experiment with your scanner resolution settings to minimize the resulting .PDF file size. There's a big difference in size between 200 dpi and 300 dpi, and between a B&W and color scan.
  • For some mysterious, forehead-slapping reason, flatbed scanners often use glossy-white backing material in the lid. This encourages bleedthrough of text on the reverse side of double-sided material, making your scanned documents look sloppy and compress poorly. Placing a sheet of black paper, plastic, or cardboard material between your document and the scanner lid will make a big difference.
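To see how much resolution and color depth matter before any compression is applied, here is a rough sketch of the uncompressed size of one letter-size page. The page size and the three bit depths are assumptions chosen for illustration; the final .PDF will be smaller, but the ratios give a feel for the trade-off.

```python
def raw_page_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of one scanned page (row padding ignored)."""
    width_px = int(round(width_in * dpi))
    height_px = int(round(height_in * dpi))
    return width_px * height_px * bits_per_pixel // 8


for dpi in (200, 300):
    for label, bits in (("B&W", 1), ("gray", 8), ("color", 24)):
        size_mb = raw_page_bytes(dpi, bits) / 1e6
        print("%3d dpi %-5s %6.2f MB" % (dpi, label, size_mb))
```

Going from 200 to 300 dpi multiplies the pixel count by 2.25, and going from 1-bit B&W to 24-bit color multiplies the raw data by 24.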

Outsourcing (0)

Anonymous Coward | more than 9 years ago | (#9231624)

Just pay some little kids (younger siblings?) like 3 bucks each to type it up.

Digital Copy machine (1)

Doppler00 (534739) | more than 9 years ago | (#9231647)

I don't understand why, but most people don't realize that most new copy machines are also PRINTERS and DIGITAL SCANNERS. I always find it funny when companies purchase fax machines/scanners/copy machine/printers when they really only need one device.

If you can find access to a digital copier at your university somewhere, you can just put the whole stack of paper in the sheet feed and it should be able to scan every page double sided and put it on a network drive somewhere.

It might take a while to figure out how to set this up, but it's infinitely easier than trying to scan each page by hand using a crummy consumer scanner.

Use your students! (0)

Anonymous Coward | more than 9 years ago | (#9231648)

Don't put messy handwritten notes on the web. It's very unprofessional and looks rubbish. Ask for student volunteers to transcribe the notes into LaTeX, then use HTML/PDF conversions for the web.

It'll take longer, but it will be worth the effort, especially when it comes to maintaining the notes in the future.

Solution --- New job! (0)

Anonymous Coward | more than 9 years ago | (#9231651)

'nuff said ;-)

Try using a camera. (0)

Anonymous Coward | more than 9 years ago | (#9231653)

At your budget, I'd get a digital camera (Nikon Coolpix on eBay, for example), shoot the pages, and put them together with Acrobat as pictures. Spies can shoot at speed, and I expect 3 secs/page might be a realistic guess.

Large Scale Paper to Digital Conversion (4, Informative)

felila (150701) | more than 9 years ago | (#9231654)

I do conversion for fun, at Distributed Proofreaders.

The problem is the mixture of graphics, equations, and text.

It's easy enough to turn a page of text into a smallish file. Get a good automatic-feed scanner ($3500 or so) and a copy of ABBYY OCR software. If the original isn't too speckly, tiny, or smudged, ABBYY will give you a 95% accurate text you can then correct. Best format to save in? Depends on what the school is going to do with the files. If they're to be posted on web sites, perhaps XHTML. If it's just for preservation, plain text (if there are no Greek characters) or XML with UTF-8.

Equations -- well, there's supposedly a version of XML for math, but Distributed Proofreaders has ended up using TeX, as it seems to be the mathematical standard. While this would work for preservation, it wouldn't work for a web site.

For a web site, perhaps the best way would be to intersperse text with pngs of the equations and graphics. The pngs would still take a lot more space than text, but the files would be smaller than PDF versions of the whole page.

One solution (1)

jjohnson (62583) | more than 9 years ago | (#9231658)

At work, I set up a document scanning function for our BAR system (Business Approval Request)--everything that's submitted must include documentation, which is often a paper quote or invoice.

We bought an HP Scanjet with sheet feeder for about $200 (sorry, don't remember the exact model), and use PaperPort to scan the documents to a network folder named for the person requesting the scan (the executive assistant does it). We save as 300 dpi TIFF files in 1-bit color (B+W), which are small (8.5" x 11" comes out around 50K), extremely clear and legible, and can be printed out again at almost the same quality. The scanning is pretty fast, and it handles batches. The only slow part is that PaperPort (which comes with the scanner) scans to MAX files, which need to be saved as TIFFs.
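For a sense of how much work the compression is doing in those TIFFs, here is a quick back-of-the-envelope calculation. It assumes "50K" means 50 * 1024 bytes; which compression scheme the files actually use is not something I checked, so the ratio is only what the numbers imply.

```python
# Letter page at 300 dpi, 1 bit per pixel.
width_px = int(8.5 * 300)   # 2550 pixels across
height_px = 11 * 300        # 3300 pixels down
raw_bytes = width_px * height_px // 8

reported_bytes = 50 * 1024  # "around 50K", read as 50 KiB (assumption)
ratio = raw_bytes / float(reported_bytes)

print("raw: %d bytes, implied compression about %.0f:1" % (raw_bytes, ratio))
```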

A Fujitsu scanner, SANE and Quartz Python bindings (5, Informative)

sabi (721) | more than 9 years ago | (#9231665)

A Fujitsu scanner such as the fi-4120c [fcpa.com] is what I'd recommend. You might have to stretch your budget a bit. The cheap HP sheet feeders are very unreliable; we went through two HP 5550c's, enduring constant paper jams, before switching to a better (Fujitsu) scanner.

Unfortunately you don't have much use for something like Acrobat Capture because you have handwritten notes to deal with. To process the files, SANE [sane-project.org] and/or TWAIN interfaces are reasonably easy to write code for. The cool thing about SANE is that you can run the saned daemon on any Mac or Linux box, and with a couple of lines of config file changes, it's instantly available over the network from any Mac, Windows, or Unix box (there are TWAIN bridges for Mac [ellert.se] /Windows [ozuzo.net] so it even shows up in Photoshop and so forth); there are also standalone GUI clients like XSane [xsane.org] .
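If you don't want to write against the SANE API directly, the scanimage command-line tool that ships with SANE can batch-scan a feeder. A minimal sketch that only builds the command line: --format and --batch are scanimage's own options, while --resolution, --mode, and --source are backend-specific, so the values used here ("Lineart", "ADF Duplex") are assumptions you should check against what your backend reports.

```python
def build_scanimage_command(device=None, resolution=200, mode="Lineart",
                            source="ADF Duplex", pattern="page%03d.tif"):
    """Build a scanimage command line for batch-scanning a sheet feeder."""
    cmd = ["scanimage"]
    if device:
        # Optional device name, e.g. one reported by `scanimage -L`.
        cmd += ["-d", device]
    cmd += [
        "--format=tiff",
        "--batch=" + pattern,             # one numbered file per scanned side
        "--resolution", str(resolution),  # backend-specific option
        "--mode", mode,                   # backend-specific; value is assumed
        "--source", source,               # backend-specific; value is assumed
    ]
    return cmd


print(" ".join(build_scanimage_command()))
```

The returned list can be handed to subprocess.check_call on a machine with SANE installed; with saned running elsewhere, the same command works once the network device name is passed as device.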

I wrote a document management system in Python/wxWidgets (for Windows) in about a month part-time, and it works very well. Either on Mac or Windows, PDF makes sense because of the ubiquity of the viewers, even if you lose a bit in compression compared to more optimized formats such as DjVu. On Windows you can easily embed the Acrobat ActiveX control; on Mac OS X you have native PDF support, Panther's Preview kicks ass, and there are several open-source PDF browsing components such as the ones out of TeXShop or Glen Low's Graphviz port [pixelglow.com] you can embed in your own app.

Given a choice I would probably pick the Mac to do this project, because of the wonderful Quartz/CoreGraphics Python bindings. You can just draw right to PDF, and place PDF files as if they were images; for example, here's a short script to rotate a bunch of PDF files (sorry, Slashdot destroys Python indentation):


from CoreGraphics import *
import math, sys

for inputPDFPath in sys.argv[1:]:
    inputProvider = CGDataProviderCreateWithFilename(inputPDFPath)
    inputPDF = CGPDFDocumentCreateWithProvider(inputProvider)
    if inputPDF is None:
        print >> sys.stderr, \
            "unable to open '%s': perhaps is not a PDF file?" % inputPDFPath
        continue  # skip this file rather than fail on the calls below
    outputContext = CGPDFContextCreateWithFilename(
        inputPDFPath + '-rotated.pdf', None)

    # NOTE: the beginPage, rotateCTM, endPage and finish calls are missing
    # from the text as posted; they are filled in here on the assumption
    # that the bindings mirror the Quartz C calls of the same names.
    for pageNumber in xrange(1, inputPDF.getNumberOfPages() + 1):
        mediaBox = inputPDF.getMediaBox(pageNumber)
        # The rotated page swaps the original width and height.
        rotatedBox = CGRectMake(0, 0, mediaBox.getMaxY(), mediaBox.getMaxX())
        outputContext.beginPage(rotatedBox)
        # Move the origin to the top-left corner, then turn 90 degrees clockwise.
        outputContext.translateCTM(0, rotatedBox.size.height)
        outputContext.rotateCTM(-math.pi / 2)
        outputContext.drawPDFDocument(mediaBox, inputPDF, pageNumber)
        outputContext.endPage()
    outputContext.finish()

You could also use ReportLab [reportlab.org] , but because a lot of the PDF processing code is written in Python it's somewhat slower and memory-hogging for high-volume use. (I used ReportLab on Windows for the above project, and use CoreGraphics Python bindings for my research, so I do know what I'm talking about mostly :)

There Are Ways Other Than Outsourcing (1)

Cprossu (736997) | more than 9 years ago | (#9231691)

But not for $200. You can get a Canon DR-2080C off of eBay for $630, and it supports both USB 2 and SCSI-II interfaces.

Try a panasonic (1)

big tex (15917) | more than 9 years ago | (#9231693)

The HP one you picked looked OK, but the feeder looks a little chintzy.

We have a Panasonic at work, and use it to scan in design packages. It's something like the model KV-S7065C [tinyurl.com]. Don't be fooled by the 'low volume' tag: we routinely make 100-page PDFs out of it (high volume = insurance office), even though it will take a few minutes. Thing works great. Highly recommended. The Panasonic comes with software that allows you to save everything as a single file, break it into files xxx pages long (where you get to pick xxx), and many other features.
My favorite is that it makes it easy to create PDFs with changes in page size / resolution. Our packages are mostly design calcs (8.5x11, 300dpi) with a few drawings (11x17, 600dpi), and it works slick.

We used to send out ~5-10 fedex packages a week, but now we just scan and email. Saves so much money, time, and they can get packages right away.

A good way to keep down on the cost is to get a B&W scanner - you probably don't need color anyway, and it keeps the file size way down.

digital camera? (1)

Rob Bos (3399) | more than 9 years ago | (#9231696)

For that price, a digital camera on a fixed mount might be easier than a scanner. Lay out the sheet, take a shot, lather, rinse, repeat. Generate a PDF using ImageMagick/Ghostscript.
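The last step can be scripted. A minimal sketch, assuming the shots are saved with zero-padded names so that sorting gives page order; the shots/ directory, the file pattern, and the output name are all hypothetical. It only builds and prints the ImageMagick command (newer ImageMagick releases name the program magick rather than convert).

```python
import glob


def build_convert_command(image_paths, output_pdf):
    """Build an ImageMagick command joining images into one PDF, in sorted order."""
    if not image_paths:
        raise ValueError("no page images given")
    return ["convert"] + sorted(image_paths) + [output_pdf]


pages = glob.glob("shots/page-*.jpg")  # hypothetical directory and naming
if pages:
    print(" ".join(build_convert_command(pages, "lecture-notes.pdf")))
```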

My dad's office (5, Informative)

pavera (320634) | more than 9 years ago | (#9231698)

My father is an attorney.
He has a couple of high-speed scanners from Panasonic. They cost less than a thousand dollars ($400-500, if I remember correctly), they scan at about 20 ppm, and the software that came with them will save each scanned group of pages as a separate document (PDF, TIFF, whatever). My dad uses this setup to scan all of the files that his cases generate (shrinking his document storage from about 1000 sq ft to 2 shelves in a bookcase). We are talking files that consist of 10,000+ pages, and normally he saves a year's worth of cases on 3-4 CDs. They can scan up to 500 pages at a time.
Here is a link:
High Speed Scanners [panasonic.co.jp]

Outsource it and save yourself the trouble (1)

BiteMyShinyMetalAss (444575) | more than 9 years ago | (#9231699)

I worked on a similar project in the past, where I had to PDF a lot of paper-based documents.

A nice ADF scanner will save your sanity. We had a newer ScanJet, resembling the 5550c, where you couldn't feed too much at once or it would jam up. We later got hold of an older HP Network ScanJet that worked like a champ. If I could remember the model numbers, I'd give them to you. :(

That said, from the sounds of your situation, outsourcing would be the best solution. They already have the high-end scanners and the high-end software to work with your documents (i.e. Acrobat Capture), and all you'll have to worry about is giving them the documents and picking up the CDs with the PDFs on them. I don't remember what it cost us, but I'd wager that the overall value was superior.

Good luck!
