Simple Document Imaging for Unix?

Cliff posted more than 10 years ago | from the reducing-the-amount-of-loose-paper dept.

Data Storage 47

andylievertz asks: "I have developed a logical system of directories for storing my digital documents (i.e. *.doc, *.mp3, *.gif, etc.), and can usually find any obscure document with relative speed. These 'must-keep' hardcopies include everything from bills and shipping invoices to brochures and Chinese-food menus. I've tried applying my electronic filing techniques to an actual, real-world filing cabinet, complete with folders and labels, but such a system: requires a great deal of effort to maintain relative to the electronic system, especially considering the frequent influx of new hardcopy material; and doesn't address the greater issue of reducing the sheer paper bulk, organized or not. What solutions have you, the Slashdot Reader, employed to solve this situation for yourself? Are there viable Unix-based Document Imaging packages, similar in function to the Microsoft Document Imaging utility packaged with Office? Do you use a Unix-based Document Imaging solution personally or professionally? If so, what package, and why does it work for you?"

"So, step one is to find ways to reduce the influx of hardcopy (i.e. electronic billing, etc.), but for me, the second step is to find and utilize a [Unix-based!] system that will allow me to scan and file hardcopies electronically so they may be indexed, searched, re-organized, shared, and retrieved as easily as their electronic counterparts. Naturally, any such system would need tolerances for multi-paged documents, and would need to store its output in a non-proprietary file format."

47 comments

How often do you really need to look at old bills? (4, Interesting)

DeadSea (69598) | more than 10 years ago | (#7603667)

I just throw them in a box. Once a year, I search down for anything more than a couple years old and throw away anything beneath it. Other than a tax audit, I can't imagine I would ever need to look at them after they have been paid. Same thing for receipts and invoices, they go in the box.

If you want to track money, having the paper is not nearly as useful as entering the data into a financial program. Try GnuCash [gnucash.org] or something of that ilk.

Delivery menus are a different story. I keep them under a magnet on the fridge. If you get a nice rare earth magnet [ebay.com] that can hold a half inch stack of menus, that problem is easy to solve (get at least the half inch cubes).

Any solution that requires every document to be scanned is not going to work for you if you can't even file the documents. What are the chances you are going to get around to that stack of stuff to be scanned?

Invest in a magnet, a big box, and a good paper shredder.

Re:How often do you really need to look at old bil (1)

andylievertz (687492) | more than 10 years ago | (#7605842)

The rare earth magnet sounds good but what I need now is a refrigerator that I can ssh into and have it read me the contents of the menus ;o) Seriously...I can think of a number of times where I have a simple piece of paper on my desk at home, and if I could only retrieve it remotely (i.e. from work) then I could save myself a lot of time. A good, recent example: I just got an eye exam, and needed to fax a copy of the prescription to the [separate] optics store to get my new lenses. If I'd had a scanned image on my server (or some other universally-accessible storage solution), I could have pulled the image down and sent it over to the store in about 10 minutes. But because it was a physical piece of paper, I had to retrieve it from home and wait until the following day to get it to the optician. Maybe I'm trying to be too speedy?!?! I'm sure there are better, more crucial situations out there anyway. Andy

Re:How often do you really need to look at old bil (1)

pbhj (607776) | more than 10 years ago | (#7616961)

Other than a tax audit, I can't imagine I would ever need to look at them after they have been paid. Same thing for receipts and invoices, they go in the box.

Funny, not hilarious, we were burgled (sp?) recently - the insurance company wanted original documents for all the items taken, original receipts ... I had most of them, going back about 5 years, and even knew where they were ... this is probably 'helped' by the fact I don't have much disposable income.


Thing is the insurance folk probably wouldn't accept a scanned image (even if it was third party verified in some way!?).



No solution for you but.. (3, Interesting)

adamy (78406) | more than 10 years ago | (#7603673)

I've wondered about this myself.

Seems to me that the solution would involve a scanner, a database, and a mechanical system for retrieving the documents.

1) Scan the document.
2) Slide document into doc protector with ID tag (UPC codes might work, but really it could just be sequential).
3) Create DB entry for ID, BLOB of scanned image (or perhaps a foreign key to keep the images out of the query, but realistically most DBs optimize this for you), and most importantly, metadata about the document.

The more I think about it, the more I realize a number system of 1,2,3,4...would work fine. The automated retrieval, which would be nice, is not really vital. The match between the doc ID and the scanned version is enough, so long as the document always goes back into the same folder.

Insertion O(1)
Search O(log(n))
Deletion O(log(n))

Note that garbage collection (compaction) is not really an option, which means that reclaiming discarded IDs (reusing folders) would crank insertion back up to O(log(n)).

The question is whether the scanning process would be worth the time.

Re:No solution for you but.. (1)

adamy (78406) | more than 10 years ago | (#7603702)

One follow up: You probably don't really need a scanned image for everything. Just make it optional. 1 to N where N can be zero probably makes the most sense, since you want to scan certain docs page by page.
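
The scheme above (sequential folder IDs, metadata per document, zero or more page scans) could be sketched roughly as follows. This is only an illustration, assuming SQLite as the database; the table names, column names, and helper functions are made up for the example and are not from the post. Looking a document up by its ID goes through the primary-key index, while the substring search shown here is a plain table scan.

```python
import sqlite3


def open_index(path=":memory:"):
    """Create a hypothetical two-table layout: documents plus optional page scans."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id    INTEGER PRIMARY KEY,   -- the number written on the physical folder
            title TEXT NOT NULL,
            notes TEXT NOT NULL DEFAULT ''
        );
        CREATE TABLE IF NOT EXISTS scans (
            doc_id INTEGER NOT NULL REFERENCES documents(id),
            page   INTEGER NOT NULL,
            image  BLOB NOT NULL,
            PRIMARY KEY (doc_id, page)
        );
    """)
    return db


def file_document(db, title, notes="", pages=()):
    """Insert a document and return the next sequential ID for its folder."""
    cur = db.execute(
        "INSERT INTO documents (title, notes) VALUES (?, ?)", (title, notes)
    )
    doc_id = cur.lastrowid
    # Scans are optional: an empty 'pages' sequence stores metadata only.
    db.executemany(
        "INSERT INTO scans (doc_id, page, image) VALUES (?, ?, ?)",
        [(doc_id, number, image) for number, image in enumerate(pages, start=1)],
    )
    db.commit()
    return doc_id


def find(db, word):
    """Return (id, title) pairs whose title or notes contain the word."""
    pattern = f"%{word}%"
    rows = db.execute(
        "SELECT id, title FROM documents "
        "WHERE title LIKE ? OR notes LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return rows.fetchall()
```

The ID returned by `file_document` is the number to write on the folder, so the paper and the database row stay matched as long as the document goes back into the same folder.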

Why bother? (2, Informative)

tibike77 (611880) | more than 10 years ago | (#7603680)

What's wrong with the "folder in folder in folder" approach I use?
I don't really need a "system" for that... just make your "root" folders explicit enough, then file everything where it should go.
I even have a "temp" dir for every category.
I don't really see the need for such a tool, IF you can spare a few seconds to browse&dump...

Re:Why bother? (1)

dan_polt (692266) | more than 10 years ago | (#7609931)

I would suggest that the camera may not be quite the time saving device you would hope for.

The flash on a camera needs to be exactly right to produce a good readable image. Ok, so a scanner is slightly slower, but you will get a good quality document copy every time, which is suitable for use.

After all, if the image isn't captured to a good quality, there is little point capturing it at all.

Use a digital camera for input (3, Insightful)

G4from128k (686170) | more than 10 years ago | (#7603775)

I have found that a digital camera does a very good job of quickly capturing usable images of paper documents. A 5 megapixel camera provides over 200 ppi for 8.5 x 11 hardcopy and grabs the image faster than most flatbed scanners do. Given the scarcity of drivers for Unix, the only trick is finding a memory card reader that is compatible with your system.
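
The resolution figure can be checked with a little arithmetic. The sketch below assumes a 2592 x 1944 pixel sensor, which is one common 5 megapixel layout and not a number given in the post, and assumes the page fills the frame exactly.

```python
def ppi(pixels_long, pixels_short, paper_long_in=11.0, paper_short_in=8.5):
    """Pixels per inch when the page just fills the frame.

    The axis with fewer pixels per inch is the limiting one, so take the minimum.
    """
    return min(pixels_long / paper_long_in, pixels_short / paper_short_in)


# Assumed sensor: 2592 x 1944 pixels (about 5.04 megapixels).
print(round(ppi(2592, 1944)))  # 229
```

In practice the page rarely fills the frame perfectly, so the usable figure will be somewhat lower.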

A good digital camera may seem like overkill for scanning in bills, but then the camera also doubles as a camera too. ;)

Re:Use a digital camera for input (1)

zmedico (565341) | more than 10 years ago | (#7607161)

Given the scarcity of drivers for Unix, the only trick is finding a memory card reader that is compatible with your system.

In the case of Linux, the USB Mass Storage [google.com] drivers work pretty well for many types of hardware.

Re:Use a digital camera for input (1)

zmedico (565341) | more than 10 years ago | (#7607178)

I forgot to mention Gphoto [sourceforge.net] which runs on many flavors of Unix.

Re:Use a digital camera for input (0)

Anonymous Coward | more than 10 years ago | (#7607840)

The trick with the digital cameras is not to be dependent on a USB cable or any other cable-like connection for getting at your photos.

Memory card readers that are compatible shouldn't be too tough. Just make sure the memory cards aren't using some proprietary encryption or something and there should be a wealth of adapters available. Also, don't underestimate the power of something like installing a PCMCIA slot in your desktop (if you don't have a laptop with one). A $10 CompactFlash-to-PCMCIA adapter puts your CF card into your Linux system as a mountable filesystem (albeit most likely a VFAT fs or something).

Re:Use a digital camera for input (1)

sakusha (441986) | more than 10 years ago | (#7616512)

Digicams suck for general input. I'm doing a ton of digitizing of old tabloid (11x17) magazines with my Canon Powershot s50 5MP camera. That gives me a 240DPI image at a bit under 8.5x11in. This would obviously be a stupid idea except that 11x17 scanners cost about $3000, so a $500 camera to do a quickie low rez scan is about the best I can afford. At least I can read the tiniest type at that rez, that was the deal-breaker. I really should be using the new Kodak 16Mp camera for this job, but I can't afford it.

Anyway.. there are huge problems with this process. I have a huge old photo enlarger, I removed the projection head and replaced it with a camera mount, as a quick and dirty copystand. It is exceptionally difficult to get the camera perfectly aligned and levelled so your scans are square instead of skewed. I use tungsten lights (I'm just doing B&W copywork) and they get really hot. But worst of all, in NO way can the process be described as speedy. It takes about 60 seconds to store a raw image file to disk over USB2. You really have to use a remote capture program, I use Canon Remote Capture on my Mac, it shows what the viewfinder is seeing, but remotely on my Mac display. This is critical for lining up your page so you don't get crooked scans. I tried just using the shutter release instead of the remote release, but it didn't save any time, the transmission time over USB2 is insignificant compared to the processing speed of the camera. It takes almost exactly the same time to store the image to a CF card as to the remote CPU via USB.

For final storage, I convert everything to PDFs. They're easy to access on any platform, and image compression using Acrobat Pro is excellent (and configurable). I use the default compression. I just scanned a 72 page magazine, autoprocessed the pages in Photoshop, saved them as greyscale 8 bit TIFFs at about 5MB each, and imported them into a PDF. The 72 page PDF is about 30MB, the originals were about 360MB, about a 90% reduction in file size. Nice.

Anyway, I use a Mac and commercial software products, you could easily duplicate this workflow with other *nix or open source products.

Apple Unix? (1)

eyeball (17206) | more than 10 years ago | (#7603797)

Does Apple OSX count as UNIX? I use iPhoto which comes with OSX.

Ok, so it's not available on any other unix platform, but it employs a nice design for storing images that takes advantage of simple UNIX symbolic links. All images are stored in a hierarchy based on the import date. Then, Albums are created, which contain symbolic links to the real image files.
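
The same idea (real files stored by import date, albums made of symbolic links) is easy to try on any Unix. The sketch below is only an illustration: the `originals/YYYY/MM/DD` and `albums/NAME` directory names are assumptions for the example and are not iPhoto's actual on-disk layout.

```python
import os
from pathlib import Path


def import_image(library, import_date, name, data):
    """Store the real file under library/originals/YYYY/MM/DD/name.

    import_date is an ISO date string such as "2003-12-04".
    """
    year, month, day = import_date.split("-")
    target = Path(library) / "originals" / year / month / day / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


def add_to_album(library, album, image_path):
    """Create library/albums/<album>/<name> as a symlink to the real file."""
    album_dir = Path(library) / "albums" / album
    album_dir.mkdir(parents=True, exist_ok=True)
    link = album_dir / Path(image_path).name
    # Link to the absolute path so the link resolves from any working directory.
    os.symlink(os.path.abspath(image_path), link)
    return link
```

Because an album entry is only a link, the same scan can appear in several albums (say, "bills" and "2003 taxes") without being copied.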

paymybills.com (1)

Triumph The Insult C (586706) | more than 10 years ago | (#7603898)

i haven't used them in about 2 years, but it was a pretty slick service. i was one of the first X to sign up (i think 1000), so i got it for free

basically, you tell the peeps billing you to send your bills to pmb.com. pmb.com scans in the bills, and you could download the scan in pdf and do whatever with it. then, you could pay your bills ... eg, pay $50 of phone on 12/4, etc.

when i moved, i lost interest because i could pay all of my bills online (i couldn't when i was using pmb.com), but having it all in one place was definitely nice

i'm not sure exactly how they made $$ on the deal, but i think it was when you'd pay part of a bill, they'd stick that in a bank account somewhere, earn like $2 interest after 3 days, then pay your bill. not sure tho

Re:paymybills.com (1)

ivan256 (17499) | more than 10 years ago | (#7604244)

i'm not sure exactly how they made $$ on the deal

From paymybills.com: This is no longer an active web site.

Looks like they didn't.

Re:paymybills.com (1)

splattertrousers (35245) | more than 10 years ago | (#7604462)

I'm not sure exactly how they made $$ on the deal

From paymybills.com: This is no longer an active web site.

Looks like they didn't.

They were bought by PayTrust [paytrust.com]. They make money by charging $13 a month per user.

Just to make it clear how spectacularly cool this service is: it's not just online bill-paying. All your bills go to them (either electronically or through the regular postal service) and show up on the web. You can even have them pay the bills automatically. It's dang convenient.

Re:paymybills.com (1)

macdaddy (38372) | more than 10 years ago | (#7615789)

I do this with my Bank of America [bankofamerica.com] account, for free. They even have provisions for receiving e-bills for companies that support it and automagically paying your bills for you. Nice. I've been thoroughly impressed with their web services in the month or so I've been a customer. I have the My Access account. They make $$ by charging you if you took up the time of one of their tellers too often. I can visit a bank teller something like 3 times per bill cycle. Customer service visits don't count. Basically the only time you get billed is when you make a deposit in person instead of via an ATM, withdraw $$ in person rather than at an ATM, or transfer $$ in person rather than on the phone or at an ATM. I've been impressed and I'm not easy to impress when it comes to banking.

Re:paymybills.com (1)

splattertrousers (35245) | more than 10 years ago | (#7623413)

I do this with my Bank of America account, for free.

I looked at their site and it looks like you can only receive e-bills through it. PayTrust lets you receive paper bills too; they scan the bill in and show it to you online.

It's lame that all billers can't send e-bills, but at least PayTrust can make all bills seem like e-bills to me.

Re:paymybills.com (1)

macdaddy (38372) | more than 10 years ago | (#7623827)

Yeah, that's a downside. I don't mind actually getting my bill though. I really don't know how a company would handle sending a bill to your bank instead of your home address. It's not like you're changing your own address to that of your bank's, yet as far as the company that's billing you is concerned maybe you are. Hmmm... Makes me wonder.

Re:paymybills.com (1)

splattertrousers (35245) | more than 10 years ago | (#7625023)

I really don't know how a company would handle sending a bill to your bank instead of your home address.

Most companies can handle it pretty well. I just tell them that my billing address is such and such, but that my home address is such and such. I've come across two companies so far that can't deal with it (one tiny company and one huge company), but the rest have had no problem. I guess it's not too uncommon for a person's billing address to be different from their home address or service address.

Re:paymybills.com (1)

macdaddy (38372) | more than 10 years ago | (#7625921)

I've done that a lot when I purchased something and had it shipped to work while billing me at my home. I hadn't thought about ever doing that with a company providing a service like AC or my landline. Interesting. I wonder what Farm Bureau might think of it. LOL I don't have a need to try it now but I'll have to keep that in mind in case I ever do.

Re:paymybills.com (0)

Anonymous Coward | more than 10 years ago | (#7604554)

If you live in Canada just sign up for ePost [epost.ca]. It's Canada Post's answer to this. It's free and government run so your bills are hopefully in better hands. (cross your fingers on that one). Anyway it works great for me and vendors like to use it because it's cheaper for them.

how about... (1)

pgaffney (247103) | more than 10 years ago | (#7604020)

Get a label gun.

Put each document in a white business envelope as it comes in, and put a numerical label on each envelope. Put envelopes in one of three boxes: never throw away, throw away in five years, throw away in a year. Maybe have an additional box for documents that will only fit in big manila envelopes.

Write a quick perl script web interface that records one of several customizable options from a pull down menu (ie grocery receipt, gas bill, heroin expenses) along with the date, the numeric label you just put on, and maybe a sentence additionally describing content as needed. The web interface calls an SQL database and stores the information therein or according to the scheme for your electronic system that you seem to find awesome.

When you need to find a document, search by type and date or text note, find the label number and box number, and proceed as you might typically.

The only trouble you're going to run into is that envelopes are expensive and take up more space than you might like. Try only using envelopes for stuff not printed on 8.5 by 11; put labels directly on stapled folded versions of these. Alternately, divide files by size of paper and paper weight in a serious and merciless fashion. Alternately, reuse envelopes. Anyone have a better system for actually storing the paper assuming you like the sequential numbering + database scheme?
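
The numbered-envelope index described in this comment could look something like the sketch below. It is only an illustration: the post suggests a perl web interface over an SQL database, whereas this keeps the records in an in-memory list for brevity, and the box names, field names, and function names are all made up for the example.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical box names mapped to how many years the contents are kept.
RETENTION_YEARS = {"keep forever": None, "five years": 5, "one year": 1}


@dataclass
class Envelope:
    label: int        # the number on the sticker
    kind: str         # e.g. "gas bill", "grocery receipt"
    received: date
    box: str          # one of the RETENTION_YEARS keys
    note: str = ""


def search_envelopes(index, kind=None, text=None):
    """Return envelopes matching the document type and/or a word in the note."""
    hits = []
    for env in index:
        if kind is not None and env.kind != kind:
            continue
        if text is not None and text.lower() not in env.note.lower():
            continue
        hits.append(env)
    return hits


def due_for_shredding(index, today):
    """Return labels of envelopes whose retention period has passed."""
    due = []
    for env in index:
        years = RETENTION_YEARS[env.box]
        if years is None:
            continue
        # Compare (year, month, day) tuples to sidestep 29 February problems.
        expires = (env.received.year + years, env.received.month, env.received.day)
        if (today.year, today.month, today.day) >= expires:
            due.append(env.label)
    return due
```

A search returns the label and box, which is all that is needed to walk over and pull the right envelope.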

PaperPort (1)

humanasset (206242) | more than 10 years ago | (#7604565)

I was looking for something similar that was Open Source or Linux based. Unfortunately, I wasn't able to find anything to my liking.

I settled on PaperPort for Windows. It lets you organize your scanned documents in a folder hierarchy that can be altered to your liking. In addition, you can use the application's basic OCR capability to search the contents of all your documents. The previous version, PaperPort 8, used a proprietary file format. But the new version, PaperPort 9, has changed the default file format to PDF. I'm happy with it, and it reduced the mountain of papers on my desk to a small, manageable pile.

http://www.scansoft.com/paperport/

I use... (4, Informative)

sydb (176695) | more than 10 years ago | (#7604853)

QuiteInsane [sourceforge.net].

It's insanely good. I use it to scan in all my important documents. It has useful multipage modes for... well, multipage documents.

Try it. It's actually been considerably revamped since I installed it, I will have to try a more recent version.

Oh, it comes in a nice Debian package via apt-get.

Re:I use... (1)

andylievertz (687492) | more than 10 years ago | (#7605884)

Thank you! Thank you! Thank you! Your reply is helpful...informative...friendly, and I appreciate that your response wasn't something like "Why bother?"

If I had moderator points (and could moderate this thread) I would give them all to you. Moderators please mod up. :o)

Andy

Re:I use... (1)

sydb (176695) | more than 10 years ago | (#7612758)

Glad to be of help! I was a bit surprised by the "why bother" posts myself. The free software community, in my mind, is about people helping each other, not belittling the needs of each other.

Why bother with a "why bother" post?

I have a three tiered system (1)

mike_scheck (512662) | more than 10 years ago | (#7604910)

I have developed a three tier solution for documents. I have a fax machine that I use to fax my documents to/from. This is connected to a Documentum docimage system that stores the images in an Oracle database.

I was thinking about installing a java front end to use webdav to connect to the db to allow me to access the documents through a webpage, but I'm not sure if I'll go through with it or not, I want to keep it fairly simple...

I'm kidding of course, I have a trash can in my office that my girlfriend loves to throw all my important documents in.

Re:I have a three tiered system (2, Funny)

Carnildo (712617) | more than 10 years ago | (#7605022)

I have a trash can in my office that my girlfriend loves to throw all my important documents in.

Which important documents? The ones from Playboy?

Simple? (0)

Anonymous Coward | more than 10 years ago | (#7605233)

Dude, nothing is simple on Unix.

Re:Simple? (1)

andylievertz (687492) | more than 10 years ago | (#7605900)

You can't blame a guy for being hopeful :oD

Kooka (1, Interesting)

Anonymous Coward | more than 10 years ago | (#7605928)

I've tried Kooka [kde.org] and it should be good for simple stuff.

Omnipage (1)

austad (22163) | more than 10 years ago | (#7606084)

There's always Omnipage for OS X, but it's 500 bananas. I wish they sold something that was more in the price range of the normal consumer.

some thoughts... (1)

sribe (304414) | more than 10 years ago | (#7606342)

A long time ago on an ancient Macintosh I used HyperCard to deal with this. My particular problem was with technical papers and marketing literature--no matter how carefully I set up categories for filing the categories would need to change over time. So I went to a system of simple numbered folders 1-n for the physical files, and an electronic "index" that consisted of 1 HyperCard card per folder with a title field giving the theme of the folder and a description field with more info on the actual contents. This worked very well as long as all these materials were hardcopy; then the WWW came along and I don't want to print out everything just to be able to find it again.

I went back to trying to categorize physical folders and hate this system. I'm disciplined enough that I can manage to keep a coherent system and be able to find things, but it takes so much more effort than it should. I've considered using one of the many nice outliners available for the Mac to try to manage this (NoteBook, NoteTaker, Omni Outliner, Hog Bay Notebook, TinderBox, DevonThink, NovaMind, Boswell). Although this appeals to me because I really like outlines and keep a lot of project info this way, it would still be too much work. So I'm thinking of going to a system that's all electronic. Much of my material already comes off the web anyway, and so can easily be captured into PDF format. For the rest I'm considering going to the effort to scan the paper.

I've thought of putting together a simple database: something very much like my old HyperCard stack, but with a web frontend--easily slapped together with PHP some weekend. But what's holding me back is that there is a pretty large quantity of documents that fall into an obvious folder organization, and I'm not sure I want to move those off into the serialized id 1-n organization and be required to use a database to find them, when they're already in an obvious (to me only, of course) location on my disk. Yet I also don't particularly want to have 2 different systems for finding different classes of documents, when I know that the classifications will overlap and shift.

As for scanning under UNIX, there's a company [peabody.com] that makes commercial toolkits for various UNIX platforms which support the kind of scanner you'd want for this (high-speed, no need for super-high resolution). I have recently developed partial (meaning I only support the scanner features that I needed: black & white scanning using the USB port) support under OS X for a perfect (if you can afford almost $1k) scanner [fcpa.com]. I hope to put up a web page and give away a command-line utility (multi-page TIFF output) to run the scanner some time soon.

Re:some thoughts... (1)

andylievertz (687492) | more than 10 years ago | (#7607852)

I'd love to get away from hardcopies altogether, and in doing so, I think I could come up with a folder hierarchy/filename structure that would make it easy enough for me to find things...but there are some things I'd like to file in two places. So one option would be to use symlinks...but I can see some advantages to the database design you suggest.

Perhaps your database could hold (a) image title (b) the image (c) the OCR text from the image, for searching purposes and (d) several "category" fields. For example, a credit card bill:

Title: Chase Manhattan Visa Statement, December 2003
OCR: credit, visa, etc. (useful terms)
Category 1: bills
Category 2: December 2003 expenses
Category 3: visa_credit_account

Maybe the Title field would be searchable too, with a higher priority on title-based matches.
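
A rough sketch of that idea, with title matches weighted above OCR matches. Everything here is an assumption made for illustration: the record fields follow the example above, but the weights (3 for a title hit, 1 for an OCR hit) and the function names are invented, and a real system would more likely push the query into the database.

```python
from dataclasses import dataclass, field

TITLE_WEIGHT = 3   # assumed: a title hit counts three times as much
OCR_WEIGHT = 1


@dataclass
class Record:
    title: str
    ocr_text: str
    categories: list = field(default_factory=list)


def score(record, term):
    """Count occurrences of the term, weighting title hits more heavily."""
    term = term.lower()
    return (TITLE_WEIGHT * record.title.lower().count(term)
            + OCR_WEIGHT * record.ocr_text.lower().count(term))


def search_records(records, term, category=None):
    """Return matching titles, best match first, optionally within one category."""
    scored = [
        (score(rec, term), rec)
        for rec in records
        if category is None or category in rec.categories
    ]
    scored = [(points, rec) for points, rec in scored if points > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec.title for _, rec in scored]
```

Since a record can carry several categories, one scan can be "filed in two places" without symlinks.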

???

Re:some thoughts... (1)

sribe (304414) | more than 10 years ago | (#7610456)

I'd love to get away from hardcopies altogether, and in doing so, I think I could come up with a folder hierarchy/filename structure that would make it easy enough for me to find things...but there are some things I'd like to file in two places. So one option would be to use symlinks...but I can see some advantages to the database design you suggest.

Perhaps your database could hold (a) image title (b) the image (c) the OCR text from the image, for searching purposes and (d) several "category" fields.


Good ideas; the kinds of things I keep thinking about from time to time...

Windows XP has the right idea (Re:some thoughts... (1)

katz (36161) | more than 10 years ago | (#7614570)

I've given this problem much thought. I have files I'd like to index such as articles collected off the 'net, po^Hrogramming mpegs, and images. I was looking for a system that is
(1) integrated
(2) flexible enough to let you set and query by keyword AND key/value pairs
(3) transportable (so that copying a file to another disk also transfers its indexing info).

So far the best interface I've seen is Windows 2000/XP's index service. It's an extension to the Find utility that lets you query files very finely (actress=jenna, date>2002, keywords=physics lectures). You set these attributes through a dedicated tab in a file's context menu. I wish there was something like this for KDE or GNOME -- but then, this would not be easily transportable (not unless you're willing to consciously remember to manipulate the metainfo database or move the file's metainfo file along with it).

Roey

Document Imaging (1)

redog (574983) | more than 10 years ago | (#7608632)

I currently use Westbrook Technologies' filemagic [westbrooktech.com]. It only runs on Windows, requires MSDE, and probably is not cost-effective for an individual, but I know filenet [filenet.com] runs on Solaris, and who knows [nobody-kno...ything.com], maybe it costs less.

FileNet... (1)

DrCode (95839) | more than 10 years ago | (#7622537)

I worked for them around 1987, when they used to build their own Unix workstations. They have a nice product, but most of their customers seemed to be large banks and insurance companies.

Webcam for scanning documents (1)

Frans Faase (648933) | more than 10 years ago | (#7608673)

Theoretically, a webcam should be sufficient for scanning documents. So far, I haven't found a package able to do this. This package should combine multiple (still) frames into a single frame for noise reduction, or possibly even stitch multiple frames into a single frame, and rescale the whole image to the paper size. Would it not be nice to pick up your webcam and move it over the document and have an instant scan of it?
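
The noise-reduction half of that wish is the simpler one: if the camera is held still, averaging several frames pixel by pixel reduces random sensor noise. The sketch below assumes the frames are already aligned, equally sized, and given as grayscale rows of numbers; grabbing frames from a webcam and stitching a moving camera's frames together are not covered.

```python
def average_frames(frames):
    """Average equally sized grayscale frames pixel by pixel.

    Each frame is a list of rows, each row a list of brightness values.
    """
    if not frames:
        raise ValueError("need at least one frame")
    count = len(frames)
    height = len(frames[0])
    width = len(frames[0][0])
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]


# Three noisy readings of the same two pixels.
frames = [[[100, 50]], [[110, 40]], [[90, 60]]]
print(average_frames(frames))  # [[100.0, 50.0]]
```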

HP Digital Sender and htDig (4, Informative)

em.a18 (31142) | more than 10 years ago | (#7609140)

I use an HP Digital Sender (http://h10010.www1.hp.com/wwpc/us/en/sm/WF05a/15179-64175-64404-12126-64404-25324.html) and htDig to solve all my document storage problems.

The Digital Sender is a wonderful toy. Stick a stack of paper in the bin. Enter an email address. Press the big-green button. And a PDF shows up in my mailbox in a few minutes. Even does double sided. Very simple device and it does most of what I need.

It doesn't do OCR. The Digital Sender outputs a bit-mapped PDF that looks very good. I usually use the full version of Adobe Acrobat to do optical character recognition and store the results in the background. That way I still see the good scan on the screen and when I print. But I can copy and search the text as I would normally.


I use htDig (http://www.htdig.org/) to index my archive. I store content in file folders that make sense (2002 taxes, pitch perception papers, etc). But I still find htdig useful. It indexes both HTML (my lab notebook) and PDF files. All is good.

PDF is a well-documented file format. I wish there was a good free-OCR package, but sometimes you have to pay for good performance. htDig and PDF work great on Windows and Linux.

In three years I have accumulated just over 1Gbyte of content. That represents all my lab notes (in HTML format) and all the papers I've read (in PDF). It's wonderful having my entire paper life with me on my laptop. (I also back it up to three different machines.)

DocMGR looks like what you want... (1)

dloflin (110712) | more than 10 years ago | (#7613559)

Haven't tried it yet, but it looks like it fits the bill. Personally I just wish Google would open source what they use for their Catalogs site...but maybe DocMGR or something similar will do the trick...

It's at http://docmgr.sourceforge.net/ [sourceforge.net]

sane + ghostscript (2, Interesting)

mercuryresearch (680293) | more than 10 years ago | (#7616086)

Someone mentioned .PDFs. This was my solution.

I have a very simple script that runs scanimage, then processes the output through convert to make it a rasterized postscript output, then processes that output through ps2pdf (part of ghostscript).

My scanner (epson 1640U) has a document feeder so the command line options for scanimage reflect that. A simple loop in the script handles all the pages.

The net result is a script called "scan2pdf" to which I just give the output PDF file name (something helpful, like the name of the document and the date). I've processed over a decade of financial records, easily 1000s of pages, in a day with this simple setup.
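
The pipeline described here (scanimage, then convert, then ps2pdf) might be strung together along these lines. This is not the poster's script, only a sketch under assumptions: it uses `scanimage --format=tiff` writing to standard output, leaves out document-feeder options because those depend on the scanner backend, and instead offers an optional pause between pages for flatbed use. The function names and temporary file names are invented for the example.

```python
import subprocess
import tempfile
from pathlib import Path


def page_paths(workdir, pages):
    """Temporary file names for each scanned page."""
    return [Path(workdir) / f"page{n:03d}.tiff" for n in range(1, pages + 1)]


def conversion_commands(tiffs, ps_path, out_pdf):
    """Commands that turn the page images into one PDF via PostScript."""
    return [
        ["convert", *[str(t) for t in tiffs], str(ps_path)],  # ImageMagick
        ["ps2pdf", str(ps_path), str(out_pdf)],               # Ghostscript
    ]


def scan2pdf(out_pdf, pages, pause=False):
    """Scan the given number of pages and bundle them into out_pdf."""
    with tempfile.TemporaryDirectory() as workdir:
        tiffs = page_paths(workdir, pages)
        for tiff in tiffs:
            if pause:
                input(f"Load the page for {tiff.name}, then press Enter ")
            # scanimage writes the image data to standard output.
            with open(tiff, "wb") as handle:
                subprocess.run(["scanimage", "--format=tiff"],
                               stdout=handle, check=True)
        ps_path = Path(workdir) / "pages.ps"
        for command in conversion_commands(tiffs, ps_path, out_pdf):
            subprocess.run(command, check=True)


if __name__ == "__main__":
    scan2pdf("2003-12-visa-statement.pdf", pages=2, pause=True)
```

Running it requires SANE, ImageMagick, and Ghostscript to be installed and a scanner that scanimage can find by default.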

Re:sane + ghostscript (1)

andylievertz (687492) | more than 10 years ago | (#7617541)

That sounds fairly ideal, except perhaps the part about the document feeder...I would presume that most people don't have that kind of equipment.

I love that your solution is command-line based. Could your script be modified to handle multiple page documents, fed one at a time (some kind of a pause-while-you-change-pages)?

Would you be willing to share your script here? Thank you....

Andy

Yippee! (1)

macdaddy (38372) | more than 10 years ago | (#7619001)

I've tried to Ask Slashdot this question 3 or 4 times in the past handful of years and my submission was never accepted.

I really need this type of system. By far the single largest amount of clutter in my home has always been bills, other USPS mail that I need to keep (like mail from my 403B advisor), and receipts for a wide assortment of purchases. I've been looking for this type of system for years. What I really want to be able to do is sit down in the evening with my bills in hand, pull up a software app that lets me choose which predefined company the document is for, toss it on my scanner and have all the settings correct from the get go, and automatically archive the images in the appropriate directories/database. I don't think I need OCR to be frank. I don't see a need for it, at least for my needs. I'd keep my archive indefinitely since it's only a little drive space now. I'd write it to CDR once a month or less and drop it off in my bank's free safety deposit vault. Then I could afford to trash the mountains of paperwork that plague every corner of my home. I really need this.

I envision a simple web or application frontend that lets me pull up my bill by company and month (or entry day). Full text search might be nice but I think I could get by without it easy enough.

I'd like to also answer the first poster's question:

"How often do you really need to look at old bills?"

This truly depends on each person's situation. I do consulting work on the side so I have my own personal business. I file expenses under my Schedule C. I can declare all sorts of things to be expenses, including a percentage of my monthly utilities. I've always been told by those people with businesses that you should keep all business-related records for 15 years, just in case you get audited. If you get audited and can't account for every penny then you're seriously screwed. I don't know what the rules are at the IRS about how many years back you can be audited. For all I know they might not even have rules. Just to be safe I'll keep my stuff indefinitely. I don't mind a GB of data a year if it covers my ass come tax time. Hard drives are cheap after all. :-)

One Big Folder (1)

GirTheRobot (689378) | more than 10 years ago | (#7622483)

- one big folder
- use highly descriptive and standardized filenames (ie "12-03-water-bill.tif")
- ls -l | grep water | grep 12-03

VOILA!!! Always works for me!
You can't get any simpler.
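
For anyone who would rather script the lookup, the chained grep amounts to "keep the names that contain every search term". A minimal sketch, with a made-up function name; unlike `ls -l | grep`, it matches against the file names only, not the rest of the listing line.

```python
import os


def find_files(names, *terms):
    """Return the names that contain every one of the given terms."""
    return sorted(name for name in names if all(term in name for term in terms))


if __name__ == "__main__":
    # Search the one big folder (here: the current directory).
    for name in find_files(os.listdir("."), "water", "12-03"):
        print(name)
```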

Gallery? (1)

Havokmon (89874) | more than 10 years ago | (#7622669)

Why not use photo gallery software?
