Slashdot: News for Nerds

Ask Slashdot: Open Source For Bill and Document Management?

timothy posted about a year ago | from the seasonally-appropriate dept.

Data Storage 187

Rinisari writes "Since striking out on my own nearly a decade ago, I've been collecting bills and important documents in a briefcase and small filing box. Since buying a house more than a year ago, the amount of paper that I receive and need to keep has increased to deluge amounts and is overflowing what space I want to dedicate. I would like to scan everything, and only retain the papers for things that don't require the original copies. I'd archive the scans in my heavily backed up NAS. What free and/or open source software is out there that can handle this task of document management? Being able to scan to PDF and associate a date and series of labels to a document would be great, as well as some other metadata such as bill amount. My target OS is OS X, but Linux and Windows would be OK."

187 comments

I just thought of something (5, Funny)

roman_mir (125474) | about a year ago | (#43385563)

Send them to a dedicated gmail account. You'll be able to find all of your documents (you can label them, whatever) and they provide online office of some sort and if you forget what you have there you can always just go to Google search and push "I feel lucky" button.

Re:I just thought of something (4, Insightful)

Anonymous Coward | about a year ago | (#43385609)

Providing quick and easy access to the government (and who knows who else) to all of your important documents.

Re:I just thought of something (3, Interesting)

roman_mir (125474) | about a year ago | (#43385871)

Absolutely, no question about it. Some documents are not that important, but the important ones shouldn't go there.

Re:I just thought of something (2)

Rinisari (521266) | about a year ago | (#43385621)

I'm concerned with privacy of backing up to Gmail, even if its labeling is completely what I'm looking for. I suppose I could encrypt everything I send and base its subject on something I can read and label, but that's a lot of rigmarole for something that I really would rather keep locally or on my own backed-up network.

Re:I just thought of something (3, Insightful)

fustakrakich (1673220) | about a year ago | (#43385717)

Google is pretty fickle with its applications. We'll never know how long gmail will remain online, until they decide to shut it down.

Oh, like the other replies said, 'privacy'... You will have none if it is online in any form.

Re:I just thought of something (1)

Genda (560240) | about a year ago | (#43387171)

Not necessarily so... Google (or any cloud storage resource) is an awesome place to store encrypted and compressed documents. You just want to make certain that you back everything off every once in a while so if Google (or other resource) decides to pull the plug, you won't find yourself trying to slurp 5 GB of data down in a week through a limited resource being crushed by a hundred million other users doing the same.

I was in the same boat (2)

mkro (644055) | about a year ago | (#43385569)

I ended up with gscan2pdf and a rigid directory and filename structure. It works, but yeah, tags would be nice.

Re:I was in the same boat (2)

AvitarX (172628) | about a year ago | (#43385583)

Hasn't kde finally gotten their shot together for functioning tags?

Re:I was in the same boat (1)

fustakrakich (1673220) | about a year ago | (#43385741)

Hasn't kde finally gotten their shot together...?

Their aim is true...

Re:I was in the same boat (4, Informative)

tomtomtom (580791) | about a year ago | (#43387145)

I ended up with gscan2pdf and a rigid directory and filename structure. It works, but yeah, tags would be nice.

gscan2pdf is OK, but if you want to do this seriously then you're probably going to want a reasonably fast sheet-fed scanner (I got a Fujitsu ScanSnap S1500, which is supported by SANE and can scan at 18-20 pages/36-40 sides per minute) with a button so that you can go through a whole stack of paper quickly with minimal keyboard/mouse interaction to slow you down. This led me to setting up scanbuttond (which just gained official support for the ScanSnap but there was a patch floating around somewhere for a while before that) with a custom script.

Make sure you OCR your documents to make them searchable then run an indexer (I like recoll [recoll.org] but KDE and GNOME both have their own desktop search solutions as well). I've found the best OCR engine on Linux seems to be tesseract [google.com] , but there are a couple of others you can try. The process took me a while to get right and is a bit painful - the script which scanbuttond runs runs scanadf to scan to a string of image files per side and puts them in a processing directory. I then have another batch-processing script I run once I'm done with a pile of papers while I go and get a cup of tea which runs unpaper then tesseract on them, then hocr2pdf to convert each page individually into a searchable PDF file then finally pdftk to concatenate all the pages together into a scanned document. I split the two parts of the process out because the OCR bit can take some time and this way I can get maximum throughput on the scanner itself without needing to wait for the rest to catch up. If I could be bothered then I could make the scanning script run my de-batching script once only and have it pick up new files as they are dropped in the directory but it's not that much of an effort really.
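For anyone who wants to script the batch step described above, here is a rough sketch of how the per-page commands could be assembled before handing them to `subprocess.run`. It is not the poster's actual script. The file layout and function names are hypothetical, and the command-line forms (the `hocr` config name for tesseract, `-i`/`-o` for hocr2pdf with the hOCR markup on standard input, `cat output` for pdftk) are written from memory and should be checked against the man pages of the versions you have installed.

```python
from __future__ import annotations

from pathlib import Path


def plan_page_commands(page_image: Path, work_dir: Path) -> list[list[str]]:
    """Return argv lists that clean up and OCR one scanned page image."""
    cleaned = work_dir / f"{page_image.stem}-clean{page_image.suffix}"
    # tesseract takes an output *base* name and appends its own extension;
    # as far as I know that extension differs between tesseract versions.
    ocr_base = work_dir / page_image.stem
    return [
        ["unpaper", str(page_image), str(cleaned)],
        ["tesseract", str(cleaned), str(ocr_base), "hocr"],
    ]


def plan_hocr2pdf(cleaned: Path, hocr_file: Path, page_pdf: Path) -> tuple[list[str], Path]:
    """Return the hocr2pdf argv plus the file to feed it on standard input."""
    return ["hocr2pdf", "-i", str(cleaned), "-o", str(page_pdf)], hocr_file


def plan_merge_command(page_pdfs: list[Path], output_pdf: Path) -> list[str]:
    """Return the pdftk argv that concatenates the per-page PDFs in order."""
    return ["pdftk", *[str(p) for p in page_pdfs], "cat", "output", str(output_pdf)]


if __name__ == "__main__":
    for argv in plan_page_commands(Path("/scans/page-001.pnm"), Path("/scans/work")):
        print(" ".join(argv))
```

Building the command lists separately from running them keeps the slow OCR stage easy to test and to rerun on a directory of already scanned pages.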

I then sort my PDFs into a hierarchical directory structure once they've been OCRd (and at this point they get indexed as well for searching).

If you're on Windows/Mac then the software that comes with the ScanSnap will pretty much do all this for you, although it's better to scan with OCR disabled then use Acrobat to batch-OCR the PDFs later for the same reason. Add a decent desktop search solution like an old version of Copernic (or possibly Windows Search) and all is good.

doxbox.ca formerly known as owl (0, Informative)

Anonymous Coward | about a year ago | (#43385573)

Subject says it all

Book Recommendation (0)

Anonymous Coward | about a year ago | (#43385575)

May I suggest reading David Sparks's Paperless [macsparky.com] book?

It has a whole chapter on how to tag documents on OS X, which sounds like what you're looking for.

OpenKM (3, Informative)

Anonymous Coward | about a year ago | (#43385585)

OpenKM (http://www.openkm.com/en/) is what I use to manage my documents; its tagging and document preview features are what I appreciate most. It runs as a web service, FYI.

SANE (0)

Anonymous Coward | about a year ago | (#43385587)

I built my own document storage system years ago using SANE, PostgreSQL, and I think it was Perl for the web interface, though there are much better ways to do the web part.

muddle headed post (2, Interesting)

Anonymous Coward | about a year ago | (#43385591)

by definition, "important" = keep original (I mean seriously, are u that short of basement space ??)
Electronics are ephemeral; You can, today, read stuff on papyrus, as long as you know the language..do you really want to trust stuff that is important to ephemeral electronics ?
(i mean, how many times has /. gone over this - is this the editors idea of a yearly question ?)

tagging is an inherently stupid idea; it may be the best that you can do with current technology, but a google-like full text search is much much better (tell me - if you want to pull out a piece of information you know is on your hard drive in a pdf, do you look for the pdf, or just google it ?)

it is possible,after 5 or ten years, you might know what tags you want....
tagging is hard work, that you have to do manually consistently; better to have 3 or 4 folders organized by client/project then tag

Re:muddle headed post (1)

techno-vampire (666512) | about a year ago | (#43385683)

Basements aren't as common as you think they are. I've always lived in Southern California, and I've never lived in a house with a basement. At most, there's been a crawl space under the house, but that's not exactly a good place to store things. And, I suspect there's a typo in TFS that the editors didn't catch: it says, "I would like to scan everything, and only retain the papers for things that don't require the original copies." and I think it should read, "...that do require..." because as written, it makes no sense at all.

Re:muddle headed post (2)

Rinisari (521266) | about a year ago | (#43386039)

You are correct. I meant to keep only the things I need originals of: birth certificate, car titles, etc.

As for physical space, I have better things than documents to store in my available basement space: wine, beer, computers, etc.

Re:muddle headed post (2)

techno-vampire (666512) | about a year ago | (#43386105)

I don't even have the originals of my birth certificate, discharge papers or DD 214, and haven't in decades. However, my father registered my birth certificate at the Hall of Records, and I did the same with my discharge papers and DD 214 after I got out of the Navy so I don't have to worry. In fact, in Los Angeles, where they're registered, any veteran can get two copies of his service papers for free, any time they're needed, so why keep the originals? And, once when I was down there to request copies, I ran across my father's, although I've never had a reason to request them. Still, it's nice to know how long they hang on to things like that.

Re:muddle headed post (0)

Anonymous Coward | about a year ago | (#43386815)

Keep paper documents that require originals in a safety deposit box or a fire-proof and water-proof personal safe at home preferably hidden away. All other documents and copies of originals-must-be-maintained documents can be scanned into PDF and stored in a document management system such as OpenKM preferably running on a dedicated virtual machine which can be backed up weekly, encrypted and transferred to third-party storage or VPS provider or even to a relative's home network.

Re:muddle headed post (2)

ShanghaiBill (739463) | about a year ago | (#43386251)

Electronics are ephemeral; You can, today, read stuff on papyrus, as long as you know the language..do you really want to trust stuff that is important to ephemeral electronics ?

This is just completely backwards. Electronic documents are the least likely to get lost or destroyed. I have no receipts or papers from 25 years ago. But I have all my email from those days. With e-docs, you can make multiple copies, store copies off-site, etc. Every email I have ever sent, every non-spam email I have ever received, all the source code I have ever written, over 10,000 family photos, copies of my marriage license, deeds, insurance forms, etc. etc. will ALL fit on a single XD card smaller than my fingernail, and the XD card will fit in a keychain fob that I carry in my pocket. Other copies of all these docs are on my laptop, on my desktop, at my parent's house, on a server outside the USA, on an SD card in a ziploc bag taped to the bottom of my will, etc.

(i mean, how many times has /. gone over this - is this the editors idea of a yearly question ?)

Apparently not enough. Every time it comes up, the general consensus is the opposite of what you recollect.

Re: muddle headed post (1)

DigiShaman (671371) | about a year ago | (#43386437)

I think the parent poster was referring to a theoretical future digital dark age. Societal collapse, EMP, etc. Honestly though, if any of the aforementioned were to occur, you have bigger problems to worry about. So ya, I wouldn't sweat it.

simpler = better (1)

Anonymous Coward | about a year ago | (#43385593)

I do this on Windows using the cheapest HP all-in-one with an ADF and its bundled scan-to-PDF with OCR. I use an encrypted TC volume for storage. 512MB is plenty for several years' worth at 300dpi b/w. The less typing you have to do the better. Just use one folder for each major category: House, Taxes, Utilities, etc. Don't make yourself work too hard entering each item or you will never get around to scanning.
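A quick back-of-the-envelope check of that 512MB figure, as a sketch. The page size (US Letter) and the 20:1 compression ratio are my assumptions, not numbers from the post; bilevel fax-style compression varies a lot with page content, so plug in your own ratio.

```python
def uncompressed_bilevel_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Size of a 1-bit-per-pixel page before any compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels // 8  # eight one-bit pixels fit in each byte


def pages_that_fit(volume_bytes: int, page_bytes: int, compression_ratio: float) -> int:
    """How many pages fit in a volume, given an assumed compression ratio."""
    return int(volume_bytes // (page_bytes / compression_ratio))


letter_page = uncompressed_bilevel_bytes(8.5, 11, 300)
print(letter_page)  # 1051875 bytes, roughly 1 MB before compression

# Assumed 20:1 compression; the result is only as good as that guess.
print(pages_that_fit(512 * 1024 * 1024, letter_page, compression_ratio=20.0))
```

Even with a much worse ratio the volume holds thousands of pages, which fits the "several years" claim for a household.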

E.R.P. (0)

Anonymous Coward | about a year ago | (#43385595)

In the business this is often part of ERP software. ERP stands for enterprise resource planning.

An open example would be ERPAL
http://drupal.org/project/erpal

This again? (5, Funny)

turkeyfeathers (843622) | about a year ago | (#43385599)

Similar questions to yours appear here regularly. The consensus is that it's best just to throw the bills and documents out and spend more time watching porn.

Try Alfresco (2, Interesting)

Anonymous Coward | about a year ago | (#43385611)

You can try Alfresco DMS.
It requires a webserver so it might be too much for a single user.

iDocument (2)

Idimmu Xul (204345) | about a year ago | (#43385615)

http://www.icyblaze.com/idocument/ [icyblaze.com]

iDocument for the mac is like iTunes but for documents. It lets you import documents (pretty much any type) and tag them and store them in virtual or real folders, it sounds like it's exactly what you're after.

Re:iDocument (1)

Rinisari (521266) | about a year ago | (#43386059)

Thanks for this. It's pretty damned close to what I want, the sole exception being that it's not open source and not cross platform. I might go in on it anyway if I can't find something better.

My Workflow (5, Interesting)

Orphaze (243436) | about a year ago | (#43385625)

1) Receive document.
2) Scan with Fujitsu ScanSnap S1500 in about 10 seconds. $380 on sale, but so far it has been so much better than cheap all-in-one scanners that it's not even funny. Seriously, don't even bother going paperless unless you get a real document scanner.
3) Save PDF to a simple software RAID-1 mirror of two 2TB drives. (Takes about 5 seconds to set up from Disk Management in Windows.) This should protect against a sudden drive failure taking everything.
4) Back up nightly to an external drive that is swapped off-site every other month. This should protect against accidental deletions, fires, etc. Bonus points if the backup drive is the ioSafe fireproof variety.
5) Throw away original. Only exception is official documents like titles, marriage certificate, etc.. Yes, I even throw away W2s and the like. My taxes are 100 percent digital nowadays.
6) Check and test restore from those backups on a semi-regular basis, and you're done!

Receive electronic statements? (0)

Anonymous Coward | about a year ago | (#43385695)

1) Receive document.

Most places I do business with offer electronic documents: billing, statements, etc ....

No need for a scanner.

Now the old stuff, well, of course you need a scanner.

Re:My Workflow (2)

spire3661 (1038968) | about a year ago | (#43385781)

I liked it up until you have windows managing a RAID. Get a RAID NAS running Linux. It seems odd to RAID up a couple of drives just to let windows mess them up. I suggest a Synology ds212. If you are really serious build a ZFS rig with snapshots.

Re:My Workflow (0, Insightful)

Anonymous Coward | about a year ago | (#43385979)

Man, fuck off and die. His solution works fine and you're gonna pick on it because he's not using your favorite kiddie OS? Newsflash: Windows software raid works, just like Linux software raid.

Re:My Workflow (0)

spire3661 (1038968) | about a year ago | (#43386163)

Yes the RAID system works, but the way windows handles transferring files SUCKS, especially if you are doing it via GUI. Has nothing to do with bias. I would never trust a windows machine to act as final storage for my critical files.

Re:My Workflow (0)

Anonymous Coward | about a year ago | (#43386463)

Oh really? I had Win7 running my HTPC and let Win7 manage my RAID. Yet every few months the RAID would break even though both drives were fine. Now I have an Atom system with RAID card, connected to Gb ethernet, in my attic, and my linux HTPC streams from that. No issues in a long time.

Re:My Workflow (0)

Anonymous Coward | about a year ago | (#43386033)

There's no reason for a RAID when you're doing nightly external backups. One recent day's worth of work is easily repeated. Second, a nightly backup is way over kill. Home owners don't get important documents daily. Think more on a monthly schedule.

1) Receive electronic document
2) Add to files

1) Receive physical document
2) Scan into computer
3) Store physical copy in loosely organized folder until the scanned copy gets automatically backed up twice. Then shred physical copy and recycle.
4) Automatically back up documents after XX additions (you shouldn't delete or edit a scanned document). There's no reason to back something up unless it's changed. I recommend a full backup each time instead of incremental backups. Documents take little space and incremental backups have additional points of failure. A 2TB drive is massive for document storage. I use an 8GB thumb drive as a secondary backup for my really important files.
5) Once in a while extract a backup and double check that all its documents match your active copies. You'll be ignoring any of the new files as they wouldn't be in the backup yet. You don't edit scanned documents, so the original files should always match every archived copy. If you ever start to run out of space on your backup drive and can't buy a new one, delete a random set of earlier backups. You're doing full backups so it's not too important which ones you delete as long as it's not the two latest archives. A better policy would be to buy a new HDD, copy everything over, and store the old drive someplace as an extra copy. When naming your files, include the date in YYYYMMDD format so that all operating systems will sort them correctly. You shouldn't trust timestamps.
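The check in step 5 can be automated. Below is a minimal sketch, assuming the backup has already been extracted to a directory with the same layout as the active one; the function names and the report format are made up for illustration. It walks only the backup, so files added since that backup are ignored, as the step describes.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large scans do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(active_root: Path, backup_root: Path) -> dict[str, list[str]]:
    """Compare every file in the extracted backup with its active copy."""
    report: dict[str, list[str]] = {"missing_from_active": [], "changed": []}
    for backup_file in sorted(backup_root.rglob("*")):
        if not backup_file.is_file():
            continue
        relative = backup_file.relative_to(backup_root)
        active_file = active_root / relative
        if not active_file.is_file():
            report["missing_from_active"].append(relative.as_posix())
        elif sha256_of(active_file) != sha256_of(backup_file):
            report["changed"].append(relative.as_posix())
    return report
```

Two empty lists mean every archived document still matches its active copy. Anything listed under "changed" deserves a look, since scanned documents are never supposed to be edited.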

KISS and keep it cheap. Expensive solutions tend to have more areas for faults.

Tip: Have a secondary backup drive that you only rarely touch and use to keep only the most important items. I lost over a year's worth of projects and documents because I lost power (transformer died down the street) during a backup to an external drive and both file systems became corrupted. Recovery tools helped a little bit.

Re:My Workflow (0)

Anonymous Coward | about a year ago | (#43386563)

"Tip: Have a secondary backup drive that you only rarely touch and to keep only the most important items. I lost over a years worth of projects and documents because I lost power (transformer died down the street) during a backup to an external drive and both file systems became corrupted. Recovery tools helped a little bit."

So an important lesson is to have your archive mounted read-only (except when adding new documents, of course).

Re:My Workflow (1)

rastos1 (601318) | about a year ago | (#43385805)

I've been thinking about the same problem as the TFA recently. There is more to the task than simply scan and store.

What about assigning tags to the documents? Fast previews? Searching based on time/time range/tags/fulltext? Grouping related documents? Annotations? OCR?

I'm considering writing my own solution, but if there is something useful out there, I'd like to have a look.

Re:My Workflow (1)

rastos1 (601318) | about a year ago | (#43385821)

One more thing I forgot: electronic signatures.

Re:My Workflow (1)

sribe (304414) | about a year ago | (#43385861)

One more thing I forgot: electronic signatures.

What about them? For scanning and archiving, they're irrelevant.

Re:My Workflow (1)

rastos1 (601318) | about a year ago | (#43386023)

First, I want to be able, years later, to verify that the scan was not modified. Second: There are countries that do recognize electronically signed documents as legal documents (if signed with a certificate issued by a state-run CA). I did not actually check with a lawyer if this fulfills the requirements, but ... why not have the option?

Re:My Workflow (1)

sribe (304414) | about a year ago | (#43386141)

First, I want to be able, years later, to verify that the scan was not modified. Second: There are countries that do recognize electronically signed documents as legal documents (if signed with a certificate issued by a state-run CA). I did not actually check with a lawyer if this fulfills the requirements, but ... why not have the option?

For your own verification, OK. But no, no state-run authority is going to give any weight whatsoever to an image from your own archive that you signed yourself.

Re:My Workflow (1)

rastos1 (601318) | about a year ago | (#43386223)

I can get a certificate from a state-run CA. That means I can authenticate myself to a state-run service. Why not have a state-run service that produces a signature for a document (or encrypted document or document digest) that I upload? That would be a great service. It would even take some workload off the notaries (which would certainly make them very "happy") ... I should patent that.

Re:My Workflow (1)

sribe (304414) | about a year ago | (#43386675)

Why not have a state-run service that produces a signature for a document (or encrypted document or document digest) that I upload?

Because it says absolutely nothing about the authenticity of the document which you provide.

You're talking about scanned documents--documents from other sources which you allegedly scan, allegedly without modifying them, before signing them. The only authentication anybody else would be interested in would be authentication by the document producer, not by you, because you could perform any amount of modification/forgery before signing the document.

This is all very different from documents which you produce yourself, where authentication by you does have value.

Re:My Workflow (1)

Rinisari (521266) | about a year ago | (#43386017)

That's actually a good feature I'd not considered. As a document is added to the system, sign it using PGP and store the signature. That way, I have reasonable certainty that the document has not been modified since initial ingestion, or at least a warning that it may have been compromised if the signature doesn't check out.
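A minimal sketch of the tamper-check idea. Note that this swaps in a keyed digest (HMAC-SHA256 from the Python standard library) for the PGP signature mentioned above, purely so the example is self-contained; it is not a PGP signature and proves nothing to third parties. The function names and the key handling are hypothetical. In practice the key would have to live somewhere other than next to the archive.

```python
import hashlib
import hmac


def ingest_tag(document: bytes, key: bytes) -> str:
    """Compute the tag to store alongside a document when it is first added."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def is_unmodified(document: bytes, key: bytes, stored_tag: str) -> bool:
    """Recompute the tag later and compare it in constant time."""
    return hmac.compare_digest(ingest_tag(document, key), stored_tag)


if __name__ == "__main__":
    key = b"example-key-kept-off-the-nas"  # placeholder, not a real secret
    scan = b"%PDF-1.4 pretend this is a scanned bill"
    tag = ingest_tag(scan, key)
    print(is_unmodified(scan, key, tag))            # True
    print(is_unmodified(scan + b"edit", key, tag))  # False
```

Unlike a bare checksum, someone who alters the file cannot recompute a matching tag without the key, which is the property the PGP approach is after.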

Re:My Workflow (1)

Rinisari (521266) | about a year ago | (#43386009)

OP here.

These features you list are examples of what I desire in a package that manages documents. I'm not as concerned with OCR, but that'd be a nice feature to have for the lengthier letters and such.

Re:My Workflow (1)

motoservo (1327295) | about a year ago | (#43385975)

A used Kodak i1220 goes for about the same and a stack can scan documents or photos at a second per scan. And the photo scanning is on par with any of the cheap flatbed photo scanners that flood the market these days.

Re:My Workflow (1)

motoservo (1327295) | about a year ago | (#43386005)

Arg. Pardon the typo. A used Kodak i1220 goes for about the same and can scan a stack of documents or photos at a second per scan. And the photo scanning is on par with any of the cheap flatbed photo scanners that flood the market these days.

Re:My Workflow (0)

Anonymous Coward | about a year ago | (#43386053)

Thumbs up on the Scansnap 1500. I've had one for several years and it's awesome. I'm a unix weenie, but to run the scansnap with its included OCR software, I have a VirtualBox instance running W2K (legitimate copies of which can be had cheaply).

Vbox allows me to make a host directory visible to the guest windows instance as a network share, so I configure the scansnap software on the guest to write its files there. Then it's pretty automatic: put doc in scanner, push button, and it appears in my unix directory. From there I can move it to the appropriate place in my document hierarchy.

Some tweaking of settings for the scansnap driver and the included OCR software are needed to streamline the workflow, but it's possible to set things so that you can just feed docs in and have them automatically OCRed one after the other. Multiple document processing can be pipelined, but my processor (Q9400@2.7GHz) seems to handle only about 4-5 simultaneous OCR processes in the W2K instance before the scanner blocks. Hasn't been a big limitation for home office use.

Some minor nits with my setup (on FreeBSD host):
1. A few of the buttons in the scansnap software are activated only when I move the mouse cursor to the edge of the button. I'm guessing this is some quirk related to vbox. At first I thought the buttons didn't work at all, until I accidentally placed the cursor _just so_. Now that I'm aware of it, it's not a big deal, and doesn't affect most of the buttons.
2. USB detection on the host and corresponding signaling to the guest: the (Windows) scansnap driver and the hardware are more tightly-coupled than they should be, so if you close the scanner, the driver hangs. For startup/shutdown, my workaround is to always do the Windows action first, and then the hardware (scanner) action. This means: boot the guest first before opening the scanner; and shutdown the guest first before closing the scanner.

Re:My Workflow (0)

Anonymous Coward | about a year ago | (#43386389)

WTF!!!

I've been married 20 years, both adults working, with three kids, two properties and three cars and volunteer efforts at Church, Boy and Girl Scouts, PTA. I barely generate enough savable documents to warrant having one of those small two drawer filing cabinets.

Please explain what you have going on in your life to justify such a setup.

You don't need a CMS (5, Interesting)

Anonymous Coward | about a year ago | (#43385629)

So, I've been doing this pretty consistently for the past few years and sent this advice to some relatives asking basically the same question. (That's also why it's a little dumbed down.)

I haven't found a case where any sort of CMS makes more sense than the file system. This is after doing this for about 10 years, and I've got records going back to '01.

I'm using a Fujitsu ScanSnap and a Fellowes Powershred, and running Mac OS X. OS X has decent indexing, a good file system manager (really can't beat column view) and the Preview app will let you reassemble PDFs, which is occasionally very handy.

1. The enemy is copies. I strongly recommend "scan and shred", or you'll wind up scanning the same thing over and over.

1.1. Don't bother with any scanner that doesn't do double-sided scans.

1.2. Use a shredder. You can take things out of a trash can.

1.3. The scanner should come with OCR software. Choose "Searchable PDFs".

2. Do scanning in small batches.

2.1. Create two folders, "Scanned" and "Unfiled".

2.2. The scanned files go immediately into Scanned, and the paper immediately goes into the shredder.

2.3. After you've got a batch of stuff scanned, you move it into Unfiled and correct the names, or split the documents up as you need to.

3. If it takes any work to scan it just shove it in a filing cabinet, or, better yet, just shred it.

3.1. If you're having to use a flatbed, it's too complicated to scan and you should file or shred it.

3.2. You can often get manuals and pamphlets and stuff online by googling part of the text or the product name.

4. Don't scan anything you can get electronically.

4.1. Most companies would much rather let you download bills and statements and such.

4.2. Most of them will also delete those statements after a few months, so get in the habit of immediately downloading the statement.

5. It's *very* helpful to put a date on everything. I generally do YYMMDD, trying to guess from dates I find in the document.

5.1. If it's a document covering a period of time like a bill for the month of November, I use the ending date.

5.2. For tax documents I'll put TT-YYMMDD, where TT is the tax year, since the actual transactions occur that year, but filing and IRS stuff happens the year after.

6. I've found that even with full text search, you still need folders.

6.1. They just don't need to be extremely complicated; usually two levels seems to be fine. I'll put prior years into separate folders, too.

6.2. Your system will evolve as you work; just get it in there, and then be mindful of what you are commonly looking for.

6.3. Keep books and reference manuals in a folder that doesn't get indexed. (Spotlight has an option for this.) They tend to create a lot of spurious hits.

7. Keep your inbox clean, if an email wants you to download a statement, get it right away and put it in Unfiled.

7.1. Likewise, keep your desktop clean, scan and shred stuff as soon as it comes in.

7.2. Have a periodic to-do item to tidy your files, don't spend more than half an hour (tops!) at any given time.
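The date rules in section 5 are easy to turn into a small naming helper. This is only a sketch of that convention: the post does not say where the date goes in the file name, so the "date first, then a label" layout and the function name are my assumptions.

```python
from datetime import date
from typing import Optional


def scan_filename(label: str, doc_date: date, tax_year: Optional[int] = None) -> str:
    """Build a name using YYMMDD, or TT-YYMMDD when a tax year applies."""
    stamp = doc_date.strftime("%y%m%d")
    if tax_year is not None:
        # TT is the two-digit tax year the transactions belong to.
        stamp = f"{tax_year % 100:02d}-{stamp}"
    return f"{stamp} {label}.pdf"


# A bill covering November 2012 is dated with the end of the period (rule 5.1).
print(scan_filename("electric bill", date(2012, 11, 30)))
# -> 121130 electric bill.pdf

# An IRS letter received in 2013 about tax year 2012 (rule 5.2).
print(scan_filename("IRS notice", date(2013, 3, 15), tax_year=2012))
# -> 12-130315 IRS notice.pdf
```

Putting the stamp first means a plain alphabetical listing in each folder is also chronological, at least within one century.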

Re:You don't need a CMS (2)

sribe (304414) | about a year ago | (#43385885)

2.3. After you've got a batch of stuff scanned, you move it into Unfiled and correct the names, or split the documents up as you need to.

god, no! Give it a sensible name and put it where it belongs to begin with; don't deal with the same document multiple times.

Re:You don't need a CMS (1)

Rinisari (521266) | about a year ago | (#43386075)

Thanks for this. This is definitely a workflow I need to model.

Re:You don't need a CMS (5, Insightful)

overlordofmu (1422163) | about a year ago | (#43386103)

Disclaimer: I know this will seem pedantic but I am trying to get people to think about problems in the long term (solutions that work for thousands of years, not hundreds).

If we use the format YYYY-MM-DD for dates (for instance 2013-04-07), they sort both alphabetically and numerically, they are easy for human eyes/minds to parse at a glance (my apologies to the vision impaired) and there won't be a reason to change the format for approximately 7,987 years (but who is counting, really).

Please see ISO 8601: http://en.wikipedia.org/wiki/ISO_8601 [wikipedia.org]

Obligatory XKCD: http://xkcd.com/1179/ [xkcd.com]
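A small demonstration of the sorting claim, using only the Python standard library. The sample dates are arbitrary.

```python
from datetime import date

iso_names = ["2013-04-07", "2012-12-31", "2013-01-15", "1999-06-01"]

by_text = sorted(iso_names)                          # plain string sort
by_date = sorted(iso_names, key=date.fromisoformat)  # true chronological sort
print(by_text == by_date)  # True

# The same kind of dates written MM-DD-YYYY do not sort chronologically as text.
us_style = ["04-07-2013", "12-31-2012", "01-15-2013"]
print(sorted(us_style))  # ['01-15-2013', '04-07-2013', '12-31-2012']
```

Because the most significant field comes first and every field is zero-padded, any file manager that sorts names as text gets the order right without knowing they are dates.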

Re:You don't need a CMS (4, Insightful)

Anonymous Coward | about a year ago | (#43386357)

4.1. Most companies would much rather let you download bills and statements and such.

And this is exactly why I HATE all of the "e-bill" solutions that every company has dreamed up at the moment.

They turn the problem from "the company remembers to SEND you a bill/invoice/paper" to "you have to go get the bill/invoice/paper FROM the company".

With paper bills/invoices/etc. sent through the US mail, they "remember" to do something, and I get an automatic reminder when the envelope appears in my mailbox.

With the e-bill solution, the most I get is an email reminding me to go log in and download the bill/invoice/paper. Now, notice what is wrong here. They just sent me a communication (hint, it's the reminder email) that could have functioned identically to the USMail envelope of carrying the bill/invoice/paper along with it right to my inbox, so when I receive the email, I ALSO receive the bill/invoice/paper itself (i.e., attach the bill/invoice/paper as a .pdf to the email).

Now, most companies will balk at that because "email is not secure" or "email is not private". Well, why don't you let me F****** upload a gpg public key to your system, and then your system could encrypt my bill/invoice/paper using my gpg public key, then attach it to the "reminder" email, and now we have an electronic system that functions identically to the old paper bill in the old paper envelope sent through the postal office.

They remember it is time to send me my bill, they create the .pdf (electronic equivalent to printing the bill on paper), they encrypt the pdf (electronic equivalent to sealing the bill in a mailing envelope), and they email me the item (electronic equivalent of giving the sealed envelope to the postal service).

But does any company implement this system? No, not one.

And so they will continue to mail me paper, and can continue to hound me to switch to "e-bills" all they like. But until their e-bills are done properly (as above) they won't get any buy in here.

Re:You don't need a CMS (0)

Anonymous Coward | about a year ago | (#43386925)

And so they will continue to mail me paper, and can continue to hound me to switch to "e-bills" all they like. But until their e-bills are done properly (as above) they won't get any buy in here.

These companies could easily allow downloading of the e-bill from within the e-mail simply by embedding a one-time password (OTP) in the URL. Since the sender (the company) knows your e-mail address, there is no need for you to type anything to download the e-bill. As additional security, the sender could hash the combined e-mail address and OTP using the (presumably already hashed) account password.
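For what it's worth, here is a rough Python sketch of that idea, reading "hash using the account password" as a keyed hash (HMAC) with the stored password hash as the key. That reading is an assumption, as are the function names, the `billing.example.com` URL and the `otp`/`sig` parameter names. A real server would also have to record each OTP and refuse to honor it twice, which this sketch does not do.

```python
import hashlib
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

def make_bill_link(email, password_hash, base_url="https://billing.example.com/ebill"):
    """Build a download URL carrying a fresh OTP and a keyed hash over email+OTP."""
    otp = secrets.token_urlsafe(16)
    message = f"{email}:{otp}".encode()
    sig = hmac.new(password_hash, message, hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'otp': otp, 'sig': sig})}"

def verify_bill_link(email, password_hash, otp, sig):
    """Recompute the keyed hash on the server side and compare in constant time."""
    message = f"{email}:{otp}".encode()
    expected = hmac.new(password_hash, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

# Hypothetical account data, for illustration only.
email = "reader@example.com"
stored_hash = hashlib.sha256(b"stand-in for the stored password hash").digest()

link = make_bill_link(email, stored_hash)
params = parse_qs(urlparse(link).query)
link_is_valid = verify_bill_link(email, stored_hash, params["otp"][0], params["sig"][0])
wrong_address_is_valid = verify_bill_link(
    "someone.else@example.com", stored_hash, params["otp"][0], params["sig"][0]
)
print(link_is_valid, wrong_address_is_valid)
```

Note that anyone who can read the e-mail can still follow the link, so this only protects against guessed or tampered URLs, not against an intercepted message.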

Re:You don't need a CMS (1)

reboot246 (623534) | about a year ago | (#43387015)

Amen, brother! I prefer to get my bills in the mail. Real, honest to goodness paper.

On the outside of the envelope I write the date I received the bill, how much it is, and when it is due. It goes into my mail sorter in the first bin. When I pay it online I write the confirmation number from the bank on the envelope and then put the envelope in the second bin. I keep the envelope until the next bill from that company comes in. That way I can see if the previous bill was actually paid on time or if there were any mistakes on their part or mine. If everything is okay, I shred the old bill, envelope and all. Then the cycle starts over again.

Important items like house payments or car payments are kept until they're paid off. I want a physical record I can use to prove I've made all the payments and that they were made on time. No, I don't trust banks or mortgage companies any further than I can throw them. Do you?

Scan, OCR, and use your file system (and symlinks) (2)

magic maverick (2615475) | about a year ago | (#43385631)

My suggestion would be to just scan and OCR your files, and then store them in your file system.
Hierarchy might be something like: ~/scans/year/project/sorted

Within each sorted subdir, you'd have three folders. Date, organizationThatGeneratedTheDoc and TypeOfDoc.
So in the folder ~/scans/year/project/sorted/org
The file names would be something like: organizationThatGeneratedTheDoc-yyyy-mm-dd-TypeOfDoc.pdf
In the folder ~/scans/year/project/sorted/TypeOfDoc
The file names would be like: TypeOfDoc-yyyy-mm-dd-organizationThatGeneratedTheDoc.pdf
Etc.

You'd use links (symlinks or hard links) to make sure that each document is accessible in more than one place. (You can also use links to put documents in more than one project folder.)

Types of documents would be things like invoices, receipts, legal threats, court orders etc. In the event that a document has more than one type, or more than one organization, you simply have more links. So invoice-2013-04-07-webdevteamawesome.pdf and legalthreat-2013-04-07-webdevteamawesome.pdf are the same document, because the first page is an invoice, and the second a threat to take you to court if you don't pay. (This then exists six times, three times for each type, but with the magic of hard links only takes up the space of 1.001 documents.)

With the OCRed text being saved with the PDF scan, you can also run text searches within your files to find specific information (such as bill amount, seriously, how often would you use that information?).

This allows you maximum flexibility, and prevents you from being locked into a particular piece of software (as you can do everything manually). Moreover, once you've got it set up, it's easy to run with each new document.
Steps would be:
1) Scan and OCR doc, saving the PDF into the staging area folder.
2) Run your script, which asks for the date, project, org name, doc type.
3) The script then saves the document in the appropriate folders, generating links as required.
4) Profit!
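Steps 2 and 3 above could look something like the following Python sketch. It is not the poster's script: the function name, the folder names `date`, `org` and `type`, and the file name used for the date folder are assumptions, and it takes its answers as arguments instead of prompting. Hard links only work when all three folders live on the same file system.

```python
import os
import shutil
import tempfile
from pathlib import Path

def file_document(staged, root, year, project, date, org, doctype):
    """Move a staged PDF into the 'sorted' tree and hard-link it under each view."""
    sorted_dir = Path(root) / str(year) / project / "sorted"
    targets = [
        sorted_dir / "date" / f"{date}-{org}-{doctype}.pdf",
        sorted_dir / "org" / f"{org}-{date}-{doctype}.pdf",
        sorted_dir / "type" / f"{doctype}-{date}-{org}.pdf",
    ]
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
    first, *rest = targets
    shutil.move(str(staged), str(first))   # the single real copy
    for target in rest:
        os.link(first, target)             # extra names for the same data
    return targets

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp) / "staging"
    staging.mkdir()
    scan = staging / "scan0001.pdf"
    scan.write_bytes(b"%PDF-1.4 placeholder, not a real scan")
    paths = file_document(scan, Path(tmp) / "scans", 2013, "house",
                          "2013-04-07", "webdevteamawesome", "invoice")
    all_same_file = all(os.path.samefile(paths[0], p) for p in paths)
    link_count = os.stat(paths[0]).st_nlink
    staged_file_gone = not scan.exists()
print(all_same_file, link_count)
```

For a document with two types, you would call the linking part again with the second type so the same data gains more names.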

Re:Scan, OCR, and use your file system (and symlin (1)

magic maverick (2615475) | about a year ago | (#43385677)

I should note, you need to be careful to make sure you use the same spelling and wording for each org and doc type. You don't want to end up with Murphies, Murphy's, Murphy's Inc., Murphpy's Beer Company Inc. etc., each with invoice, inv., invoise and envoice.
It would be better if your script forced you to pick a doc type, and showed a list of already existing companies.
This applies no matter what solution you end up running with.
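One way a script could show already existing companies is to normalize whatever you type and offer close matches from the names already in use. A minimal Python sketch follows; the helper names and the sample list are hypothetical, and the 0.6 cutoff is just the default of `difflib.get_close_matches`.

```python
import difflib
import re

def slug(name):
    """Lower-case a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def suggest_existing(name, known, cutoff=0.6):
    """Return already-used slugs that look like the name just typed."""
    return difflib.get_close_matches(slug(name), sorted(known), n=3, cutoff=cutoff)

# Hypothetical set of org names already present in the archive.
known_orgs = {"murphys", "comcast", "webdevteamawesome"}

for typed in ("Murphies", "Murphy's", "Murphy's Inc."):
    print(typed, "->", suggest_existing(typed, known_orgs))
```

The script would then ask you to confirm the suggestion instead of silently creating a new spelling.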

Also, for documents that cover a period, you have multiple options. The first is to give 00 as the day and month (e.g. 2012-12-00), and the second 01 as the start (e.g. 2012-12-01). Another is to have two dates (2012-12-01-to-2013-01-01) in place of the yyyy-mm-dd suggested in my first post. Also, don't even think of having the dates in any other order than year, month, day.

Some places have a working year (e.g. a tax year) that crosses two calendar years. In that case, you should be careful about where you put documents. Because if you put them in the first year, and then go "OK, it's been 7 years, and I no longer need any docs from 2005", you'll be burnt. A solution is to hardlink them into both years.
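As a small sketch of the two-date option and the "link into both years" idea, the Python helpers below (hypothetical names) build the period label and list every calendar-year folder a period touches, treating both dates as inclusive.

```python
from datetime import date

def period_label(start, end):
    """Build the 'yyyy-mm-dd-to-yyyy-mm-dd' part of a file name."""
    return f"{start.isoformat()}-to-{end.isoformat()}"

def year_folders(start, end):
    """List every calendar-year folder the period touches."""
    return [str(year) for year in range(start.year, end.year + 1)]

start, end = date(2012, 12, 1), date(2013, 1, 1)
print(period_label(start, end))
print(year_folders(start, end))
```

A filing script could loop over `year_folders(...)` and create one hard link per year.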

Do post back when you have a solution!

Re:Scan, OCR, and use your file system (and symlin (1)

whoever57 (658626) | about a year ago | (#43385747)

Your suggestion is over-complicated IMHO. I use Xsane and scan as multi-page documents. Xsane allows me to add pages to the scan set and reproduce a new PDF file. There are some downsides to my method: I need to have an approximate idea of the date of the document that I am looking for.

I generally file by //.pdf, although I may vary the hierarchy if appropriate, for example: TAXES//.pdf

Perhaps more important, though, is to extract the data into some form of record keeping (even if it is only a spreadsheet) at the time that it is saved. Then, unless I am being audited, I really don't need the scans.

Re:Scan, OCR, and use your file system (and symlin (1)

whoever57 (658626) | about a year ago | (#43385883)

Arrgh... /. ate my filenames, even though I posted it as Plain Old Text:

I generally file by //.pdf, although I may vary the hierarchy if appropriate, for example: TAXES//.pdf

Should be: I generally file by <TOPIC>/<YEAR>/<MONTH#>.pdf or perhaps <TOPIC>/<YEAR>/<MONTH#>/scans.pdf. I use other variations to the hierarchy if appropriate, for example: TAXES/<YEAR>/<Type_of_Form>.pdf. So all W2s for a particular tax year would be in the same PDF file.

All scanned invoices for a particular year/month would be in the same PDF file and in the same directory as any downloaded invoices.

It's not important that I use the same hierarchy everywhere. I use the hierarchy that will make it easiest to find the document in the future, and that varies according to what I am filing.

Alfresco (0)

Anonymous Coward | about a year ago | (#43385661)

Alfresco is open source, works for Windows, Linux and Mac, and allows you to define your own metadata for your documents.

http://wiki.alfresco.com/wiki/Download_and_Install_Alfresco

SnapScan. (1)

Anonymous Coward | about a year ago | (#43385675)

http://www.amazon.com/ScanSnap-S510M-Instant-Sheet-Fed-Scanner/dp/B000WJCX18/ref=sr_1_31?s=pc&ie=UTF8&qid=1365365308&sr=1-31&keywords=archive+scanner

The above comes highly recommended as an all-in-one solution.

Re:SnapScan. (1)

thomasw_lrd (1203850) | about a year ago | (#43386137)

I've used owl document repository, but it needs a webserver to run. (http://www.doxbox.ca/) It's pretty nice, and it can do full text search (sometimes).

Mayan EDMS (2)

Rob the Roadie (2950) | about a year ago | (#43385705)

I've played with this a few times, never used it in anger though.

http://www.mayan-edms.com/

I might take up your challenge on going paperless too and give Mayan a go.

Tossing hat into the ring for DJVU format. (3, Interesting)

Areyoukiddingme (1289470) | about a year ago | (#43385713)

PDF is big and bulky. DJVU format makes for tiny document scans. And there are open source libraries for creating it, available even in Debian. Wavelet compression did finally make it into the wild. It's just nobody has ever heard of it, for some reason.

Doesn't help for organization, but it should be a reasonable option for storage.

It even embeds the OCR text in the document along with the image version, so it doesn't proliferate multiple copies of the same data.

Re:Tossing hat into the ring for DJVU format. (1)

magic maverick (2615475) | about a year ago | (#43385775)

You do realize that PDF can store the OCRed text alongside (or above? it's another layer) the original scanned text?

Also, because the reference software of DJVU is GPLed, it's never going to see widespread commercial use (as all the big software companies only want to take and take).

A standard subset of PDF (e.g. PDF/A) is a much better option, and if you're worried about the amount of file space taken up, you can always use GZIP or ZIP.

Re:Tossing hat into the ring for DJVU format. (1)

spire3661 (1038968) | about a year ago | (#43385803)

And as counterpoint; I bought a 32 GB, class 10 MicroSD card with adapter for $22.19 USD yesterday.

Neat? (0)

Anonymous Coward | about a year ago | (#43385749)

So, you were watching TV and you saw the commercial for the Neat Scanner [neat.com]. You thought to yourself, "that's a great idea. I wish I had that, but I run Linux. I wish there was something like this for Linux. Certainly someone has come up with such software for Linux."

I know how you feel. That's exactly what happened to me over 5 years ago. Yet today, we still have this lame list of "excuses" for solutions. PDFs in rigid directory structures, blah, blah, blah.

Sadly there is still nothing like the Neat scanner system for Linux. Something that, preferably, OCRs and indexes your documents for easy searching and retrieval. At the least something that indexes, even if you have to manually populate the fields. Nothing at all after years of hoping.

Cue the replies stating it would be trivial to make your own using MySQL/Mariadb and a PHP frontend. It always amuses me that it's so trivial, yet no one has done it yet, except on Windows.

Re:Neat? (1)

evilviper (135110) | about a year ago | (#43385909)

Sadly there is still nothing like the Neat scanner system for Linux. Something that, preferably, OCRs and indexes your documents for easy searching and retrieval. At the least something that indexes, even if you have to manually populate the fields. Nothing at all after years of hoping.

There are NUMEROUS document/content management systems for Linux (and have been for years), any of which will do VASTLY more than the dumbed-down "Neat" system.

Re:Neat? (0)

Anonymous Coward | about a year ago | (#43386151)

Sadly there is still nothing like the Neat scanner system for Linux. Something that, preferably, OCRs and indexes your documents for easy searching and retrieval. At the least something that indexes, even if you have to manually populate the fields. Nothing at all after years of hoping.

There are NUMEROUS document/content management systems for Linux (and have been for years), any of which will do VASTLY more than the dumbed-down "Neat" system.

Yet you failed to mention a single one. Naming just one would have answered the Ask Slashdot question and made the GP look like an idiot. Instead, you look like an assclown.

Re:Neat? (0)

Anonymous Coward | about a year ago | (#43386929)

I'd appreciate it if you could name a few. I could use a good one.

And please note that I didn't need to be rude to you.

-- hendrik

Hosting on a NAS (1)

ericdano (113424) | about a year ago | (#43385763)

I have been trying to do this for a while. I have a ScanSnap S1500M and have been hosting all the PDFs on my Synology NAS. However, programs like iDocument don't support network drives and text searching PDFs. They rely on Spotlight's database, and Spotlight doesn't work on a NAS (though it supposedly does work on an Apple server).

I'd LOVE some sort of text searchable solution that is better. I do use iDocument, but that has a LOT of limitations, like it will not handle ePUBs. I'm hoping at some point Synology will create an App for its line of units similar to something like Evernote. They already have two great Apps that allow you to stream Audio and Video from your Synology unit to an iOS or Android phone and computer. And they also have a Dropbox-like App. The last piece they really need is some sort of document management thing that works with their stuff. That would be a perfect solution for someone who has a lot of documents or a small business which doesn't want to have its data in the hands of Google or other companies.

easy peasy (0)

Anonymous Coward | about a year ago | (#43385765)

boxcryptor + your choice of google drive, dropbox, etc. Keep a notes.txt file and put each set of scans in a dated folder. Keep a local copy and use copernic etc to make it easily searchable. done and done.

Owncloud? (2)

bazorg (911295) | about a year ago | (#43385817)

Maybe that Owncloud [owncloud.org] thing will work well to handle the storage and access. Does anyone know if its search function is any good?

Alfresco (3, Informative)

Balr0g (960255) | about a year ago | (#43385829)

I use the community edition of Alfresco [alfresco.com] for that task. You can tag all documents, add custom fields and have full text search and versioning out of the box. Documents can be accessed via web interface, smb, ftp and even imap.

Alfresco? (0)

Anonymous Coward | about a year ago | (#43386247)

I use the community edition of Alfresco [alfresco.com] for that task.
You can tag all documents, add custom fields and have full text search and versioning out of the box. Documents can be accessed via web interface, smb, ftp and even imap.

Alfresco is great and all, but it seems a bit heavy for a home use scenario and it doesn't handle or automate the scanning tagging aspect of things either. He's looking for a lot more than a DMS.

Alfresco CMS might be something to check on (0)

Anonymous Coward | about a year ago | (#43385857)

I have been setting up the Alfresco CMS at work for our company's document management and it's pretty robust. The install process on Linux is a bit of a bitch, because you have to get it just right. I am not sure about the other OSes but I am assuming it doesn't require too much on the Windows and OS X side of things. http://www.alfresco.com/node/2296?utm_expid=11184972-4

I wrote a script to do exactly what you are saying (1)

Ogi_UnixNut (916982) | about a year ago | (#43385867)

My situation is the same, except that I move often, and have to keep legal documents for a few years (typically 5). I also have paper copies of invoices and bills (loads). I didn't want to have to lug boxes and boxes of paper, so I developed a script to do the following:

1) Scan the document page by page, and save as tiff (300dpi)
2) Run open source OCR on it, and save the resulting text to the tiff "comment" field on the metadata
3) Save it in my file server.
4) Index it with a desktop search program (here is a list: http://en.wikipedia.org/wiki/Desktop_search [wikipedia.org] ). This has the nice facility of scanning the metadata and allowing you to search it. This way I can search documents by text, ignoring the fact OCR is not 100% correct (it is usually correct enough for me to find the document I want), while having the pure text in photocopy quality as a TIFF (this is very important for legal documents, as OCR'd versions are not acceptable replacements).
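To illustrate why imperfect OCR is usually "correct enough" for searching, here is a toy Python sketch that matches query words approximately against OCR text. It is not the poster's script and not how any particular desktop search program works; the file names, sample texts and the 0.8 similarity cutoff are all made up for illustration.

```python
import difflib
import re

def words(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(ocr_texts, query, cutoff=0.8):
    """Return names of documents where every query word has a near match."""
    terms = words(query)
    if not terms:
        return []
    hits = []
    for name, text in ocr_texts.items():
        vocabulary = set(words(text))
        if all(difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
               for term in terms):
            hits.append(name)
    return sorted(hits)

# Hypothetical OCR output; note the misread "Invoce" in the first scan.
ocr_texts = {
    "scan-001.tif": "Invoce from Acme Power, total due 42.10",
    "scan-002.tif": "Letter regarding rabies certificate",
}

print(search(ocr_texts, "invoice acme"))
print(search(ocr_texts, "rabies"))
```

The original image stays untouched; the text is only used to find it.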

I have been wondering whether it would be worth open sourcing the script (for the moment it is a bit hacky, but it has been serving me well for years now). If the TIFFs take up too much space for your liking, substitute with PNG/JPEG/etc...

So far it has served me well, I've been collecting hundreds of documents this way. The only manual step is the script requesting a filename (not a big deal for me, as I have to manually put each page into the scanner anyway).

If you are interested let me know, and I can post the script.

Re:I wrote a script to do exactly what you are say (1)

Rinisari (521266) | about a year ago | (#43386115)

Please do post the script. Throw it up on pastebin, or, better yet, https://gist.github.com/ [gist.github.com]

Re:I wrote a script to do exactly what you are say (1)

beachdog (690633) | about a year ago | (#43387055)

Here are some pieces of a scan to ocr script I am developing.
First I am scanning a multicolumn document and to preserve the sense of the document text, I scan even pages twice and odd pages twice.
Second, the scanned images must be rotated. Pieces of the "convert" command appear in the perl fragments here.
Third, I am using the open source tesseract OCR program. Some of my documents have grayed areas that contain text. So I am running tesseract twice on the source files and picking the output file with the most text characters.
Fourth, the basic program is just a big loop with a menu where I input file names or page numbers.

Here goes:
# my $scanprog = "/usr/bin/scanimage --resolution 400 >";# print "$scanprog \n";
# Scanner settings for pages top of book at left of scanner StylusScan 2500
my $scanoddleft = "/usr/bin/scanimage -l 30mm -x 190mm -y 235mm --resolution 400 >";#for odd pages
my $scanoddright = "/usr/bin/scanimage -l 0mm -x 190mm -y 235mm --resolution 400 >";#for odd pages
my $scanevenleft = "/usr/bin/scanimage -l 30mm -x 190mm -y 235mm --resolution 400 >";#for even pages
my $scanevenright = "/usr/bin/scanimage -l 0mm -x 190mm -y 235mm --resolution 400 >";#for even pages
# OCR commands and parameters
#tesseract test1.tif test1 -l eng;
#scanimage -l 26mm -x 166mm -t 10mm -y 125mm --brightness 3 --resolution 400 | pnmtotiff>test1.tif;eog test1.tif;convert -rotate 90 test1.tif test1.tif; eog test1.tif; tesseract test1.tif test1 -l eng
my $tesseract = " tesseract ";
my $language = " -l eng ";
my $brightness2 = " --brightness 2 ";
my $brightness3 = " --brightness 3 ";
my $convert90 = " convert -rotate 90 ";
my $eog = " eog " ;
my $charcount = " wc -c " ;
my $scanpage = 1; # Range is 1 to 183
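The "run tesseract twice and keep the output with the most text characters" step is not shown in the fragments above. A minimal sketch of just that selection step, in Python instead of Perl and with hypothetical file names, could compare the byte counts of the two text outputs (the same number `wc -c` reports):

```python
import tempfile
from pathlib import Path

def pick_fullest(paths):
    """Return the OCR output file with the largest byte count."""
    return max(paths, key=lambda p: Path(p).stat().st_size)

# Demonstration with two stand-in OCR outputs for the same page.
with tempfile.TemporaryDirectory() as tmp:
    pass_a = Path(tmp) / "page001-pass-a.txt"
    pass_b = Path(tmp) / "page001-pass-b.txt"
    pass_a.write_text("Only the clean column was read.\n")
    pass_b.write_text("Clean column plus the text inside the grayed box was read.\n")
    best_name = pick_fullest([pass_a, pass_b]).name
print(best_name)
```

More characters does not always mean better OCR (noise can inflate the count), so treat this as a heuristic.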

Mac OS X Finder (1)

supercrisp (936036) | about a year ago | (#43385891)

Thinking about this question, I checked the folder in which I keep research and notes for my primary area of study. It's 2GB and just under 2,000 separate files. Many of these are OCRed PDFs, some mp3, some .doc, .rtf. Mac OS X's indexing lets me do adequately quick find-by-content searches, and a relatively simple organizational schema for subfolders lets me consult categories of data swiftly. I also use a reference manager program that probably has close to 100 keyword tags, and Finder lets me get to stuff as quickly, so I'm assuming creating some sort of metadata beyond filename, date, and filetype is really unnecessary. I'd say just relax and throw the stuff in a folder in Finder, and back that up somewhere while also using something like SpiderOak. My work requires frequent and specific searches over this fairly large data set, so if this system works for me, it would probably work for you, unless you plan on getting OCD with your OCR and scanning every Wally World receipt. Anyway, my advice is to keep it simple. Life is too short to diddle around with stuff like this.

A couple questions (1)

The Optimizer (14168) | about a year ago | (#43385921)

Just a couple questions come to mind:

First: What is the purpose of keeping the information? If it's just to have a record for your own sake of what and when and how much, do you even need to scan the statement or receipt or keep the original? Or can having all the info imported into a money manager be enough?

I've been using Quicken for over a decade (still using Quicken 2000 actually as later versions are bloaty) to keep all my financial history in detail. For answering questions like "When did I buy that Belkin KVM switch so I can see if the warranty period has expired" searching the register is good enough as I add enough info to the memos. In this example (real one from just a week ago), finding the information easily was enough, and it's to my advantage to have all the individual statements and detail items combined into larger account histories rather than parse an archive tree full of pdf/ocr files. (FWIW: even this old version of Quicken lets me attach scans of receipts to entries.)

Second Question: In what cases is the Original Paper required as opposed to a scan? If you need to show an original statement, receipt or other document to prove some thing or get something approved, do you know when an electronic copy or reproduction is as acceptable as the original? I don't think this is an area with consistent clear cut answers yet because of its newness.

Let's take an admittedly unlikely example. You have a house but have moved to take a job out of state, and you're trying to sell the house. Some scumbag squatter moves in and tries submitting false documents to claim ownership. All the documents relating to purchase and any mortgages have been scanned and shredded. Will the courts, police, banks, city and county offices etc. give you any trouble because they are not signed originals? What if the scumbag claims you fabricated the documents (like he did) and his are the originals? What if some entities accept a scan and others don't?

I've implemented a hybrid system where different documents get scanned / destroyed at different times. I have a single card-file cabinet (Filing cabinet with half-height drawers). Paper copies of everything from the current year and previous year are kept in a drawer. At the end of each year, I take all the documents from year-1, shred most of them (assuming any need for them has passed), and put the ones I deem most critical in a small box to archive.

Re:A couple questions (1)

Rinisari (521266) | about a year ago | (#43386177)

The initial purpose of keeping the information is completion. I sheepishly admit to digital hoarding, and this may be feeding that desire. To me, it's easier to scan a document and tag it, rather than importing its information.

I need to keep things like receipts for large purchases for insurance, expense, and warranty purposes, bills and account statements, tax documents, and even things like the rare paper letter I get (e.g. my former tax preparer died last year. If I were to be audited, I'd need some evidence that she's dead. I have a letter from her next of kin and coworkers saying that she died.)

I need original paper for SOME receipts, things with raised seals such as birth certificates or car titles, and other unique items that the originality of the paper would increase its authenticity in a court of law.

What you do seems very similar to what I want to do, perhaps with the exception that I'm a metadata nut and want to be able to search things a little easier, should the need ever arise.

Smartphone (0)

Anonymous Coward | about a year ago | (#43385951)

Forget scanners. Use smartphone (iPhone in your case, I suspect) to scan, ocr and convert to pdf.
Use a cardboard box with the sides cut open as a smartphone support.

Then transfer the files to your NAS server and organize them in a way that makes sense to you.

Any phone camera 5Mpixels+ is ok for this kind of job.

Re:Smartphone (1)

Rinisari (521266) | about a year ago | (#43386129)

Using Camscanner [google.com] or its ilk is something that a few friends have suggested, but I find the quality of the scans to be less than I really want for long-term archival. This may suffice for many documents that I'm likely never to look at again, such as bills, but things like letters or tax documents I think may require a little higher quality. Also, if a document is more than one page, camera scanning quickly gets unwieldy. I scanned a 30 page document on the go using Camscanner and it was a painful experience.

gscan2pdf (1)

markdavis (642305) | about a year ago | (#43386003)

I use gscan2pdf http://gscan2pdf.sourceforge.net/ [sourceforge.net] with my multifunction "printer" and then save the bills and documents in properly named and organized directories as pdf files. Simple as pie. (Why is pie simple?)

DevonThink (0)

Anonymous Coward | about a year ago | (#43386021)

If your target OS is OS X, you may want to drop the "open source" part of your requirement list and take a good long look at DevonThink Office Pro. When used in conjunction with the Fujitsu Scansnap (though one isn't required per-se) it's a seriously beautiful thing. You can set up a blindingly fast workflow in just minutes.

The hierarchy it creates is easily exportable as nested directories of bagged-and-tagged PDFs, should you ever need to jump ship to another system.

Dropbox or google drive (1)

alen (225700) | about a year ago | (#43386041)

I use my iPhone to scan, convert to PDF and upload to my Dropbox. The app cost me $6

Dropbox will always be there and is backed up

Depends on your personality. (1)

Richy_T (111409) | about a year ago | (#43386049)

I used to periodically sort things into hanging folders then dispose of anything nonessential after 3-4 years. A few years back I decided to switch to scanning. So I started collecting a pile of stuff to scan. In the intervening years, that pile has grown and grown and now the scanning would be such a big chore that I don't even like to contemplate it anymore.

The simple fact is that most documents are not something you will ever need again so deserve the minimal effort you can put towards temporary mid-term storage and worth 0 effort for archiving. Others may disagree but I suspect I already keep too much for too long. To be honest, there's not really been much that would have been an issue if I didn't shred immediately after reading.

Paperwork and MALODOS (0)

Anonymous Coward | about a year ago | (#43386063)

I have never used them but reading their descriptions, they seem to fit your needs: https://github.com/jflesch/paperwork and http://code.google.com/p/malodos/

Zotero (1)

Fpdx (2689069) | about a year ago | (#43386079)

Zotero (an extension of Firefox, also stand-alone I believe) works well for me to archive lots of PDFs. It has tags and directories, meta information, search, notes etc. Once you've got your PDFs, Zotero is a good organizer.

I think that's a very bad idea. (0)

Anonymous Coward | about a year ago | (#43386139)

"only retain the papers for things that don't require the original copies"
That's a disaster waiting to happen; I think it would be a lot safer to retain the papers that do require originals instead.

Open Source is great ... until (0)

Anonymous Coward | about a year ago | (#43386145)

Until..you need a niche product like you are describing. Then spend the $200 bucks for a dedicated solution that actually saves time and requires no modification.
Buy NeatReceipts and be done with it.
Onsite and offsite backups, labeling, searching, categorizing and OCR.
The amount of time I save with a dedicated solution is worth the up front cost.

just an anonymous coward's 2 cents

Elyse (0)

Anonymous Coward | about a year ago | (#43386149)

I never got a chance to play with this, but the features sound nice - and they claim to work with Apple. http://silkwoodsoftware.com/ [silkwoodsoftware.com]

I use Growly Notes for archival (0)

Anonymous Coward | about a year ago | (#43386441)

I use Growly Notes, which is a note-taking application that's freeware on Mac OS X. It isn't Open Source - the developer is an ex-Microsoft Word developer who has written a number of free Mac OS X applications. It is essentially a Microsoft OneNote knockoff that runs on Mac OS X.

Documents (photos, PDFs, video clips, text, audio clips, etc) are organized by Notebook at the highest level and you can have many notebooks with various topics. These go across the top of the page as tabs. There's a left sidebar with a section and then notes. Notes go under sections. So in my bills notebook, I have sections with Verizon, Electric, Gas, Visa, etc. bills. In the notes, I put in a year for the bills and I just drag photos of bills or PDFs into the note. This system works quite well for me. It's local and I back it up to an off-site drive but nothing on the network.

The downside of doing this is that the project is dependent on the one developer and on Mac OS X which some may not like. It is relatively easy to export notebooks, sections or notes to PDF format so I could get everything out at some point in the future in a convenient format but it would take a little work to do that. My alternative is to use Microsoft OneNote which is something that I'd have to pay for (I use Growly Notes for my work projects too which would mean business licenses for OneNote on multiple machines would be needed along with the prices for updates). That's my backup plan if Growly Notes goes south in the future.

I haven't been able to find anything in the Open Source world comparable to Growly Notes and OneNote. I've seen many projects started but nothing that I would consider Commercial Quality. Evernote does this sort of thing but it stores your data in the cloud. They had a security breach recently which confirms that I don't want to put business work product or confidential information on the cloud.

Skip scanning, download PDFs directly. (2)

yeah I can fix that (2890591) | about a year ago | (#43386765)

(Longtime listener; first-time caller. If I'm doing it wrong, please be kind.)

I've been going through the same issue and have painstakingly scanned/filed a metric crapton of old documents, putting them in a hierarchical directory structure where I can find them if I need them.

But this sucks for a number of obvious reasons. The ones that bother me the most are:

1) A scanned document is larger (*and* less useful) than a downloaded PDF.

2) It's a manual process! I'd rather spend ten hours automating something than five hours over the next 5 years trying to remember the filename convention for storing the scanned document.

Anyway, cutting to the chase, I'm now using Ruby/Watir scripts to automate the business of downloading my most common phone/utility bills from the websites and stashing them directly. I used to use Perl and WWW::Mechanize but all the websites are now so contaminated with unnecessary javascript that only something which manipulates a browser directly allows automation without pulling my hair out. Ruby/Watir works pretty well. Recommend. Automated download; priceless. Without automated download, I'd rather return to scanning paper documents mailed to me, otherwise you quickly find how unreliable your service provider is for retaining your statements.

If anybody wants some pre-alpha scripts for grabbing their pg&e, comcast, cigna, at&t, schwab, nvenergy statements, let me know.

Re:Skip scanning, download PDFs directly. (0)

Anonymous Coward | about a year ago | (#43386899)

If you're curious... Selenium (http://docs.seleniumhq.org/) is pretty good at automating all web browsing and exposing that data + functionality to a few different languages. Python being the one I use. Give it a quick go with one of your login/javascript sites.

Super simple Linux based document scanner. (1)

beachdog (690633) | about a year ago | (#43386905)

Super simple scanning system using Linux.
Make directory called scans, make another called taxes
Have a text file of scanning hints with an easy to remember name.
in a terminal, print the scanning hints file and use the Linux mouse copy feature to construct a scan instruction
The scanimage application requires sudo or you can find a tweak using google search to alter the scanner's USB files and make it run from an unprivileged user.
cd scans
cat filewitheasycommandstocopy.txt

Typical contents of my hint file:
sudo scanimage -l 0mm -x 90mm -y 66mm --resolution 400 | pnmtojpeg >cprcard.jpg
# make files non-overwritable
# chmod -w ~/scans/*.jpg

Verify each scan with eog viewer.
Organize scans like this:
Make long filenames with agencynames, recipientnames, and documentnames all in lower case.
use the mouse to copy an old file name for re-use.
      this groups similar documents together.
use ls -ltr to list scans by time, with the most recently scanned items last.
use ls -ltr *keyword*.jpg to show selected classes of scanned items.
use locate in the distant future to find those oddball items like certificates or letters of recommendation.

locate certificate | grep rabies
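
The naming and lookup habits above can also be sketched in Python, for anyone who would rather script them than copy names with the mouse. This is only an illustration: the names scan_name and find_scans, and the dash used as a separator, are assumptions and not part of the workflow described above.

```python
from pathlib import Path


def scan_name(agency: str, recipient: str, document: str, ext: str = "jpg") -> str:
    """Build one long lower-case filename from agency, recipient and document."""
    parts = [p.strip().lower().replace(" ", "") for p in (agency, recipient, document)]
    return "-".join(parts) + "." + ext


def find_scans(scan_dir: Path, keyword: str = "") -> list[Path]:
    """Return .jpg scans whose name contains keyword, most recently modified first."""
    pattern = f"*{keyword.lower()}*.jpg" if keyword else "*.jpg"
    return sorted(scan_dir.glob(pattern),
                  key=lambda p: p.stat().st_mtime,
                  reverse=True)
```

Calling find_scans(Path.home() / "scans", "rabies") plays the same role as the locate and grep pair, limited to the one scans directory.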

DEVONthink Pro Office (0)

Anonymous Coward | about a year ago | (#43386933)

The Pro version because it does OCR, and a few other things, including archiving and indexing your email. Plus, it interfaces VERY well with Fujitsu ScanSnap scanners.

I've been using it a few years, and it works a treat.

Evernote? (0)

Anonymous Coward | about a year ago | (#43387121)

Isn't this just about the classic use case for Evernote? About the only criterion it doesn't hit is saving locally to a NAS (although I admit that might be an important one for this specific user).

Evernote's secure enough for most purposes; it does a particularly good job of being able to search text that's been scanned; it operates on just about any OS out there; with certain makes/models of scanners, you can scan direct to your Evernote account; if you're caught short, you can "scan" from your phone or tablet and send to Evernote; you can tag your stuff; you can reprint your stuff; setup has to be easier than any homebuilt system; you can access your stuff from just about anywhere; cost is free to minimal; lots of people are using it so it's presumably pretty robust and reliable (I've had no problems in this respect)...

No, I'm not an Evernote employee/shareholder/fanboi, just a very satisfied user... OK, maybe I'm a fanboi
