Best Way To Archive Emails For Later Searching?

timothy posted more than 3 years ago | from the shrink-wrap-them-and-use-sandwich-bags dept.

Communications 385

An anonymous reader writes "I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well). I have migrated my emails faithfully from Unix mail, to Eudora, to Outlook, to Thunderbird and Entourage, though I have left much of the older stuff in Outlook PST files. To make my life easier I would now like to merge all the emails back into a single searchable archive, just because I can. But there are a few problems: a) Moving them between email systems is SLOW; while the data is only a few GB, it is hundreds of thousands of emails, and all of the email systems I have tried take forever to process the data. b) Some email systems (e.g. Outlook) become very sluggish when their database goes over a certain size. c) I don't want to leave them in a proprietary database, as within a few years the format becomes unsupported by the current generation of the software. d) I would like to be able to search the full text, keep the attachments, view HTML emails correctly and follow email chains. e) Because I use multiple operating systems, I would prefer platform independence. f) Since I hope to maintain and add emails for the foreseeable future, I would like to use some form of open standard. So, what would you recommend?"

It's obvious (3, Funny)

Mikkeles (698461) | more than 3 years ago | (#33488876)


IMAP (5, Informative)

klingens (147173) | more than 3 years ago | (#33488884)

An IMAP server (dovecot, cyrus, courier) of your choice for Linux. If you don't have a Linux server you can always run it inside a small VM.

Re:IMAP (1, Informative)

hedwards (940851) | more than 3 years ago | (#33488964)

Yeah, IMAP is the way to go. Personally, I use IMAP on my email account and MailStore Home to do the actual download and backup. The OP will probably end up having to set up a personal server to get the program to download the older mail, but that can be done easily enough in a virtual machine.


Kmail for Outlook stuff and Search. (4, Informative)

twitter (104583) | more than 3 years ago | (#33489252)

Kmail has an excellent .pst converter that will pull out your old Outlook mail. Once you have it in Kmail, you can drag and drop it into any of the supported formats (mbox, maildir, etc.). If you have already established filters, you can let them sort things out. If not, you can search manually by To, From, mailing list, subject, etc. From there you can move it onto your IMAP server.

I carry everything around on my laptop and use Kmail instead of IMAP. With full-drive encryption and xscreensaver, I don't worry about losing private information, and I know that my ISPs have better collections of my email anyway, despite what they say about size limits. I could use Gmail's IMAP instead of my own, but I prefer to pull my Gmail down with Kmail's IMAP support. Until US networks get more reasonable, I want my mail with me rather than on my own server, and I would not advise anyone to leave their mail on someone else's server without keeping a copy themselves.

Because your question is all about search, I have to plug Kmail again. With proper organization of your mail into subfolders for friends, family, lists, companies and projects, mail searches are quick, even on modest hardware like my ancient PIII laptop. Searching everything takes a little longer, but it is not much of a burden. Evolution may do as well, but something about GNOME turns me off. The only downside is that the 3.5 branch does not seem to be able to search through encrypted mail, but I imagine there's some gpg-agent fix for that I'm not aware of.

MySQL (1)

perpenso (1613749) | more than 3 years ago | (#33489318)

Perhaps the best route would be to use MySQL or some other FOSS database and build a web front end for browsing, searching, etc.

Re:IMAP (1)

arivanov (12034) | more than 3 years ago | (#33489322)

The guy mentioned Entourage. If he is running Mac OS X, he can run any of these on it.

This solves the "storage" problem. However, it does not solve the search/indexing problem. I have a 9 GB+ (and growing) IMAP store going back to 1999 with several hundred folders in it, so I am facing a similar problem. Thunderbird's search, and even grepping the store on the server, just does not cut it any more.

Re:IMAP (2, Informative)

wealthychef (584778) | more than 3 years ago | (#33489360)

Well, on OS X the searching problem *should* be solved by Spotlight, as it automagically indexes "all files on your hard drive" (not really) for near-instant searches. The trouble with Spotlight is that Apple does not index all folders, and I do not know of a way to enable it for all folders. If you import your mail into Mail.app, you do get the indexed behavior; my situation is similar to yours, and I do exactly that. But all those billions of old messages I keep in an archive that I never look at.
Anyhow, look into Spotlight on OS X. Oops, you said "open sourced," right? Damn. I don't know, then.

Re:Psychiatric consultation! (5, Funny)

balaband (1286038) | more than 3 years ago | (#33488936)

This is slashdot. We save computers older than your dad just to use them as alarm clocks. Please leave.

Re:Psychiatric consultation! (1)

sakdoctor (1087155) | more than 3 years ago | (#33489030)

I would use a computer older than your dad just to use as an alarm clock, but I just can't help upgrading.

Re:Psychiatric consultation! (4, Interesting)

Cylix (55374) | more than 3 years ago | (#33489032)

I never thought of turning an ancient host into an alarm clock.

Once however, I did hollow out an SGI case and turn it into a refrigerator.

The case was just too damned pretty to throw away.

Re:Psychiatric consultation! (1)

AnonymousClown (1788472) | more than 3 years ago | (#33489242)

Red or blue one?

Re:Psychiatric consultation! (4, Insightful)

pz (113803) | more than 3 years ago | (#33489024)

How was this modded Informative? Saving correspondence for future reference is critically important. I have many times needed to refer back to messages that are years old in order to pull up a vital bit of information that was suddenly relevant. I have needed to pull up an attachment from an email a few months old, view the exact wording of correspondence, check the date of a quotation, etc., more times than I can count, so searching and retrieval are both vitally important. When I run events, I need to be able to review all of the correspondence after the fact for demographic analysis, often two years after the event when the final reports are being written. Saying that this sort of behavior is odd or abnormal is either trolling or not understanding how the world works when you're not just a drone.

IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea, as I want to have the email text and attachments in my possession.

Re:Psychiatric consultation! (1)

DigiShaman (671371) | more than 3 years ago | (#33489092)

Nah, he's probably a lawyer.

That's right, I'm looking at you Mr. "I've got a 22GB mailbox on the new Exchange 2007 system". Quotas, learn em, love em, use em!

Re:Psychiatric consultation! (2, Interesting)

garcia (6573) | more than 3 years ago | (#33489142)

Starting with GMail I have kept every e-mail since 6/22/2004. I also brought over many e-mails I had in my saved folders from long before that. Am I insane? No. I have found this archive incredibly useful for any variety of uses even 6 years later.

Nothing like having your wife ask, "man, I wish we still had the recipe for deviled eggs we made in college. Too bad it was back in 2001." "No problem honey, hold."

Date: Fri, 26 Jan 2001 13:40:46 -0500
From: yoyoskippy
To: (now dead, have at it spammers)
Subject: Deviled eggs

Deviled Eggs

6 hard cooked eggs
    (throw two more eggs in, so you can check how they are doing)

pinch of salt (thats a pinch boy, wayyyyy less than 1/4 tsp.)

1/4 tsp. pepper
1/2 tsp. dry mustard
2 Tbsp. Hellmans
1 Tbsp. Miracle Whip
Paprika (sprinkles)

Boil the eggs, use the extra two eggs to check the eggs process. when boiled crack the shell a bit with a spoon. then put the eggs in cold water w/ice cubes. this makes it easier to peel the shell off the egg. Next take the yolks out of the eggs and smash up very finely with fork. next add all of the ingredients together to make the topping. mix well. spoon the mixture onto the egg and then sprinkle on paprika. enjoy. yum yum!!

Pulled that out a couple weeks ago for a picnic. Yum yum!! was right.

Delete (2, Insightful)

Anonymous Coward | more than 3 years ago | (#33488908)

Time to delete them all

A Lawyer's Fantasy ... (4, Insightful)

perpenso (1613749) | more than 3 years ago | (#33488916)

I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well) ...

You are a hostile lawyer's fantasy come true. ;-)

Google Mail. (3, Insightful)

(1238654) | more than 3 years ago | (#33488934)

See subject.

Re:Google Mail. (1)

WiglyWorm (1139035) | more than 3 years ago | (#33489090)

This was my first thought as well. Zero reasons to use anything else.

Re:Google Mail. (1, Insightful)

Anonymous Coward | more than 3 years ago | (#33489320)

Uh, privacy would be the reason.

Making things easier. (1, Informative)

Anonymous Coward | more than 3 years ago | (#33489138)

To help spare you the precious keystrokes it would take to Google this yourself, you can go straight to “Google Apps for Businesses” and sign up. Now, did you really have to Ask Slashdot?

one word (1, Interesting)

Anonymous Coward | more than 3 years ago | (#33488944)


goodluckwiththat (-1)

Anonymous Coward | more than 3 years ago | (#33488948)


Not Much (2, Informative)

maxume (22995) | more than 3 years ago | (#33488950)

It isn't particularly platform independent (because no one is paying much attention to Windows), but Not Much offers threads and full-text search.

Re:Not Much (3, Informative)

koiransuklaa (1502579) | more than 3 years ago | (#33489296)


Notmuch can manage absolutely insane amounts of email without any artificial 'archiving'. Of course, if you are looking for a program that does something other than tagging and searching (like sending, composing or receiving email), you need to look elsewhere.

Print (4, Funny)

JustOK (667959) | more than 3 years ago | (#33488952)

Print then scan

Re:Print (1)

zeil (1690682) | more than 3 years ago | (#33489122)

Ouch... really... if you're going to do that, at least print to PDF.

Re:Print (0)

Anonymous Coward | more than 3 years ago | (#33489140)

That would obviously be very fast for text, but what about binary attachments?

Gmail? (5, Informative)

spiffydudex (1458363) | more than 3 years ago | (#33488956)

While not open source, Gmail has a good search engine that isn't sluggish. Plus it has roughly 7.5 gigs of space to store data. Use IMAP to push all of your emails to the server and then use that Gmail account for archive email only.

Re:Gmail? (2, Insightful)

siliconbits (943161) | more than 3 years ago | (#33488992)

I second that. Invest in Google Apps to benefit from additional services as well.

Re:Gmail? (1)

Threni (635302) | more than 3 years ago | (#33489036)

Thirded. And if you're bothered about non-oss/future proofing etc, just download it all every now and again via the POP3 interface and burn it/keep it locally, accessible via Thunderbird (for example).

Re:Gmail? (3, Insightful)

pvera (250260) | more than 3 years ago | (#33489058)

Yes! The thing that appeals to me the most about using Gmail is that searching through 5+GB of old emails won't make everything in my machine slow to a crawl. Even with the free Gmail account, you can up the storage to 20GB for $5/year, and that extra space is available from other Google services connected to the same account.

If you want to have more flexibility, sign up for a Backupify account, which can backup Gmail pretty well. As a bonus, when Backupify stores your backups they are kept in plain text format, so you can always pull these and move them elsewhere without having to worry about issues with Gmail's storage formats.

An Advertiser's Fantasy ... (5, Interesting)

perpenso (1613749) | more than 3 years ago | (#33489056)

And now the poster becomes an advertiser's dream come true in addition to being a hostile lawyer's dream come true. ;-)

Remember that from Google's perspective gmail is a tool to better profile you for targeted advertising. Make sure you are OK with that before giving them access to all your emails.

Re:An Advertiser's Fantasy ... (2, Interesting)

Nemilar (173603) | more than 3 years ago | (#33489116)

OK, so I hear this a lot and I never really understand the problem.

The "unwritten gmail contract" (and it actually applies to most Google products) is this: We will give you a service for free (in this case Gmail), and in return we are going to profile your use of that service to select ads for you. In the case of gmail, they give you however many GB of storage, always-on cloud email, and the best searchable email system I've ever seen. There are other Google examples, from gtalk to Google Docs. The basic principle behind it is the same, most people understand the deal, and I don't see anything wrong with it. There's no such thing as a free lunch, but this is pretty close.

Re:An Advertiser's Fantasy ... (1)

camperdave (969942) | more than 3 years ago | (#33489198)

By storing personal data on gmail, you are one hack away from identity theft. I prefer to keep as few personal details on the net as possible.

Re:An Advertiser's Fantasy ... (1)

perpenso (1613749) | more than 3 years ago | (#33489260)

There is nothing unwritten about it. Google is quite up front in their agreement that they data mine your emails for targeted advertising purposes. I agree that there is nothing wrong with this, but I disagree that most people are aware of this.

Re:An Advertiser's Fantasy ... (1)

wvmarle (1070040) | more than 3 years ago | (#33489338)

Until they start selling that information about you to third parties. Google having an in-house profile of me that's used to target ads to me is more or less acceptable. Them selling this info to third parties is a definite no-go, and I am not aware of anything preventing them from doing just that, other than their own ethics.

Re:Gmail? (0)

Anonymous Coward | more than 3 years ago | (#33489214)

Except that gmail doesn't index the full text of every e-mail - so searching old mail is a hit-and-miss affair.

Re:Gmail? (0)

Anonymous Coward | more than 3 years ago | (#33489348)

I totally agree.

However... I've had problems with the transfer of my old mail: I used Thunderbird to 'mount' Gmail via IMAP and then just copied all my old mail over. As this took about 2-3 hours, I didn't babysit the process. Much later I found out that some of the mail was missing, because Thunderbird had apparently run into an error of some sort and stopped copying, without any error message that I would have noticed. No problem, as I can just re-copy the old data. But... does anyone have advice for an idiot-proof Thunderbird-to-Gmail copy process?
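One way to make such a copy verifiable, if not idiot-proof, is to compare Message-ID headers on both sides after the copy finishes and re-copy only what is missing. The sketch below assumes you already have the raw source messages and the destination headers as bytes; the function names are illustrative, not from any tool, and messages without a Message-ID cannot be checked this way.

```python
from email.parser import BytesHeaderParser
from typing import Iterable, List, Set


def message_ids(raw_messages: Iterable[bytes]) -> Set[str]:
    """Collect the Message-ID header of each raw RFC 822 message (only headers are parsed)."""
    parser = BytesHeaderParser()
    ids = set()
    for raw in raw_messages:
        mid = parser.parsebytes(raw).get("Message-ID")
        if mid:  # messages without a Message-ID are skipped, not reported
            ids.add(mid.strip())
    return ids


def missing_after_copy(source: Iterable[bytes], destination: Iterable[bytes]) -> List[str]:
    """Message-IDs present in the source archive but absent from the copy."""
    return sorted(message_ids(source) - message_ids(destination))
```

For example, the source side might be `[m.as_bytes() for m in mailbox.mbox(path)]`, and the destination side could come from an IMAP `UID FETCH 1:* (BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)])` on the Gmail folder.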

OK, My Favorite (2, Interesting)

BoRegardless (721219) | more than 3 years ago | (#33488962)

MailSteward on the Mac.

SQL database. Good, inexpensive, works with many tens of thousands of emails and more.

Re:OK, My Favorite (1)

BoRegardless (721219) | more than 3 years ago | (#33489044)

Forgot to note a key factor: format independence. Email clients come and go over time, and it offers many key output formats, so you are not restricted on that front.

The search function is certainly a key for me, as sometimes I know only one key word in attempting to find a note about material, object or company from 15 years back.

Mbox or SQLite (2, Insightful)

Anonymous Coward | more than 3 years ago | (#33488968)

If you want an "email format" why not mbox? Many things currently support that as an import option.

If you want a database, why not SQLite? It's about as open as can be, backwards compatibility is almost a religion, and it should have no problem with hundreds of thousands of entries.
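A minimal sketch of the SQLite idea using Python's built-in `sqlite3` and `email` modules. The table layout and function names are assumptions for illustration. It keeps the untouched original bytes next to a few headers, so nothing is lost if the schema later turns out to be wrong. Search here is a plain `LIKE`; SQLite's FTS5 extension, when your build includes it, would be the natural next step for real full-text search.

```python
import sqlite3
from email import message_from_bytes
from email.policy import default
from typing import List, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    message_id TEXT UNIQUE,      -- NULLs are allowed and never collide
    sender     TEXT,
    subject    TEXT,
    sent_date  TEXT,             -- Date header kept as text, unparsed
    raw        BLOB NOT NULL     -- the original message bytes, untouched
)
"""


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def _header(msg, name: str) -> Optional[str]:
    value = msg[name]
    return str(value) if value is not None else None


def store(conn: sqlite3.Connection, raw: bytes) -> None:
    """Insert one message; a repeated Message-ID is silently skipped."""
    msg = message_from_bytes(raw, policy=default)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO messages (message_id, sender, subject, sent_date, raw) "
            "VALUES (?, ?, ?, ?, ?)",
            (_header(msg, "Message-ID"), _header(msg, "From"),
             _header(msg, "Subject"), _header(msg, "Date"), raw),
        )


def search_subject(conn: sqlite3.Connection, term: str) -> List[str]:
    # LIKE is case-insensitive for ASCII in SQLite; fine for a sketch, slow at scale
    rows = conn.execute(
        "SELECT subject FROM messages WHERE subject LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [row[0] for row in rows]
```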

Use gmail. (0, Redundant)

el_jake (22335) | more than 3 years ago | (#33488970)

Migrate it all to Gmail. With Gmail you have room for your couple of GB, and the search feature works like a charm. The only thing missing is "folders" to make it act like you are used to.

Re:Use gmail. (1)

pz (113803) | more than 3 years ago | (#33489054)

Migrate all to gmail With gmail you got room for your couple of GB. And the search feature works like a charm. Only thing missing is "folders" to make it act like you are used to.

Although the searching features in GMail are great, I find the interface with a single unified sequence of mail, and lack of folders (the tagging feature is far too clunky) to be a major impediment. The biggest issue though, is that I do not own a copy of the information on my own server.

Re:Use gmail. (1)

mspohr (589790) | more than 3 years ago | (#33489158)

Gmail does not have folders but it does have tags. Tags can be used like folders but are more flexible since you can have more than one tag on a message. However, I have found that gmail's searching is so good that I don't even need to use the tags. Everything just goes into the "Archive" and the gmail search always finds what I want... quickly and easily.

mbox + grep (5, Funny)

Anonymous Coward | more than 3 years ago | (#33488972)

I use mbox format files and grep.

IMO, one can't get much more portable than that.
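Plain grep has one blind spot: bodies encoded as base64 or quoted-printable don't contain the words you are searching for. A hedged sketch using Python's standard `mailbox` module that decodes text parts before matching (the function names are illustrative):

```python
import mailbox
import re
from email.header import decode_header, make_header
from typing import Iterator


def _subject(msg) -> str:
    # Decode RFC 2047 encoded-words such as =?utf-8?q?...?=
    return str(make_header(decode_header(msg.get("Subject", ""))))


def _plain_text(msg) -> str:
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                try:
                    chunks.append(payload.decode(charset, errors="replace"))
                except LookupError:  # unknown charset name in the header
                    chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def grep_mbox(path: str, pattern: str) -> Iterator[str]:
    """Yield the subject of every message whose subject or plain-text body matches."""
    rx = re.compile(pattern, re.IGNORECASE)
    for msg in mailbox.mbox(path):
        subject = _subject(msg)
        if rx.search(subject) or rx.search(_plain_text(msg)):
            yield subject
```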

Use Gmail (0)

Anonymous Coward | more than 3 years ago | (#33488974)

Gmail is the only mail service I know of that was designed from the ground up for easy searching and tagging (with Labels) your mail.

Use MySQL or some other FOSS DB (0)

perpenso (1613749) | more than 3 years ago | (#33488980)

Abandon trying to do this with an email client app's archive. It is doubtful they are designed or tested with this amount of data in mind. Maybe you could set up your own email server with a web front end. Or perhaps the best route would be to use MySQL or some other database and build a web front end for browsing, searching, etc.

Entourage? (0)

Anonymous Coward | more than 3 years ago | (#33488982)

Why do you hate yourself?

Courier-imap (1)

mpol (719243) | more than 3 years ago | (#33488988)

I can recommend a Linux server with Courier-IMAP. It makes it easy to store your mail centrally, and as long as the server is on the internet, you can reach it: from work, at friends' places, or on vacation.
It's not really fast in my experience, but not terribly slow either.
And you can store things in Maildir format, which is widely supported, and it's easy to back up with some scripts.

Re:Courier-imap (1)

Richy_T (111409) | more than 3 years ago | (#33489076)

Also easy to search and sort. And I don't mean with a mail client's search features but truly powerful tools like find, grep and xargs.

+1 for IMAP/Maildir.

Gmail (0, Redundant)

Quick Reply (688867) | more than 3 years ago | (#33488990)

Use Gmail like a normal person, not your requirements but close enough [insert solution for offline Gmail backup here because Google are proprietary and evil]

Re:Gmail (1)

Sancho (17056) | more than 3 years ago | (#33489358)

Lots of people have been suggesting gmail, and that's great for some. There are some significant limitations/constraints, though.

1) I use the common "business" trick to help identify who is selling my e-mail address. Gmail has plus-addressing, which works reasonably well, however it is imperfect. Some spammers know about plus-addressing, and strip the plus.
        Google Apps for Domains would work, except that you're pretty limited in the number of addresses you can use without paying exorbitant (for these purposes) fees.

2) Forwarding mail to Google destroys valuable header information. Redirecting mail can cause it to get blocked by the spam filter (sometimes so badly that it doesn't even make it into your spam folder.) So even keeping your own mail server and just bouncing everything up there isn't a viable solution.

3) Having Google pop mail from your server is probably the most workable technical solution, but then Google has your password. Also, there are size limitations, in case you happen to have large attachments that you need to preserve.

The OP may not have any of these issues, in which case Gmail is a great choice. Unfortunately, I'm looking for the same thing (searchability) and Gmail won't work for me.

However, mairix works reasonably well.
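The plus-addressing trick from point 1 can be checked mechanically: pull the `+tag` out of the recipient headers to see which address a message was sent to. A small sketch, where the base user name is whatever precedes your `+` (the usage below assumes a hypothetical `john`). As the comment notes, an address with the tag stripped off yields nothing, so this only attributes mail that kept its tag.

```python
from email.utils import getaddresses
from typing import List


def plus_tags(header_values: List[str], user: str) -> List[str]:
    """Return the +tag of each address of the form user+tag@domain in the given headers."""
    tags = []
    for _name, addr in getaddresses(header_values):
        local, _, _domain = addr.partition("@")
        base, plus, tag = local.partition("+")
        # A bare user@domain (plus part stripped, as some spammers do) yields nothing.
        if base.lower() == user.lower() and plus:
            tags.append(tag.lower())
    return tags
```

Typical use would be `plus_tags(msg.get_all("To", []) + msg.get_all("Delivered-To", []), "john")` on a parsed message.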

not much (0)

Anonymous Coward | more than 3 years ago | (#33489004)

Maildir (4, Informative)

alexhs (877055) | more than 3 years ago | (#33489006)

Maildir.

And if you have an e-mail client that doesn't support it, use an IMAP server to feed your client. /thread

Backwards and Forwards Compatibility (0)

Anonymous Coward | more than 3 years ago | (#33489010)

Um, the same way people have been doing it since email was invented, text files (with base64 for those binary bits'n'pieces). Only way to be sure.
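MIME already works this way: attachments sit inside the message as base64 text, so a plain-text archive keeps them intact. A sketch of pulling them back out with Python's `email` package (the function name is illustrative):

```python
from email import message_from_bytes
from email.policy import default
from typing import List, Optional, Tuple


def extract_attachments(raw: bytes) -> List[Tuple[Optional[str], str, bytes]]:
    """Return (filename, content type, decoded bytes) for each attachment part."""
    msg = message_from_bytes(raw, policy=default)
    found = []
    for part in msg.walk():  # walks nested multiparts too
        if part.get_content_disposition() == "attachment":
            data = part.get_payload(decode=True)  # base64 text -> original bytes
            found.append((part.get_filename(), part.get_content_type(), data or b""))
    return found
```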

Good IMAP Server (5, Informative)

caffeinejolt (584827) | more than 3 years ago | (#33489022)

If this is really important to you, and you want it all to work across multiple workstations/OSes, your best bet will be to store it all in IMAP. If you have the means and motivation to run this yourself, I would recommend Dovecot. If you don't have the means and motivation, then you can use a service like Gmail to run your IMAP, although you give up certain freedoms in doing so. For example, I use Dovecot coupled with Maildir++ as the physical storage format; as a result I can (if I wanted to) change to any email client I wish very quickly, use different email clients at the same time, etc.

Maildir (4, Interesting)

roderickm (6912) | more than 3 years ago | (#33489028)

Maildir storage format is resistant to bit-rot because it stores each message in a separate file, and uses filesystem directories for mail folders. It's widely supported by user agents (mail readers) and IMAP/POP3/SMTP servers, so you'll never be stranded by the actions of a single software vendor. Finally, it's easily searched using everyday unix tools - find, grep, sed, awk, etc., and you can use the full-text search engine of your choice for speedy searches.
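Converting an existing mbox file into Maildir's one-file-per-message layout can be done with Python's standard `mailbox` module. A sketch, assuming nothing else is writing to the mbox while it runs; the function name and paths are hypothetical.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a Maildir; return how many were copied."""
    src = mailbox.mbox(mbox_path)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            # MaildirMessage(msg) maps mbox Status/X-Status flags (R, O, D, F, A)
            # onto Maildir flags; each message becomes its own file under new/ or cur/.
            dst.add(mailbox.MaildirMessage(msg))
            count += 1
    finally:
        src.close()
    return count
```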

citadel (3, Informative)

samjam (256347) | more than 3 years ago | (#33489034)

Citadel is a full POP3/IMAP server with full-text indexing.

Thunderbird can use server-side searches to find messages, and I find that works pretty well.

Storage and search can be different problems (1)

greg1104 (461138) | more than 3 years ago | (#33489052)

Have you looked at Archiveopteryx? That is one potential solution to the storage side of the problem. It stores the messages into a PostgreSQL database with minimal tinkering, so you can always get the original plain text stuff back out again. Consider it a database of mbox files that exposes an IMAP interface. You can't get any less proprietary than Postgres, and you can scale up many of its operations using standard database approaches in that area.

What I would do here is store messages there as my permanent store for them, dump periodically to full plain-text backups just for disaster recovery, then experiment with search software that runs on top of it using IMAP as the transport. There I don't have any specific advice. Ultimately it should be possible to extend Archiveopteryx to handle that too--PostgreSQL has decent full-text search built in--but I don't know of anybody working on that.

Probably easier to break this into two pieces, get a robust solution for the storage side, and then see what clients have search capabilities you like that won't choke on importing your data.

IMAP for storage (0)

Anonymous Coward | more than 3 years ago | (#33489068)

Use a suitable IMAP server with an appropriate storage backend to store all that email. No matter the backend storage the daemon you choose uses, your email will always be accessible in an open, standard protocol by any (many) IMAP-enabled mail clients!

POO (Plain Old Outlook) (1)

CatoNine (638960) | more than 3 years ago | (#33489070)

Hate to break it here, but since 1990 I've been storing *all* my mail (and calendar and SMSes) in a plain old Outlook PST archive file. It is a fairly good and flexible database format with lots of import/export and search options. Future compatibility is well guaranteed. To keep it snappy, I've been systematically removing big attachments (documents and pictures), possibly replacing them with a textual reference to where they are stored elsewhere on disk. I know, I know, low tech and the Borg, but future-proof for now :-).

Re:POO (Plain Old Outlook) (0)

Anonymous Coward | more than 3 years ago | (#33489222)

Did you RTFS at all? Outlook fails on every requirement...cross-platform, open format and quickly searchable.

grep (1)

vanye (7120) | more than 3 years ago | (#33489082)

You can laugh, but it's almost good enough for what I need.

All my archived email (93-2004) was copied to a NAS as individual messages (I still have the Cyrus directory structure). It's the more recent stuff that lives in PSTs that is the problem.

One day I'll get around to doing the same for my news postings. That's where the nuggets of interest are.

my solution (1)

je ne sais quoi (987177) | more than 3 years ago | (#33489086)

I'll chime in with my own solution. My archive is not as extensive as yours, but I have most everything from 2005 or so (excepting mailing lists, other junk, etc.). My solution is sort of silly: I just use Apple's Mail.app. The reason I use it is that Mail.app lets you store and organize everything as separate folders, and Spotlight is blazingly fast and does a great job of searching. I try to keep the number of messages in a folder on the order of a few thousand; for my e-mail load, breaking up the folders by year works well (yes, you can still search across years).

The folders themselves are stored under ~/Library/Mail/Mailboxes. Each folder has its own directory and a series of .emlx files, an Apple-specific format (partly XML) with one message per file. The problem with this solution is that the emlx files are proprietary and subject to change. That said, I have successfully managed to copy mailboxes to new computers with a new OS, though it did require an extra step or two beyond just copying my Mailboxes directories to the new computer. Worst case, though, the emlx files are plain text, so you can grep through them if you really have to (e.g. if you're logged onto the computer remotely), or you could write a script that parses most of the information from the file.
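For the "write a script" fallback: .emlx files are commonly described as a first line giving the message length in bytes, then the raw RFC 822 message, then an XML property list holding Mail.app's metadata. Treat that layout as an assumption rather than a documented format; the sketch below parses a file under it.

```python
import plistlib
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default
from typing import Tuple


def read_emlx(data: bytes) -> Tuple[EmailMessage, dict]:
    """Split .emlx bytes into the RFC 822 message and its trailing property list (assumed layout)."""
    newline = data.index(b"\n")
    length = int(data[:newline].strip())   # first line: message size in bytes
    start = newline + 1
    msg = message_from_bytes(data[start:start + length], policy=default)
    trailer = data[start + length:].strip()  # assumed: XML plist with Mail.app metadata
    meta = plistlib.loads(trailer) if trailer else {}
    return msg, meta
```

Usage would be `read_emlx(open(path, "rb").read())` on a file under `~/Library/Mail/Mailboxes`.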

Donate your archive to science ... (1)

perpenso (1613749) | more than 3 years ago | (#33489098)

In your will, donate your archive to science. I'm sure it would make an interesting thesis project for some PhD candidates out there. I'm serious; consider this.

Fairly reliable way to get mail out of old clients (1)

bruns (75399) | more than 3 years ago | (#33489120)

There's one method I've used fairly often in the past for getting mail out of an older client, provided the older client supports IMAP (Lookout and Lookout Express do).

First, set up a new account on your IMAP server just for archival purposes (you can set up an IMAP server on any UNIX/Linux distro, and even on Windows with Cygwin, fairly easily; dovecot is a good place to start). Make sure it's using either mbox or maildir (preferred).

Second, set up said account on all the mail clients you'd like to archive. Make sure you are setting them up as IMAP and not POP3.

Third, drag the contents of each local folder/inbox/etc. to a folder on the archive-specific IMAP account. It will take a while, but the entire contents of your mailbox will be copied over, message by message, the IMAP way, then deposited by the IMAP server into the local format of your choice.

You've just created flat-text versions of client-specific archives. Create folders, subfolders, etc. and organize things in your modern client, which can easily do IMAP. You can easily search with any of numerous free packages, archive and compress permanently with squashfs, or even just leave them available through IMAP to search with the new Thunderbird's (3.1) global indexer.
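The drag-and-drop step can also be scripted once a folder has been exported to mbox. A sketch using `imaplib`'s `append` to copy a local mbox into an archive folder while keeping each message's original date as its IMAP internal date. It assumes `conn` is a logged-in `imaplib.IMAP4` connection and that the target folder already exists; the function name is illustrative.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def upload_mbox(conn, mbox_path: str, folder: str) -> int:
    """APPEND every message of a local mbox to an existing IMAP folder; return the count."""
    sent = 0
    for msg in mailbox.mbox(mbox_path):
        try:
            when = parsedate_to_datetime(msg.get("Date"))
            if when.tzinfo is None:  # "-0000" dates come back naive; IMAP needs a zone
                when = when.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):  # missing or malformed Date header
            when = None  # the server then stamps the upload time
        # imaplib converts the datetime to an IMAP INTERNALDATE string itself
        typ, data = conn.append(folder, None, when, msg.as_bytes())
        if typ != "OK":
            raise RuntimeError(f"APPEND failed after {sent} messages: {data!r}")
        sent += 1
    return sent
```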

Stuff it in a server (1)

SplatMan_DK (1035528) | more than 3 years ago | (#33489124)

You should put all that stuff on an IMAP server on your home network (preferably a box you can reach from the outside using DDNS or a static entry if you have your own domain).

That way your client OSes can be whatever platform you choose, and they will all be able to access your mail storage.

Put older mails in separate folders.

If you can work with Linux there are plenty of choices. If not, consider Windows Home Server and get a mailserver product for Windows - there are plenty!

Many advanced email clients, such as Outlook or Evolution, will allow you to search for mails based on any criteria you like (subject, sender, body, date, etc). Hmmm except perhaps the actual mail header ;-)

Personally, I would never do this, though. Generating and saving data is easy; limiting it is hard. Consider deleting stuff: you could start by deleting everything older than 36 months. The more you have to search through, the more difficult it gets. In the end, finding a single mail will be (or in your case, IS) like finding a needle in a haystack...

Also, why save all mails? Every time you reply to a mail, a copy of the original is often included in your answer. So from today, consider deleting all inbound mail that you reply to ;-)

- Jesper

Python! (1)

BertieBaggio (944287) | more than 3 years ago | (#33489130)

While this answer will almost certainly not suit the OP, it may be of interest to other folk looking to archive their email. Using Python, imaplib and some basic file I/O, you can save the original text of messages. My rationale for this was, firstly, that it's probably less problematic than converting between various email client formats, and secondly, that it's a decent way to learn some Python! ;)

My rather basic implementation just dumps every email from an (IMAP) folder sequentially. I rely on grep for searching. However, it does have the prerequisite of the email being stored on a mailserver accessible via IMAP.
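This is not the commenter's code, just a sketch of the same approach: select the folder read-only, list UIDs, and write each message's raw bytes to its own `.eml` file, ready for grep. It assumes `conn` is already logged in; the function, folder and output names are illustrative.

```python
import imaplib
import os


def dump_folder(conn: imaplib.IMAP4, folder: str, outdir: str) -> int:
    """Save every message in an IMAP folder as <uid>.eml under outdir; return the count."""
    os.makedirs(outdir, exist_ok=True)
    typ, _ = conn.select(folder, readonly=True)  # read-only: nothing gets flagged \Seen
    if typ != "OK":
        raise RuntimeError(f"cannot select {folder!r}")
    typ, data = conn.uid("SEARCH", None, "ALL")
    if typ != "OK":
        raise RuntimeError("UID SEARCH failed")
    written = 0
    for uid in data[0].split():
        typ, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
        # A successful fetch looks like [(b'<seq> (UID <uid> BODY[] {<size>}', raw_bytes), b')']
        for part in parts:
            if isinstance(part, tuple):
                path = os.path.join(outdir, uid.decode() + ".eml")
                with open(path, "wb") as fh:
                    fh.write(part[1])
                written += 1
                break
    return written
```

Usage might look like `conn = imaplib.IMAP4_SSL("imap.example.com")`, `conn.login(user, password)`, then `dump_folder(conn, "INBOX", "backup/INBOX")`; the host name is a placeholder.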

Look at it a different way (1)

Eristone (146133) | more than 3 years ago | (#33489132)

Scary thought, but you might just want to pick up one of the tools that lawyers use for electronic discovery. They cover multiple mail formats (including older generations of said formats) and are set up so that it's easy for an intern to search for keywords and the like, so someone who understands tech should be able to use them. I've had to use the Clearwell appliance and it did what it was supposed to do, including finding attachments and indexing them for ease of search. (No, I don't work for Clearwell, and wouldn't have used their tool at all except for t.. er anyways)

Roll your own... (0)

Anonymous Coward | more than 3 years ago | (#33489136)

This sounds like the perfect time to roll your own software to do what you are looking for. Use a LAMP stack, write or use a few format converters, voila! you're done!

Plain text and Google desktop (0)

Anonymous Coward | more than 3 years ago | (#33489148)

Future proof your emails by keeping them in plain text format. Then use third party software to index and search your email collection. I recommend google desktop.

Store them in mbx format (2, Insightful)

Anonymous Coward | more than 3 years ago | (#33489156)

I recommend mbox (MBX) format.

1. The format is text-based and not likely to become unreadable any time in the foreseeable future.

2. There is no shortage of tools for manipulating mbox.

3. It's easily indexed by full-text search applications (MS Search, included with Windows, among them).

The save dialogue in Outlook's tools has an Apple export option, which is actually the mbox format.

For archival access, I recommend an IMAP server with a folder hierarchy based on month/year. Your mail client should be configured to leave the messages on the server (rather than downloading them via IMAP). This somewhat future-proofs migration to different mail clients.

The only issue is that IMAP searches are out of the question, so you will need to search offline with a full-text indexing/search application first, to find the general folder location of the message you are seeking.

If your computer has lots of memory, why not just use grep and write a small shell script to forward the message from the archive file to your inbox, so that formatting etc. is preserved? If you're doing lots of searches, the disk cache will keep most of the archive in RAM even if it's a few GB.
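The grep-and-extract idea can be sketched with Python's standard `mailbox` module. Instead of forwarding, the hypothetical `search_mbox` helper copies each matching message into a separate results mbox, with its formatting and attachments intact. Any mail client can then open that file. This is a linear scan, like grep, and builds no index.

```python
import mailbox


def search_mbox(src_path, needle, dest_path):
    """Copy every message containing `needle` (case-insensitive) into dest_path.

    Returns the Subject headers of the matching messages.
    """
    needle = needle.lower()
    src = mailbox.mbox(src_path, create=False)
    dest = mailbox.mbox(dest_path)
    dest.lock()
    subjects = []
    try:
        for msg in src:
            # Search the raw message text, much as grep would. Base64-encoded
            # parts (most attachments) are not decoded, so they won't match.
            text = msg.as_bytes().decode("utf-8", errors="replace").lower()
            if needle in text:
                dest.add(msg)  # whole message, headers and MIME parts intact
                subjects.append(msg.get("Subject", ""))
    finally:
        dest.close()  # flushes, unlocks and closes the results mbox
        src.close()
    return subjects
```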

Use a gmail account (1)

dethkultur (617989) | more than 3 years ago | (#33489166)

I did this myself, going back only 10 years though. It has been invaluable. Gmail gives you 7GB (with a little more every day), and the searching is top notch and instant.

There are several apps out there to import mail into a Gmail account, and it is pretty easy if your old email is still available via POP or IMAP (which I doubt). For the stuff in a PST file, what I ended up doing was adding the new Gmail account to Outlook and then dragging and dropping emails into the new account 1,000 at a time. (I also did this for a GroupWise mailbox from an old job.) It's slow, but it works. In addition, it tags the mail for you with "Inbox" or "Sent", so you can easily retag it later. Once it is in there, it is a little gold mine for finding whatever you need.
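For mail that already sits in an mbox file, the IMAP route can be scripted rather than done by drag and drop. The sketch below assumes an authenticated `imaplib.IMAP4_SSL` connection to the Gmail account. The names `upload_mbox` and `original_date`, and the target folder, are hypothetical. It uses the standard IMAP `APPEND` command and passes each message's own `Date` header as the internal date. With that, old mail should sort by when it was sent rather than when it was uploaded.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def original_date(msg):
    """Return the message's Date header as an aware datetime, or None."""
    try:
        when = parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):  # missing or unparseable Date header
        return None
    if when.tzinfo is None:  # "-0000" dates parse as naive; assume UTC
        when = when.replace(tzinfo=timezone.utc)
    return when


def upload_mbox(conn, mbox_path, folder):
    """APPEND every message of a local mbox to `folder`; returns the count."""
    count = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # imaplib converts an aware datetime into the INTERNALDATE;
            # None lets the server use the upload time instead.
            typ, _ = conn.append(folder, r"(\Seen)", original_date(msg), msg.as_bytes())
            if typ != "OK":
                raise RuntimeError(f"APPEND failed on message {count}")
            count += 1
    finally:
        box.close()
    return count


# Usage (hypothetical): upload_mbox(conn, "old-eudora.mbox", "Imported/Eudora")
```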

I've had this same problem... (1)

lpfarris (774295) | more than 3 years ago | (#33489172)

I was hoping to read an answer that met my similar requirements. My requirements were for a searchable, portable mail message database. The ability to tag messages is also important. I had high hopes for Mozilla Raindrop, but my last experience with it didn't do anything for me. Here's what I am doing now: I have set up an IMAP server (imapd) on an Ubuntu server. Thunderbird is currently my primary email client, and it connects to all my various email accounts. When I am ready to archive an email, it gets copied to a folder on my IMAP server. The emails are tagged and stored in folders by quarter, to keep any particular file from getting too large. What I would like is the ability to store them in a searchable database with an open-source implementation.

mbox +mutt/thunderbird+mairix (1, Insightful)

Anonymous Coward | more than 3 years ago | (#33489178)

I have been archiving my mails for the past 10 years. My method has been to download the mails in mbox format once a year and use a combination of mairix to search through the mails and either mutt or Thunderbird to view the actual mails.
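Going the other way, one large mbox export can be split into per-year files like the ones described above. The sketch uses Python's `mailbox` module. The `mail-{year}.mbox` naming is an assumption, and messages without a parseable `Date` header go into an `undated` file. mairix or mutt can then be pointed at the resulting files.

```python
import mailbox
from email.utils import parsedate_to_datetime


def split_by_year(src_path, dest_template="mail-{year}.mbox"):
    """Copy each message of one big mbox into a per-year mbox file.

    Returns a dict mapping year (as a string) to the number of messages copied.
    """
    src = mailbox.mbox(src_path, create=False)
    boxes, counts = {}, {}
    try:
        for msg in src:
            try:
                year = str(parsedate_to_datetime(msg["Date"]).year)
            except (TypeError, ValueError):
                year = "undated"  # missing or unparseable Date header
            if year not in boxes:
                # Appends to the file if an earlier run already created it.
                boxes[year] = mailbox.mbox(dest_template.format(year=year))
            boxes[year].add(msg)
            counts[year] = counts.get(year, 0) + 1
    finally:
        for box in boxes.values():
            box.close()  # writes the added messages to disk
        src.close()
    return counts
```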

Maildirs + Mairix (0)

Anonymous Coward | more than 3 years ago | (#33489182)

Use Maildir(s) and Mairix for the search engine.

Hotmail (0, Flamebait)

Rik Sweeney (471717) | more than 3 years ago | (#33489192)

Why should Gmail get all the attention?

Just because I can? (2, Interesting)

mrv00t (858087) | more than 3 years ago | (#33489196)

would now like to merge all the emails back into a single searchable archive — just because I can. But there are a few problems: you can't?

We have something similar at Work (3, Insightful)

juanca (49302) | more than 3 years ago | (#33489210)

At work, we needed to archive (for compliance purposes) all the inbound/outbound email messages of our users (approximately 1K of them). We set up an Ubuntu server with postfix and dovecot IMAP over SSL, using Maildir.

Our users generate about 20K email messages daily, and we store each day in its own directory, something like this:

        |- YYYY
                      |- MM
                                |- DD

The auditors use Evolution to connect to the archive server and search the emails, even though it takes a little while to load a day of emails for the first time, once it's properly loaded searching is really fast. The server is not that powerful, it's a VM with 2 CPUs and 2GB of RAM. You do need a lot of storage though.

Hope this helps.
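Here is a sketch of that YYYY/MM/DD layout, with each day stored as its own Maildir, using Python's `mailbox` module. It is not the poster's actual setup, which is fed by postfix. `store_by_day` is a hypothetical helper that files each message by the date in its own `Date` header (the sender's local date). It raises an exception when that header is missing or unparseable.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def store_by_day(root, raw_message):
    """File one raw message (bytes) into the Maildir root/YYYY/MM/DD.

    Returns the day's Maildir path. Raises TypeError or ValueError if the
    message has no parseable Date header.
    """
    msg = mailbox.MaildirMessage(raw_message)
    when = parsedate_to_datetime(msg["Date"])  # sender's local date, as written
    day_dir = os.path.join(root, f"{when:%Y}", f"{when:%m}", f"{when:%d}")
    # Maildir() creates only the last path component (plus cur/new/tmp),
    # so make the YYYY/MM parents first.
    os.makedirs(os.path.dirname(day_dir), exist_ok=True)
    box = mailbox.Maildir(day_dir, create=True)
    box.add(msg)  # lands in new/ as a single, uniquely named file
    return day_dir
```

Because every message is a separate file, a day's directory can be backed up or moved without touching the rest of the archive.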

Whats wrong with Eudora? (1)

sjs132 (631745) | more than 3 years ago | (#33489232)

I still use Eudora... 7.1.09 in paid mode, from years ago. I use it on XP on my wife's computer and have different Eudora folders based on who is logged in. Works like a champ. The nice thing is that I can sort the old emails by sender (for listservs and such) to put them into folders, and then use the Find function to search them. I hardly ever have problems finding an email as long as I know WHO/WHAT I'm looking for and where: body, from, subject, etc. Sadly, no meta tags.. :( BTW, mine goes back to the early 90's too, when at college we used Eudora on floppies with Windows 3.1, I think... Maybe it was 95; it seems so long ago...

imap + sql for storage (1)

itzdandy (183397) | more than 3 years ago | (#33489236)

The many comments here about using just IMAP with maildir or mbox storage backends forget to mention that these are all very slow to search when you have thousands of messages. They don't store the files in any kind of disk-seek-friendly format. So...

I suggest either putting dovecot with maildir++ on a fast SSD to overcome the poorly organized (on-disk) files, or using a MySQL/PostgreSQL backend on dovecot, courier, or your favorite IMAP server that supports SQL. The mail would be stored with each detail in a different column of the table. Then you can index the sender, recipient, subject, etc. You will need either a mail client that can use IMAP search, so that the search happens on the DB side, or a PHP interface that searches the database directly for the messages you are looking for.

IMAP isn't going away in the next decade, and neither are MySQL, PostgreSQL, or the SQL language in general. The worst case would be migrating the mail table to a new DB, which could be done fairly trivially with a DB dump.
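Here is a rough sketch of the one-column-per-detail idea. It uses SQLite from Python's standard library in place of MySQL/PostgreSQL, so it runs without a server. This is not the schema any IMAP server actually uses, and the table, column and function names are hypothetical. The raw message is kept alongside the extracted columns so nothing is lost, and the indexes let lookups by sender or date avoid a full table scan.

```python
import sqlite3
from datetime import timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id        INTEGER PRIMARY KEY,
    sender    TEXT,
    recipient TEXT,
    subject   TEXT,
    sent_at   TEXT,  -- UTC ISO 8601, so text order is chronological order
    body      TEXT,  -- first text/plain part, for LIKE searches
    raw       BLOB   -- untouched original message, for faithful re-export
);
CREATE INDEX IF NOT EXISTS idx_sender  ON messages(sender);
CREATE INDEX IF NOT EXISTS idx_sent_at ON messages(sent_at);
"""


def header(msg, name):
    value = msg[name]
    return None if value is None else str(value)


def sent_at(msg):
    """Return the Date header as a UTC ISO 8601 string, or None."""
    try:
        when = parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:  # "-0000" parses as naive; treat it as UTC
        when = when.replace(tzinfo=timezone.utc)
    return when.astimezone(timezone.utc).isoformat()


def plain_body(msg):
    """Return the first text/plain part, decoded, or '' if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            try:
                return payload.decode(part.get_content_charset() or "utf-8", errors="replace")
            except LookupError:  # charset name Python does not recognise
                return payload.decode("latin-1")
    return ""


def store(db, raw):
    """Insert one raw RFC 822 message (bytes) into the messages table."""
    msg = message_from_bytes(raw)
    db.execute(
        "INSERT INTO messages (sender, recipient, subject, sent_at, body, raw)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (header(msg, "From"), header(msg, "To"), header(msg, "Subject"),
         sent_at(msg), plain_body(msg), raw),
    )


# Usage: db = sqlite3.connect("mail.db"); db.executescript(SCHEMA)
#        store(db, raw_bytes); db.commit()
#        db.execute("SELECT subject FROM messages WHERE sender = ?", (addr,))
```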

Don't use PST (1)

LoganTeamX (738778) | more than 3 years ago | (#33489240)

PSTs have a hard-coded size limit, past which they tank, and it depends on the version of Outlook used. Right now, with Outlook 2007, it's 20GB. Nobody NEEDS that much mail, but as an archive it's possible to reach. Maybe a CMS server like KnowledgeTree? Provided that it can parse the mail passed into it, it's a great open-source project that seems to have great staying power and development. I'll be testing that myself this week using mail messages that currently reside in Thunderbird.

Mac OS X Mail (0)

Anonymous Coward | more than 3 years ago | (#33489268)

I have worked with most of the clients you mention and found their search interfaces (especially Outlook's) horrendous. When Spotlight search came out on Mac OS X, searching my email in OS X Mail got so fast that I now use it as a reference. I have stored email back to 1993, and searches come up in a split second. For several subjects, I check 11 years of mailing-list history in my email before going online or checking a book. I regularly use it to find out "where I put that email from X".

IMAP with maildir backend (2, Insightful)

Fat Cow (13247) | more than 3 years ago | (#33489288)

I migrated all my old personal emails to gmail using IMAP. You can use this to migrate between different on-disk formats like maildir, mbox and pst. I had all my email in yahoo and pulled it down using POP to a maildir, then used an IMAP mail client to copy it across to gmail. Then I regularly back them up from gmail to an on-disk maildir format using mbsync [] . I picked maildir because it's open and seemed better designed than the alternative, mbox. It's not completely standardized though. I've seen PSTs become corrupt so I try and stay away.
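Converting between the two open on-disk formats does not strictly need a round trip through an IMAP server. This sketch uses Python's `mailbox` module, and the function name is hypothetical. Wrapping each message in `MaildirMessage` lets the module translate mbox `Status` flags, such as read/unread, into Maildir's filename flags.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; returns the count."""
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)  # makes cur/, new/, tmp/
    count = 0
    try:
        for msg in src:
            # mboxMessage -> MaildirMessage maps e.g. Status "R" (read) to the
            # Maildir "S" (seen) flag and moves read mail into cur/.
            dest.add(mailbox.MaildirMessage(msg))
            count += 1
    finally:
        src.close()
    return count
```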

Try Aid4Mail (1)

crath (80215) | more than 3 years ago | (#33489306)

There's a commercial, but low cost, package that I've used to do exactly what you are describing:

Aid4Mail converts email to and from a variety of mail formats. The feature that you might find useful is that it will create a zip archive that contains standard .msg format email messages. Use that in combination with an indexing programme. I use X1, but there are lots of indexing programmes that will index zip archives for easy searching.

reiser! (0)

Anonymous Coward | more than 3 years ago | (#33489350)

I haven't seen this mentioned yet, but if you DO go with your own IMAP server use ReiserFS for whatever partition the mail resides on. Generally faster for small files (like old emails).
