Slashdot: News for Nerds

A Programmatically Accessible Email Archive?

Cliff posted more than 8 years ago | from the email-archives-on-steroids dept.

Communications 61

JohnnyConatus asks: "Does anyone know of a service that offers corporate email archiving and also provides a read-only interface for accessing the archived emails programmatically? Ideally this would be in the form of a database connection or a web service. My current employer is required by the SEC to archive all email communication with clients and we would like to incorporate the archived emails into our internal applications. I have called just about every email archive service I could find via Google, and while most offer a web application to search the emails, none so far have a solution for doing so programmatically. For various reasons, archiving the emails ourselves is considered the last resort. If we had to implement archiving locally, a program that archived by acting as a mail gateway would be the ideal since we'll be supporting a wide range of mail servers."

61 comments

i remember seeing something like this once (1)

Naikrovek (667) | more than 8 years ago | (#13241800)

but it was a mail client where all the mail was stored in an SQL database.

the mail was POP'd in, and it went straight into an SQL database somewhere; it was actually very fast and very reliable.

Perhaps you could do something similar. SQL is pretty ubiquitous.

Re:i remember seeing something like this once (1)

Uber Banker (655221) | more than 8 years ago | (#13243281)

Perhaps you could do something similar. SQL is pretty ubiquitous.

Totally. It would be so simple to do. Just have a script on your mailserver to link a script to write the email (breaking down into the various fields) and attachment to a relational database. Super easy. Use a robust database and prioritise write speed; it would probably have to be pretty massive size-wise, as you'd be writing every single email for 7 years (I think that's what the SEC requires, but don't cite me in court!). In fact, it could be optimised as a flat file structure, as blindly saving emails into whatever fields would hardly require relational complexity.
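
For illustration, here is a minimal Python sketch of that kind of script: it splits one raw message into header fields, body, and attachments and writes them to a relational database. SQLite stands in for whatever "robust database" you would actually pick, and the table and column names (`messages`, `attachments`, `message_ref`, and so on) are made up for this example, not taken from any product mentioned in this thread.

```python
import sqlite3
from email import message_from_bytes, policy

# Hypothetical schema: one row per message, one row per attachment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    message_id TEXT,
    sender     TEXT,
    recipients TEXT,
    subject    TEXT,
    sent_date  TEXT,
    body       TEXT,
    raw        BLOB NOT NULL
);
CREATE TABLE IF NOT EXISTS attachments (
    id           INTEGER PRIMARY KEY,
    message_ref  INTEGER NOT NULL REFERENCES messages(id),
    filename     TEXT,
    content_type TEXT,
    data         BLOB
);
"""


def _header(msg, name):
    """Return a header as a plain string, or None if it is absent."""
    value = msg[name]
    return str(value) if value is not None else None


def archive_message(conn, raw_bytes):
    """Parse one raw message, store it, and return the new messages.id."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)

    # Prefer the plain-text body, fall back to HTML, else store an empty body.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    cur = conn.execute(
        "INSERT INTO messages "
        "(message_id, sender, recipients, subject, sent_date, body, raw) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            _header(msg, "Message-ID"),
            _header(msg, "From"),
            _header(msg, "To"),
            _header(msg, "Subject"),
            _header(msg, "Date"),
            body,
            raw_bytes,  # keep the untouched original alongside the parsed fields
        ),
    )
    row_id = cur.lastrowid

    # Each attachment becomes its own row, linked back to the message.
    for part in msg.iter_attachments():
        conn.execute(
            "INSERT INTO attachments (message_ref, filename, content_type, data) "
            "VALUES (?, ?, ?, ?)",
            (row_id, part.get_filename(), part.get_content_type(),
             part.get_payload(decode=True)),
        )
    conn.commit()
    return row_id
```

A mail server could hand each message to `archive_message` through a pipe, after the schema has been created once with `conn.executescript(SCHEMA)`. Storing the raw bytes next to the parsed fields means the original message can always be reproduced exactly.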

Alternatively, you could instruct your users to save the email into whatever document management system you use (associating the email with whatever other documents you have), but this would be extremely time-consuming and unpopular with the users, creating the possibility that you would miss an important email which the company could end up liable for. Given that trade-off, go for an automatic relational database. It would be simple to implement, offer a flexible interface, and offer the security and robustness you're after.

Any suggestions as to why not?

Re:i remember seeing something like this once (1)

christopherfinke (608750) | more than 8 years ago | (#13243447)

Just have a script on your mailserver to link a script to write the email (breaking down into the various fields) and attachment to a relational database.
I wrote a set of PHP scripts a couple months ago that will do exactly this. If anyone's interested, e-mail me for a zip of the files: cfinke at gmail.com

Re:i remember seeing something like this once (3, Informative)

tzanger (1575) | more than 8 years ago | (#13243441)

Exchange4Linux [exchange4linux.org] does exactly this. Works pretty well, we've got a shitload of email (videos too), 5000+ contacts and all manner of data sitting in a PostgreSQL database.

It's NICE being able to execute SQL queries on your aggregate communications data. Perfect example: Our Asterisk head-end system knows which of our customer service people is on pager duty with an SQL query which looks at their service calendar. :-)

Re:i remember seeing something like this once (1)

Uber Banker (655221) | more than 8 years ago | (#13245501)

Our Asterisk head-end system knows which of our customer service people is on pager duty with an SQL query which looks at their service calendar.

Ah, it only knows if they should be on duty. You really need to fix a GPS scanner and electrode to their spine to ping them (and get unique response) to really know if they're on duty. And if that works, to send them a few dud messages from the customers they're there to support, e.g. "my scanner says it has 0xF4C83D, I'm running NetBSD-experimental.0.03.01.09.0843 under Alpha-64, what can I do" to make sure they're not faking it.

Re:i remember seeing something like this once (0)

Anonymous Coward | more than 8 years ago | (#13248007)

Eat shit poser. How about we put some spam up your momma`s ass? Cocksucker.

Re:i remember seeing something like this once (0)

Anonymous Coward | more than 8 years ago | (#13252352)

Ah, it only knows if they should be on duty. You really need to fix a GPS scanner and electrode to their spine to ping them

As a realistic alternative, if they all have bluetooth cell phones, you could query if their cell phone is in range and act accordingly.

Re:i remember seeing something like this once (1)

QuantumRiff (120817) | more than 8 years ago | (#13244108)

Take a look at Oracle Collaboration Suite. Calendar, Files, and Email, all stored in a big ass Oracle Database.

Re:i remember seeing something like this once (0)

Anonymous Coward | more than 8 years ago | (#13249840)

EMC has a product called Centera. It's not cheap, but a company that is under SEC regulations should be able to justify it. It has retention periods you can assign to data that pretty much guarantee the data will remain intact and available for the required duration. It also has DR replication functionality built in, and is cluster-able in nature by building large arrays.

geocities, archive.org and public key cryptography (0)

3-State Bit (225583) | more than 8 years ago | (#13241876)

The subject says it all.

Re:geocities, archive.org and public key cryptogra (2, Funny)

Saeed al-Sahaf (665390) | more than 8 years ago | (#13242047)

So, you are suggesting saving possible critical email at Geocities? Why didn't I think of that...

Re:geocities, archive.org and public key cryptogra (-1, Troll)

Anonymous Coward | more than 8 years ago | (#13248286)

Perhaps because you were too busy thinking about ways to suck your own cock?

IMAP (4, Insightful)

g_bowskill (801731) | more than 8 years ago | (#13242034)

If all of the emails are stored in an IMAP account then you could access this programmatically using PHP's IMAP functions. I do the same thing on my site, using a cron job to check an email account every 5 minutes: if there's a new mail, it looks to see if it has an image attachment and, if so, automatically posts it online for me.

Information about PHP's IMAP functions can be found at http://uk.php.net/imap [php.net] .

I'm not entirely sure if this is the kind of thing you are looking for, but this is probably how I would deal with the problem.
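
As a rough Python counterpart to the PHP approach (a sketch under assumptions, not anyone's production code), the standard `imaplib` module can open an archive mailbox read-only and hand parsed messages to other applications. The function name `fetch_archived` is invented for this example, and the server name, credentials, and mailbox name in the usage note are placeholders.

```python
import email
import imaplib  # used by callers to create the connection, e.g. imaplib.IMAP4_SSL
from email import policy


def fetch_archived(conn, mailbox="INBOX", criteria="ALL"):
    """Return parsed messages from `mailbox` that match an IMAP SEARCH string.

    `conn` is an already logged-in imaplib.IMAP4 or IMAP4_SSL connection.
    """
    # readonly=True asks the server to open the mailbox without changing
    # flags or contents, which suits a read-only archive interface.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot open mailbox {mailbox!r}")

    status, data = conn.search(None, criteria)
    if status != "OK":
        raise RuntimeError(f"search failed for {criteria!r}")

    messages = []
    for num in data[0].split():
        # BODY.PEEK[] fetches the whole message without marking it as read.
        status, parts = conn.fetch(num, "(BODY.PEEK[])")
        if status != "OK":
            continue
        raw = parts[0][1]
        messages.append(email.message_from_bytes(raw, policy=policy.default))
    return messages
```

Typical use, assuming an account on a hypothetical server: create `imaplib.IMAP4_SSL("imap.example.com")`, call `login(user, password)` on it, then call `fetch_archived(conn, "Archive", 'SINCE 01-Aug-2005')`.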

Regards,

Grant

Re:IMAP (3, Funny)

eoyount (689574) | more than 8 years ago | (#13242846)

I'm not sure you should let everyone on /. know that you automatically post images in your email...

You missed the obvious (2, Insightful)

spribyl (175893) | more than 8 years ago | (#13242037)

You said you did a Google search, but did you talk to the obvious choice? Google.

They seem to have something of a specialty in archiving e-mail and search technology and usually have some kind of API.

good observation (0)

Anonymous Coward | more than 8 years ago | (#13242146)

they sell that service as well, AFAIK.

IMAP as the API (2, Interesting)

GuyWithLag (621929) | more than 8 years ago | (#13242177)

How much mail does your company move per day? Thousands of messages? A gig in attachments per day?

You could very easily implement this as a simple forwarding daemon, or as a plugin to your existing MTA: just store all mail going anywhere in a separate, append-only mailbox, then use IMAP to access it remotely.

IMAP is an industry-proven protocol, there exist many open-source implementations, and it has been specifically developed for situations where the mail will remain on the server. It provides you with searching and tagging, plus you can organize the mail store as you see fit (e.g. each year's mail in a separate folder, while still being able to search all of them at once, or known spam sorted into a separate folder while keeping it around). Granted, I'm not aware of any IMAP server that uses an SQL back-end, so this may become a bottleneck for you.
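
A minimal sketch of the "one folder per year" idea, assuming the archive is kept as a Maildir that an IMAP server is then pointed at. It uses Python's standard `mailbox` module; the function name `archive_by_year` and the `undated` fallback folder are inventions for this example. Note that Maildir itself does not enforce append-only behaviour, so that would have to come from file permissions or the server configuration.

```python
import mailbox
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def archive_by_year(maildir_path, raw_bytes):
    """Append one raw message to a per-year folder; return (folder, key)."""
    msg = message_from_bytes(raw_bytes)

    # Use the year from the Date header; fall back to a catch-all folder
    # when the header is missing or unparseable.
    try:
        folder_name = str(parsedate_to_datetime(msg["Date"]).year)
    except (TypeError, ValueError):
        folder_name = "undated"

    root = mailbox.Maildir(maildir_path, create=True)
    if folder_name in root.list_folders():
        folder = root.get_folder(folder_name)
    else:
        folder = root.add_folder(folder_name)

    # add() writes the message as a new file and returns its key.
    key = folder.add(raw_bytes)
    return folder_name, key
```

A forwarding daemon or MTA hook would call `archive_by_year` once per message; existing messages are never rewritten, only new files are added.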

Re:IMAP as the API (3, Informative)

LetterJ (3524) | more than 8 years ago | (#13242557)

I personally use IMAPSize [broobles.com] to archive my IMAP mail that is needed mostly for historical purposes. Just yesterday, I pulled 12,000 messages off of my IMAP server for long-term storage. It turns them either into an mbox file or individual emails. I've then got a script that dumps them into a database as well as just zipping them up for burning to optical media. The database is for quick searching, the files for backup/recovery. I looked for the solution mostly to speed up my IMAP server and client both, which weren't happy with the huge number of emails I was storing or with occasionally crappy connections. I've got a web interface to it that also lets me easily reply to a message directly from there, pull out related messages, etc.

Re:IMAP as the API (1)

pooh666 (624584) | more than 8 years ago | (#13261422)

Yeah right, IMAP will not come close to dealing with the kind of volume that a large company would need. It has to be an SQL db, but one that can also be accessed with IMAP clients. In other words DBmail (dbmail.org)

Re:IMAP as the API (1)

LWATCDR (28044) | more than 8 years ago | (#13242737)

Has anyone made an IMAP server that stores the email in a SQL database? I heard that Exchange was going to start doing this. Just had to wonder if something like that has been done in the open-source community.

Why SQL? (1)

Bistronaut (267467) | more than 8 years ago | (#13243272)

I love SQL and relational databases myself, but what would be the point of using such a system in this case?

Re:Why SQL? (1)

commanderfoxtrot (115784) | more than 8 years ago | (#13243765)

Indeed!

mbox and maildir have been around for a long time; they have been designed by experts to store mail which is a very different problem to storing hierarchical column data. You could make the argument that all email conversations are hierarchical, but...

Personally, I have every email for about the last ten years in maildirs; I know that so long as my data is kept safe and secure [thefilehighclub.com] , I will be able to read the emails in ten years' time.

There was even an application featured in a Linux magazine a couple of months ago about various simple scripts which will give you a pretty HTML index of your maildirs/mboxes.

Re:Why SQL? (1)

LWATCDR (28044) | more than 8 years ago | (#13244436)

The ability to do SQL queries on a database store maybe? To use JDBC or ODBC to interface with the stored email?
MySQL for example has full text search capability.
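
To make that concrete, here is a small Python sketch of the query side. SQLite is used only so the example runs anywhere; the `messages` table and its columns are assumptions for illustration, and a MySQL or PostgreSQL archive would be reached through its own driver (or JDBC/ODBC) in much the same way. The connection is opened read-only, which matches the original request for a read-only interface.

```python
import sqlite3


def open_read_only(db_path):
    """Open an existing SQLite archive so that writes are rejected."""
    # The file: URI with mode=ro is SQLite's way of requesting read-only access.
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def search_subjects(conn, term):
    """Return (id, sender, subject) rows whose subject contains `term`.

    The term is passed as a bound parameter, never pasted into the SQL text.
    Any % or _ inside `term` still acts as a LIKE wildcard.
    """
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT id, sender, subject FROM messages "
        "WHERE subject LIKE ? ORDER BY id",
        (pattern,),
    ).fetchall()
```

An internal application would call `open_read_only` once and reuse the connection; any attempt to modify the archive through it fails with `sqlite3.OperationalError`.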

this sounds like it would work... (1)

machinecraig (657304) | more than 8 years ago | (#13242205)

So, you need to archive emails for an organization that has multiple flavors of mail servers, and you need the archive to be accessible by internally developed applications.

It sounds like you need all of your mail servers configured to dump incoming and outgoing messages to a database.

I don't do much mucking around with mail servers, so if they don't have any easy integration with databases, I'm sure you could have them log to a file, with a scheduled script that loads any logged messages into the database. This same script could of course do all of the log rotation / cleanup required.
There are plenty of hosted solutions out there for this - but if you want to have the ability to query it from your internally developed applications... I would think that you would want the archive to be stored in-house.

Java (2, Interesting)

hexghost (444585) | more than 8 years ago | (#13242244)

The javamail api can do everything you need, and you can plug bouncycastle's api along with it so you have it PGP encrypted.

Re:Java (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#13242612)

I used to use the "bouncycastle" API, but that was when I worked in the porn industry.

Perl, LWP (1)

A nonymous Coward (7548) | more than 8 years ago | (#13242308)

Write a Perl app to access the web server programmatically. This is pretty simple. The first time I used LWP, it was 15 minutes from the time I started searching CPAN, found, downloaded, and installed LWP, and had an app running which read a web form and POSTed the values we needed. It took another couple of days to polish it up, add in some error checking and consistency checking (did they change that web page on me?), etc, but LWP is very easy to use.
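
The same idea sketched in Python instead of Perl/LWP, using only the standard library. Everything specific here is hypothetical: the URL `https://archive.example.com/search` and the form field names `q`, `from`, and `to` are placeholders, since the real ones depend entirely on the archive vendor's web application.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_request(base_url, query, date_from, date_to):
    """Build the POST request a browser would send for the search form."""
    # Field names are placeholders; copy the real ones from the vendor's form.
    form = urlencode({"q": query, "from": date_from, "to": date_to})
    return Request(
        base_url,
        data=form.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def run_search(request, timeout=30):
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Splitting request construction from sending makes the consistency checking mentioned above easier: the request can be inspected or logged before anything goes over the wire, and the returned HTML can be checked for the expected markers before it is parsed.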

You're talking about SQL storage of messages (2, Informative)

scotpurl (28825) | more than 8 years ago | (#13242520)

http://www.dbmail.org/ [dbmail.org]

is one starting point, but there are a few others.

You're basically replacing /var/spool/mail with an SQL back-end. Things like MBOX or IMAP will suck for dealing with millions of records/messages, but SQL should handle it easily.

Re:You're talking about SQL storage of messages (1)

Oopsz (127422) | more than 8 years ago | (#13243285)

I was about to suggest DBMail myself. Run it using MySQL and InnoDB and it's fast, ACID-compliant and easily accessible using SQL. You can easily distribute the database load over multiple servers, too.

OS X and Spotlight (0)

Anonymous Coward | more than 8 years ago | (#13242552)

forward the email to a Mac running 10.4 and Mail. hit the Spotlight APIs and away you go. we're doing it here and so far so good.
~3,000 emails a day
~500 MB

LOL!!! OMFG!!! (0)

Anonymous Coward | more than 8 years ago | (#13244244)

I blew Mountain Dew all over my monitor when i read your post.

Let us know how well your solution is doing when you have 1,000 mailboxes at over a gig each (that's 1TB) and you're looking at 50,000 messages per day or more.

Spotlight? OMFG!!! ROFLMAO!!!

Re:LOL!!! OMFG!!! (0)

Anonymous Coward | more than 8 years ago | (#13249253)

me too!!! - better suggestion - use a Dell p4 running XP and Outlook 2003 - set up a non human mailbox on your exchange and have all incoming / outgoing mails forwarded to it - set delivery to pst file and then use google desktop search to scan through the mails as and when you need to..

or

you could hire another office building and have your mailserver print all attachments and mails - hire a team of people to staple the prints together and file them in cabinets using date order - piece of piss to implement.

christ...

Assentor (2, Informative)

Anonymous Coward | more than 8 years ago | (#13242910)

We have the same SEC requirements here and we use iLumin's Assentor [ilumin.com] products. The configuration was painful initially, but it's quite effective. Here's an article [s-ox.com] on the Sarbanes-Oxley Compliance Journal. It stores ALL email and IMs and the contents and functionality can be made accessible via APIs or database calls.

You already have the solution in-house. (0)

Anonymous Coward | more than 8 years ago | (#13242944)

You don't come out and say it, probably for fear of Slashdot flaming you into oblivion but, you're using MS Exchange for your mail system, aren't you?

Exchange has the built in ability to archive every message sent to or from the Exchange system. Exchange also allows for searching of this archive via Outlook and a wide assortment of scriptable APIs. If you really wanted to, you could even hack up all sorts of VBScripts to search and massage it any way you like. Yes, this does require more disk space and tape space but, if your company is in a position to worry about SEC mandates, I'll bet they can afford some disks, even the expensive high quality ones for SANs and such.

If you're not using Exchange, then let us know what it is and we can help with a solution. There aren't any services to do this because (a) there are security concerns and (b) it is easily done in-house. Hell, you could even put a Postfix frontend on whatever system you have, pipe the archived messages into a PostgreSQL database and run SQL queries on the archive store until you are blue in the face. No real coding required.

But, I'm betting you are using Exchange so, get some disks and learn some VBScript. TTFN

Re:You already have the solution in-house. (1)

itwerx (165526) | more than 8 years ago | (#13246684)

I'm betting you are using Exchange so, get some disks and learn some VBScript. TTFN
1 - Exchange 2003 has a DB interface
2 - There are 3rd party products that integrate with Exchange and provide everything you need for compliance in a nice pretty interface.

SEC will not allow exactly what you want (2, Informative)

Anonymous Coward | more than 8 years ago | (#13243077)

Disclaimer: I work for a company that makes SOX compliance appliances.

The SEC requires you to keep all email in house. As far as we can tell that means your storage must be in house, not at a service provider.

We don't provide such an interface in our products. We want as few possibilities for bugs where you can delete/alter email as we can. By sticking to our interface we have a better chance of keeping you from doing something illegal (which could reflect on us). However we do provide a web interface which a cleaver programmer can script.

If you use something other than Microsoft Exchange, you can set the always-cc option to send email to several users, one of which is the account our device polls from, and one of which is an account you can do anything on. Frankly I prefer this option. We don't want you messing in our product for anything other than the compliance purposes we have designed it for, as that may open us up in court to questions of whether we are really for compliance when we do those other things.

Yes we are paranoid, but there are some strong laws around on this subject, and right now regulators are looking for examples to prove they are doing their job.

Re:SEC will not allow exactly what you want (0)

Anonymous Coward | more than 8 years ago | (#13243716)

However we do provide a web interface which a cleaver programmer can script.

Umm, is that cleaver as in "big square-bladed knife", or cleaver as in "June and Ward"? :o)

If it's the former, I have to wonder how many ways one can program them.

If it's the latter, does that mean that Eddie Haskell or Lumpy Rutherford can't use it?

Re:SEC will not allow exactly what you want (1)

jrockway (229604) | more than 8 years ago | (#13259653)

Your company should take a hint from M$ and get a better EULA. Clauses like "our company isn't liable even if we intentionally fuck up your data" would probably be a good idea :)

Consider EMC (0)

Anonymous Coward | more than 8 years ago | (#13243137)

Have you looked at EMC? Their products such as EMC Legato Email Xtender and EMC Centera may do what you want. I'm not sure if the former provides an API to access email. The latter would, but might be too "low-level" for you. Worth looking into if you haven't, though.

Object DataBase (1)

ratboy666 (104074) | more than 8 years ago | (#13243169)

EMC Centera for email storage. It's a CAM. You can find a couple of test Centeras on the 'net (but they will have data trashed periodically).

Will retain records pursuant to a number of different gov requirements for reporting.

Use Kasten Chase for encrypting, if needed (we have an object shim -- and this is a plug). That will give you your data security.

Maybe other solutions... but that's the one I am familiar with.

Ratboy.

Re:Object DataBase (1)

cg (18840) | more than 8 years ago | (#13245484)

Strap on Legato Email Extender and Documentum, or any of the other Centera partner products, and you should have a solid solution.

MessageRite (1)

HTMLSpinnr (531389) | more than 8 years ago | (#13243179)

I came across this company in the past as a potential job opportunity. Sounds like they do exactly what you want, however I do believe their application is Windows specific (or it was at the time). They also offer IM archival using their client.

www.messagerite.com

YMMV, but... (1)

PaulBu (473180) | more than 8 years ago | (#13243275)

... I've had good success at archiving all my previous department's e-mails (actually, what we were calling "e-mail journals") using Zope and a tiny product on top of that called MailBoxer. Provides a nice web-browsable/searchable interface to the archives, and Zope can be extended via Python to do what you want to do.

Paul B.

Java or PHP + SQL Database (1)

rlp (11898) | more than 8 years ago | (#13243286)

Shouldn't be that hard to slap a Java or PHP front-end on top of a SQL database to do what you want. I'd look at sourceforge[1,2] to see if someone's already built it.

[1] Sourceforge is owned by the same folks that own slashdot.
[2] I'm not affiliated with either, except as a user.

you mean... (1)

shaggy43 (21472) | more than 8 years ago | (#13243451)

this [veritas.com] ? Veritas provides a CLI to all their other products, I'd be terribly surprised if this was an exception.

I built one. (1)

ubiquitin (28396) | more than 8 years ago | (#13243499)

About two years ago I completed an in-house project that involved integrating sendmail (via pipes), PHP (for text handling), and MySQL, to archive all messages in a set of MySQL tables that can then be queried later. Separately, I have a web-based search and browsing system. If you're interested in using these tools to build out your email repository, you'll find my contact info at phpconsulting.com. It doesn't handle attachments very well, but that could be built out without too much hassle.

For what it's worth, if you're considering building something yourself, there are various advantages to not using mbox or maildir, but a truly relational structure for archiving the email.

I doubt you'll find what you're looking for (1)

captainclever (568610) | more than 8 years ago | (#13243508)

I suggest a homegrown solution. If you primarily need to archive (but not search) emails then just use PostgreSQL (or MySQL I suppose).
If you'll need to search them, forget a database and use a Lucene [apache.org] index. You could also store all the text verbatim in Lucene and forget the database.
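
To show what a full-text index buys you over row-by-row scanning, here is a toy inverted index in Python. This is not Lucene and does not use its API; it is only a sketch of the underlying idea (each word maps to the set of documents containing it), with the class name `TinyIndex` invented for the example. A real engine adds ranking, stemming, phrase queries, and on-disk storage.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each lowercase word to a set of doc ids."""

    _WORD = re.compile(r"[a-z0-9]+")

    def __init__(self):
        self._postings = defaultdict(set)
        self._docs = {}  # verbatim text, kept so no separate database is needed

    def add(self, doc_id, text):
        """Store the text and index every word in it."""
        self._docs[doc_id] = text
        for token in self._WORD.findall(text.lower()):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing ALL words in the query."""
        tokens = self._WORD.findall(query.lower())
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)

    def text(self, doc_id):
        """Return the stored text for a document id."""
        return self._docs[doc_id]
```

Lookups touch only the posting sets for the query words, so search cost depends on how common those words are rather than on the total size of the archive.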

Do some research ... (3, Informative)

gstoddart (321705) | more than 8 years ago | (#13243927)

If your company is doing this for SEC compliance (meaning Sarbanes-Oxley) you really need to look into all that goes along with this.

You'll still need to provide security as to who can view messages. Search for legal purposes. You have document retention schedules you'll need to adhere to. You'll potentially have a freakin' huge volume of data to look at.

I'm seeing a lot of references to PHP and Java classes -- something as important as SEC regulations for e-mail archiving shouldn't just be thrown together willy-nilly. Failure to get it right could cause *huge* legal problems downstream.

Mail archiving for SEC/SOX is an utterly non-trivial undertaking.

Cheers

Here are a few solutions we've looked at... (1)

NetPoser (266960) | more than 8 years ago | (#13244079)

Try:
Legato from EMC (http://www.legato.com/ [legato.com] )
CommVault (http://www.commvault.com/ [commvault.com] )
KVS (which is now part of Veritas, http://www.veritas.com/kvs/ [veritas.com] )

We have looked at all of these for the past few months and all of these are Sarbanes-Oxley compliant.

If not now, soon. (1)

Telastyn (206146) | more than 8 years ago | (#13244120)

You might want to contact Symantec about this. The primary reason they bought Veritas [and to a lesser degree Brightmail] was to make this sort of SEC mandated email archival setup.

Google (1)

aero6dof (415422) | more than 8 years ago | (#13244339)

Well, what about Google?

They have an indexing appliance as well as a Google API. That way your company can also keep all its indexed email in its own data center.

http://www.google.com/enterprise/gsa/features.html [google.com]

Microsoft Access reads from Microsoft Outlook (1)

Michael Spencer Jr. (39538) | more than 8 years ago | (#13244354)

I'm not sure about your office, but my employer uses Microsoft Outlook and Microsoft Exchange servers. Microsoft Access can 'link' data from Microsoft Outlook data sources, and access the messages as records in a table. You can then run a query in MS Access to insert that data into another table on a proper database server, and then use any technology you like to access the data there.

If your employer doesn't use Microsoft Exchange servers, this advice won't really apply to you.

If you're a moderator and you want to ensure that people who use Microsoft products and submit Ask Slashdot questions never get useful answers, feel free to mod me down.

Take a look at ZANTAZ (0)

Anonymous Coward | more than 8 years ago | (#13244410)

http://www.zantaz.com/ [zantaz.com]

They offer a variety of email archive hosting and retrieval solutions, both on and off site.

Courier (2, Informative)

bobv-pillars-net (97943) | more than 8 years ago | (#13244481)

Courier [courier-mta.org] has an optional "big-brother" mode that makes a copy of every email that passes through. It can be set up as an email gateway and has a flexible authentication and filtering mechanism with standard plugins for SQL, LDAP, PAM, and others.

I Almost Hate To Ask This (1)

Goo.cc (687626) | more than 8 years ago | (#13245298)

but what does the poster mean by 'programmatically accessible' email archive?

Re:I Almost Hate To Ask This (1)

Mr. Slippery (47854) | more than 8 years ago | (#13250500)

but what does the poster mean by 'programmatically accessible' email archive?

"Programmatically accessible" means accessible by a computer program.

I.e., software that not only has a user interface (GUI or CLI) but has a function library, or TCP protocol suite, or web services, or RPC, or REST, or some other way to access the data and functionality of the software, from other software.

Re:I Almost Hate To Ask This (1)

Goo.cc (687626) | more than 8 years ago | (#13263677)

Thanks.

TFS Gateway (1)

richie2000 (159732) | more than 8 years ago | (#13247952)

Check with Fox Technologies [foxtechnologies.com] (formerly known as TFS Tech and TenFour). The TFS Gateway did multiple mail system integration and automatic e-mail retention way back in 1997 when I used to work there. They seem to be pushing SOX compliance pretty hard these days so give them a call.

Both Oracle and HP have products that can do this (0)

Anonymous Coward | more than 8 years ago | (#13266071)

EOM