
Archiving Digital History at the NARA

timothy posted more than 8 years ago | from the sort-and-toss dept.

Data Storage | 202 comments

val1s writes "This article illustrates how difficult archiving is compared to just 'backing up' data. From the 38 million email messages created by the Clinton administration to proprietary data sets created by NASA, the National Archives and Records Administration expects to have as much as 347 petabytes to deal with by 2022. Are we destined for a 'digital dark age'?"


347 petabytes? (4, Insightful)

ravenspear (756059) | more than 8 years ago | (#12916109)

Ok, I was tempted to make a pr0n joke about this, but I think the bigger question is what kind of indexing system this will use.

I haven't seen any software system that can reliably scale to that level and still make any kind of sense for someone who wants to find a piece of data in that haystack... err, haybarn.

Re:347 petabytes? (0)

Anonymous Coward | more than 8 years ago | (#12916137)

Hey just use Spotlight.

I can just imagine typing the first few letters of 'Clinton' and Spotlight going through its petabyte index and delivering results 'in real time'.

Not.

It would probably kernel panic at that point and wipe out the 347 petabytes of storage in one crash (especially if FireWire drives with an Oxford chipset were used somewhere). Oh well, that was that then.

Re:347 petabytes? (2, Informative)

ravenspear (756059) | more than 8 years ago | (#12916231)

Well considering that Spotlight took about 2 hours to index my 120 GB drive, that would be (347 * 1024^2) * 2 = 72771174 hours = 83,000 years to index that much data.

Now I'm sure the gov would use a faster system than my laptop, but still!

Re:347 petabytes? (1, Interesting)

Anonymous Coward | more than 8 years ago | (#12916396)

Wow. I didn't know one could mess up such simple math so badly... It's a simple rule of three - basic high school math!

120 GB / 2 h = 60 GB/h indexing speed.
347 PB = 347 000 TB = 347 000 000 GB (or use 347 x 1 048 576 - but HD manufacturers never use that; they like to inflate numbers).
347 000 000 GB / (60 GB/h) = 5 783 333 (and 1/3) h.
At 24 h/day, 365 d/yr, we get about 660 years.
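
For anyone who wants to re-run those numbers, a quick Python sketch (decimal units and the 60 GB/h rate from above):

    # Back-of-the-envelope check of the indexing-time estimate.
    total_gb = 347 * 1000 * 1000        # 347 PB in GB (decimal units)
    rate_gb_per_hour = 120 / 2          # 120 GB indexed in 2 hours
    hours = total_gb / rate_gb_per_hour
    years = hours / (24 * 365)
    print(f"{hours:,.0f} h = {years:,.0f} years")   # ~5,783,333 h = ~660 years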

You were just a little over 120 times too high. Let's just hope you don't write code for a living :p

Still bloody too much, but it's not as if the indexing is going to be done by a single processor across a single bus. Anything like that has got to be done by means of distributed computing (duh), so this math is completely irrelevant anyway :)

And it's not like spotlight is much of a reference either, perhaps make comparisons with big commercial indexing solutions, or open source implementations that could be scaled...

Comparing distributed indexing of redundant network storage of some sort with Spotlight indexing a local IDE disk is just laughable. Apples and oranges.

Re:347 petabytes? (1)

ravenspear (756059) | more than 8 years ago | (#12916532)

Wow, thanks for catching that. I had it right up to the point where I stopped, but I forgot the last step. I calculated a time of 2 hours for each GB instead of 2 hours for each block of 120 GB. 83,000 / 120 is indeed roughly 660.

The funny thing is I got an A in Calc III last semester. ;)

Re:347 petabytes? (1)

gipsy boy (796148) | more than 8 years ago | (#12916614)

Let's hope you didn't get one in Algorithmic Complexity :)
Just because an algorithm takes time n for m inputs doesn't mean it takes 2n for twice that amount (especially not when it comes to indexing; index structures usually scale logarithmically rather than linearly).

Re:347 petabytes? (1)

ravenspear (756059) | more than 8 years ago | (#12916775)

Actually I haven't taken any algorithms classes yet, but that's a good thing to remember.

One thing though, wouldn't it still be linear for the entire process? I mean I understand what you are saying as far as the algorithm goes. It's not necessarily going to take twice as long for the algorithm that creates the index to run createIndex(a,b,c,d) compared to createIndex(a,b).

But you still have to scan twice as many files to derive the inputs. How could that part not be linear?

Try to help correct other's math sans sarcasm. (5, Insightful)

jbn-o (555068) | more than 8 years ago | (#12916734)

You were just a little over 120 times too high. Let's just hope you don't write code for a living :p [...]

To you and the countless others on /. who offer their corrections in a similar tone: yes, we get it, the parent poster goofed and you supplied a correction. Given the trivial context here, it's hardly a big deal and doesn't warrant sarcasm. Everyone makes mistakes, and plenty of people make mistakes in their work every day, including people who do work where lives are at stake. That's one reason why it is good to work with other people. In life it's far more important to be forgiving, keep things in perspective, and help other people without the wiseacre commentary, and then move on.

Re:347 petabytes? (3, Informative)

OrangeSpyderMan (589635) | more than 8 years ago | (#12916219)

I haven't seen any software system that can reliably scale to that level and still make any kind of sense for someone who wants to find a piece of data in that haystack,

Haven't you? Have you ever worked with real archiving before? IBM has some nice solutions that let us store on disk and a WORM library (Tivoli Storage Manager) and index in a (large) Oracle DB - they work and scale just fine (our experience covers a couple of hundred terabytes). You probably wouldn't want all that data in a single archive anyway, but I'd guess you'd know that if you'd ever archived anything....

Re:347 petabytes? (3, Informative)

CodeBuster (516420) | more than 8 years ago | (#12916247)

The most common structure used to index large amounts of data stored on magnetic or other large-capacity media is the B-Tree and its variants. The article linked here [bluerwhite.org] explains the basic idea of the balanced multiway tree, or B-Tree. The advantage of this type of index is that it can be stored entirely on the collection of tapes, cartridges, disks or whatever else, while only the portion of the tree currently being operated on needs to be read into volatile or main memory. The B-Tree allows efficient access to massive amounts of data while minimizing disk reads and writes. Theoretically, the B-Tree and its variants could be scaled up to address an unlimited amount of data in logarithmic time.
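
To make the idea concrete, here is a minimal Python sketch of B-tree search (not any particular archive's implementation, and insertion/balancing are omitted): each step reads one node, so even an enormous index needs only a logarithmic number of disk reads.

    import bisect

    class BTreeNode:
        # In a real index each node would be one disk/tape block
        # fetched on demand, not an in-memory object.
        def __init__(self, keys, children=None):
            self.keys = keys          # sorted keys in this node
            self.children = children  # None for a leaf

    def search(node, key):
        # Descend from the root, reading one node (one block) per level.
        while node is not None:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return True           # a real index would return a record locator
            if node.children is None:
                return False          # hit a leaf without finding the key
            node = node.children[i]   # follow the i-th child pointer
        return False

    # Tiny example: a root with three leaves.
    root = BTreeNode([10, 20], [BTreeNode([1, 5]),
                                BTreeNode([12, 15]),
                                BTreeNode([25, 30])])
    print(search(root, 15), search(root, 7))  # True False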

Re:347 petabytes? (0)

Anonymous Coward | more than 8 years ago | (#12916364)

Theoretically, the B-Tree and its variants could be scaled up to address an unlimited amount of data in logarithmic time.

That so didn't make sense to me.

But maybe it's just me. :-)

Re:347 petabytes? (0)

Anonymous Coward | more than 8 years ago | (#12916441)

unlimited != infinite

Re:347 petabytes? (0)

Anonymous Coward | more than 8 years ago | (#12916517)

Google.

Re:347 petabytes? (0)

Anonymous Coward | more than 8 years ago | (#12916642)

Probably Reiser9 in the year 2022.

Usually when I archive... (-1, Troll)

Anonymous Coward | more than 8 years ago | (#12916112)

I try to keep it from the RIAA and the MPAA.

Joke:

Q. What's the difference between a businessman and a nigger?
A. The nigger steals a $100 car stereo and lands his ass in prison while the bussinessman steals millions of dollars and is praised by U.S. government.

Re:Usually when I archive... (2, Interesting)

ArchAngel21x (678202) | more than 8 years ago | (#12916232)

I guess you didn't see how Mr. Ebbers or the founder of Aldephia are facing prison time. Quit trying to spread that liberal lie that white collar crime pays off. By the way, it is inappropiate to refer to blacks as niggers. Grow up and learn to be a little more tolerant of diversity.

Re:Usually when I archive... (-1, Flamebait)

Anonymous Coward | more than 8 years ago | (#12916754)

"Liberal Lie" damn, shitAngel, you are stupid.

Compression and moderation? (1)

moz25 (262020) | more than 8 years ago | (#12916125)

Couldn't they get more capacity out of their storage system by compressing old information (more) aggressively? That shouldn't matter too much to the indexing mechanism. Also, it might make sense to tag the importance of different documents so that each document's compression/archiving treatment can depend on that.

Re:Compression and moderation? (1)

LiquidCoooled (634315) | more than 8 years ago | (#12916296)

The best way to compress the data is to apply the national-security black lines to the documents now rather than in 25 years' time.
This way, not only will security be maintained (assuming they remove the data rather than just paint it black in a PDF), but it will take up less space.

Most released documents I have seen have most lines blacked out, so after compressing the remaining text, you could fit the entire Clinton admin documents onto a single floppy disk.

Compression (0)

Anonymous Coward | more than 8 years ago | (#12916126)

Surely data of this sort lends itself well to compression?

Data loss will always be a possibility (4, Insightful)

divide overflow (599608) | more than 8 years ago | (#12916130)

It happened with the Great Library of Alexandria, with pagan libraries throughout the Christian era, and more recently has happened with antiquities in Afghanistan and Iraq. The only thing that can reliably preserve data is large scale, geographically widespread distribution of copies.

Re:Data loss will always be a possibility (0)

Anonymous Coward | more than 8 years ago | (#12916229)

they should use the google file system.

Re:Data loss will always be a possibility (3, Insightful)

tabdelgawad (590061) | more than 8 years ago | (#12916261)

Actually, it's more like 'inevitable'. I'll bet almost everyone has unintentionally lost digital data permanently and will do so again in the future.

The key, I think, is prioritization. We all do it individually (important stuff gets backed up many times and often; unimportant stuff perhaps never), and NARA will have to do it too. I don't think backing up a president's email and backing up some minor White House aide's email should have equal importance. The trick will be to come up with a reasonable prioritization scheme that makes the probability of losing the most important stuff very small.

True but... (1)

BlightThePower (663950) | more than 8 years ago | (#12916595)

I don't think backing up a president's email and backing up some minor White House aide's email should have equal importance.

I agree, really, but I also find that the problem with data is you never know until it's too late. The aide's email could be an international "smoking gun" lost forever vs. an eternally archived presidential request for diet soda on Air Force One.

I feel that if you can't completely automate backups, then the best thing is to give users easy access to backup resources for their own material so they can judge what's most important and what isn't. This happens in some organisations at the moment, but not in all; I used to work in a place where I had to make a special appointment with a tech just to burn a CD of stuff on my HD. Guess how much data we regularly lost as an organisation...

Re:Data loss will always be a possibility (1)

doshell (757915) | more than 8 years ago | (#12916795)

I think it also has to do with the fact that the media in which we store information are increasingly less durable (compare stone engraved millennia ago, and writings on paper from past centuries still readable today, with the relatively short life expectancy of magnetic and optical media).

Now I'm not saying we should all go back to the Stone Age, but it does make you think about the irony of progress...

Re:Data loss will always be a possibility (0)

Anonymous Coward | more than 8 years ago | (#12916278)

and more recently has happened with antiquities in Afghanistan and Iraq

Yeah. Starting with Moslem fuckers blowing up statues of the Buddha because in their skewed eyes they were heathen images. Assholes.

Re:Data loss will always be a possibility (0)

Anonymous Coward | more than 8 years ago | (#12916438)

Did you even know who Buddha was until the report of the destruction of the statues on Fox News? Or are you one of those godless non-Christians who do not believe in Jesus? Or perhaps, even worse, one of those Christians who do not believe in the Ten Commandments?
Exodus 20:4"You shall not make for yourself a carved image--any likeness of anything that is in heaven above, or that is in the earth beneath, or that is in the water under the earth;
5"you shall not bow down to them nor serve them. For I, the LORD your God, am a jealous God, visiting the iniquity of the fathers upon the children to the third and fourth generations of those who hate Me,
6 "but showing mercy to thousands, to those who love Me and keep My commandments.

Answer is Compression? (4, Informative)

reporter (666905) | more than 8 years ago | (#12916140)

The National Archives and Records Administration expects to have as much as 347 petabytes to deal with by 2022. Are we destined for a 'digital dark age'?

Perhaps the answer is compression.

Does anyone know whether there is an upper limit to text compression?

In signal processing, there is a limit called the Shannon Capacity theorem, which gives the maximum amount of information that can be transmitted on a channel. In text compression, is there a similar limit?

Note that the Shannon Capacity theorem does not tell you how to reach that limit. The theorem merely tells you what the limit is. For decades, we knew from the theorem that the maximum rate on a normal telephone twisted pair is about 56,000 bits per second. However, we did not know how to reach it until trellis coding was discovered, according to an electronic-communications colleague at the institute where I work.

If we can calculate a similar limit for text compression, then we can know whether further research to find better text compression algorithms would be potentially fruitful. If we are already at the limit, then we should spend the money on finding denser storage media.

Re:Answer is Compression? (1)

slavemowgli (585321) | more than 8 years ago | (#12916182)

There is no theoretical upper limit on text compression as far as I know (and I'd be rather surprised if there was [1]), but there *is* a (very basic) theorem from Kolmogorov complexity that says that for any compression algorithm you devise there's always data that can't be compressed (for a proof, simply compare the number of strings of length n with the number of strings of length less than n).

1. Well, I'd be surprised as long as you don't make any assumptions about the statistical distribution of bits in the text you want to compress. In other words, if you define certain properties and conjecture that all texts satisfy these properties, you may well be able to prove certain things (I'm not sure about this), but I don't think it'd really be very practical, as I'd say it's relatively likely that among those 347 PB there'll be data which does not have these properties.
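
The counting argument behind that theorem fits in a few lines; a quick Python sketch:

    # Pigeonhole: there are 2**n bit strings of length n, but only
    # 2**n - 1 strings strictly shorter than n, so any lossless scheme
    # must leave at least one length-n input uncompressed.
    n = 16
    length_n = 2 ** n
    shorter = sum(2 ** k for k in range(n))   # lengths 0 .. n-1
    assert shorter == length_n - 1
    print(length_n, shorter)                  # 65536 65535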

Re:Answer is Compression? (1)

Indianwells (661008) | more than 8 years ago | (#12916208)

But the article really focused on archiving as opposed to backup. Compression would work for backup, but archiving is an attempt to make the data searchable as well as permanent. Some type of indexed compression would save space, but is it really that important? If they simply started chaining SANs, there would be no issue with storage. If they used a flash-memory-based SAN, there would be no data loss -- although it would be quite expensive to build. Tie that into a highly indexed and hierarchical database with smart data management, and this problem seems surmountable...

Re:Answer is Compression? (1)

mcrbids (148650) | more than 8 years ago | (#12916545)

There is no theoretical upper limit on text compression as far as I know

Which is obviously some hot gas coming from your posterior. Otherwise: 1 (the Holy Bible, heavily compressed)

The amount of compression possible in a given string of numbers is inversely proportional to the amount of randomness in the input.
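
That inverse relationship is easy to demonstrate with any stock compressor; a quick sketch using Python's zlib:

    import os, zlib

    repetitive = b"the quick brown fox " * 5000   # highly redundant input
    random_bytes = os.urandom(len(repetitive))    # effectively incompressible

    for label, data in [("repetitive", repetitive), ("random", random_bytes)]:
        ratio = len(zlib.compress(data, 9)) / len(data)
        print(f"{label}: {ratio:.1%} of original size")
    # Typically the repetitive text shrinks to well under 1% of its size,
    # while the random bytes stay at (or slightly above) 100%.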

They use TIFF? (0)

Anonymous Coward | more than 8 years ago | (#12916214)

From the article:
and the 2000 census returns were converted into more than 600 million TIFF-format image files, some 40 terabytes of data

Why TIFF!? PNG (or any other lossless format) would reduce that considerably.

Re:They use TIFF? (1)

Fear the Clam (230933) | more than 8 years ago | (#12916236)

Why not just convert it to text? If a picture is worth 1000 words, they can knock the data down to 4 gigs right there.

Re:They use TIFF? (1)

Murphy Murph (833008) | more than 8 years ago | (#12916254)

From the article:
and the 2000 census returns were converted into more than 600 million TIFF-format image files, some 40 terabytes of data

Why TIFF!? PNG (or any other lossless format) would reduce that considerably.


Uhh, maybe because TIFF does support compression.
Both lossy and lossless.

Re:Answer is Compression? (1)

Biogenesis (670772) | more than 8 years ago | (#12916362)

Sounds a bit like 42: it'll tell us the answer, but we need something else to find the question.

Re:Answer is Compression? (3, Interesting)

MasterC (70492) | more than 8 years ago | (#12916369)

The only thing that comes to mind is information entropy [wikipedia.org]. If you're given a text document, you can determine the probability distribution for each letter, letter combinations, for words, or whatever you can think of. Then given the probability distribution, you can determine the information entropy. If, in the sum, you use log with base 2 then H(x) (see formal definitions [wikipedia.org]) gives you the entropy in bits.

For example, if you have a text file with letters of equal probability (all letters have a probability of 1/27) then the bits required to represent a single letter turns out to be ~4.7549 bits. (Indeed, 2^4.7549 = 27)

This is the upper limit of compression. Such methods as the, now 50-years old, Huffman coding [wikipedia.org] do decent work at approaching this limit (used in JPEG, for one).

So the answer to your question is: it's not broadly definable for "text" or "information" in general, but depends on the statistical patterns of the English language or of a specific document.
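
A minimal sketch of that calculation, estimating the zeroth-order entropy of a text from its observed symbol frequencies:

    import math
    from collections import Counter

    def entropy_bits_per_symbol(text):
        # H(X) = -sum(p * log2(p)) over the observed distribution
        counts = Counter(text)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total)
                    for c in counts.values())

    # 27 equiprobable symbols (26 letters + space): log2(27) = ~4.7549 bits
    print(entropy_bits_per_symbol("abcdefghijklmnopqrstuvwxyz "))

Real English text has a much more skewed distribution, so its per-letter entropy is lower; that slack is exactly what Huffman coding exploits.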

Re:Answer is Compression? (1)

mechsoph (716782) | more than 8 years ago | (#12916506)

For any given set of data, there is a lower limit beyond which it cannot be compressed. This is called the "entropy" of the data. This is essentially how much actual information the data contains. We talked about it in one of my sophomore CS courses. Lempel-Ziv (gzip, I think) compression approaches the entropy of the data as the size of the data approaches infinity.

Re:Answer is Compression? (0)

Anonymous Coward | more than 8 years ago | (#12916695)

Lempel-Ziv (gzip, I think) compression approaches the entropy of the data as the size of the data approaches infinity.

BZZT. That is only true if the data can be described accurately by a Markov model. For instance, LZ compresses the digits of pi roughly as badly as random digits, even though the entropy of that sequence is close to nil.

Re:Answer is Compression? (0)

Anonymous Coward | more than 8 years ago | (#12916546)

In text compression, is there a similar limit?

Yeah, it's called entropy.
The basic model is that in the end you're transmitting bits of pure information encoded in some not-too-efficient method (such as English). And according to the pigeonhole principle, there's no way to compress the data further, on average, than sending those nonpredictable, equiprobable bits directly.

Re:Answer is Compression? (0)

Anonymous Coward | more than 8 years ago | (#12916644)

Shannon's channel theory applies to text too. Doesn't his paper even have examples of coding text?

How is this an answer? It's not about the disk space.

Retain it all. (3, Insightful)

d3m057h3n35 (695460) | more than 8 years ago | (#12916156)

Perhaps it would be best to keep it all, even the stuff that now seems totally useless, like Clinton administration emails from Janet Reno to Madeleine Albright asking what she thinks about Norman Mineta and his "hot Asian vibe." With search technology improving constantly, keeping it would probably be better than throwing away stuff that could potentially be of interest, or spending the time developing AI to make the task less time-consuming. And besides, we can't make future historians' jobs too easy. They've gotta earn their pay, reminding us of the banalities of this age.

age of clutter... (0)

Anonymous Coward | more than 8 years ago | (#12916169)

Doesn't matter. We can't absorb the information available at any moment in real time, so we certainly cannot go back and absorb it later.
The abandonment of the notion that information should be evaluated and only the best archived -- as in traditional libraries -- is indeed likely to lead to a dark age. But it will be the mirror image of the old ones: unable to find the target in the clutter instead of unable to find it in the desert.

Every mail is sacred (3, Insightful)

kfg (145172) | more than 8 years ago | (#12916341)

Every mail is great
If a mail is wasted
The gods get quite irate

Every mail is wanted
Every mail is good
Every mail is needed
In your network neighborhood

Really, equating the inability to record and save every post-it note with those times and places where writing itself was denigrated into virtual nonexistence is a bit silly.

KFG

All that for things like... (1)

suitepotato (863945) | more than 8 years ago | (#12916184)

Dear Monica,
I did what last night? Man, I must have been smashed. You sure? ROTFLMAO...
Yours truly,
Bill

Seriously, we're archiving every little tiny 1 and 0 for what reason? Some things can just go in a zip file and be put on a CD, and that's it. Want them to stick around forever? Have the files put out every so often in the leftover space on AOL CDs. They'll never be gone.

AOL CD's (0)

Anonymous Coward | more than 8 years ago | (#12916598)

AOL CDs are closed. Believe me, I've tried...

Difference between data and trash (4, Insightful)

HermanAB (661181) | more than 8 years ago | (#12916187)

In the age of pen and paper, only important stuff was written down. Nowadays all crap is preserved. This is useless. There is a big difference between data and information.

Re:Difference between data and trash (1, Insightful)

Anonymous Coward | more than 8 years ago | (#12916407)

The problem is that it can be hard to know where the boundary between important and useless is...

Things that previous generations considered unworthy of preservation are greatly treasured today - look at all the old manuscripts of which we have only a few pages (because scribes reused the parchment). Look at the masterpieces that were painted over to save canvas.

As soon as we start to put down hard limits on what to preserve and what to leave alone, we risk losing information that future generations will value.

Besides - in many cases, it could just be easier to save everything. Trying to enforce standards and judging what should and shouldn't be preserved might be more labour-intensive than the alternative. Considering the rate at which information is generated, it might make sense to trade conserving storage against conserving labour... storage is easier/cheaper/more available :)

Re:Difference between data and trash (1)

jacksonj04 (800021) | more than 8 years ago | (#12916533)

The trick is to get your data infrastructure organised to start with. Because I have a predetermined system for organising my class notes (Microsoft OneNote, so shoot me), I can reliably pick out notes from a specific class by date, or by topic based on exam questions, or I can take the Google approach and just say "Find me anything to do with this".

The information I need is preserved in an easily accessible form because I decided to keep all my class notes organised, and as a result I've replaced 8 ring binders of poorly organised content with a tablet PC and searchable, editable content.

Good planned structure at the start helps organisation later. Google has made Gmail easily organised with tags; the world is getting closer to the idea that *everything* needs to be categorised by date, subject, relevance, people involved, etc., but it's a long way off yet.

Why do we need to archive everything? (1)

night_sky_nsci (838533) | more than 8 years ago | (#12916195)

I'm a little skeptical about why we have to archive all that information in the first place. History as we know it is established by searching for bits and pieces of evidence and putting them together; we know quite a bit about what happened 200, 300 years ago, but I am sure we don't have an equivalent of 200 petabytes of, say, parchment from which to study our recent history.

It'd be crazy to suggest the NARA audit every single bit (no pun intended) of archival data to determine whether it's worth archiving -- not only is it impossible, it flies in the face of the whole idea of archiving. However, the estimate of 347 petabytes may be too pessimistic, as surely not every kind of information they have is worth archiving. Just my two cents.

Re:Why do we need to archive everything? (4, Insightful)

felix71 (49849) | more than 8 years ago | (#12916356)

Actually, one of the main complaints Historians have is incomplete information about the past. Not having every little tidbit makes it impossible to figure out how people actually lived. History _should_ be more than just names, dates, and events. If we can properly preserve and index items that seem really mundane to us, future generations have a _much_ better chance of having some real understanding of how we developed as a society.

Re:Why do we need to archive everything? (0)

night_sky_nsci (838533) | more than 8 years ago | (#12916423)

Are you familiar with paleopathology? Anthropologists dig up bones and study traces of disease and injury in human skeletons. They use this to gain insight into the lifestyle Dark Ages people led; for example, a typical Dark Ages skeleton has bones that appear to have been broken more often than in other eras, suggesting to historians that Dark Ages people engaged in physically risky and demanding activities. Furthermore, by studying the regrowth at these broken-bone sites, they can figure out how the injuries would have been treated.

Dark Ages (5, Insightful)

TimeTraveler1884 (832874) | more than 8 years ago | (#12916198)

Are we destined for a 'digital dark age'?
If by "dark age" you mean a time in human history when more information is recorded than ever before, then yes, I suppose we are.

I think, more accurately, we are headed towards an age of super-saturation of information. I have no doubt we can store all the data we are currently generating and will generate. The question is how do we process it into something meaningful? Just because we have the ability to archive everything does not mean it will be useful to the [insert personally welcomed overlord] of the future.

Maybe historians of the future will be fascinated that Clinton's instant-message signoff was "l8ter d00d", but I doubt it. We'll want to save everything now, of course, because we can. But I suspect the majority of the information will just be filtered out when actually searched.

Personally, I take the "you never know" ideology and save everything.

Re:Dark Ages are ahead! All aboard (2, Funny)

screwthemoderators (590476) | more than 8 years ago | (#12916276)

I think it may be worse than that -- that there will be a huge proliferation of false information, sensationalistic 'infotainment,' advertising, propaganda, etc... Why, historians of the future may be depending on /. as their main source of information! Think of what a tragedy that would be!

Not a dark age... was the past so bright? (4, Insightful)

G4from128k (686170) | more than 8 years ago | (#12916206)

Digital technologies mean that archivists now enjoy orders of magnitude more information than they had in the past. Consider all the hallway and phone conversations or jotted notes lost in a paper-based organization, versus having an archive of e-mail, IM, and sticky-note digital files.

Digital technologies also mean that archivists now enjoy orders of magnitude more potential accessibility than in the past. Even if paper has a greater innate archival lifespan, its physical form makes it inaccessible to all but a select monkish class of archivists colocated with their paper archives. Even the select few archivists who are allowed access to paper archives can effectively process at best a dozen documents per minute (and only a dozen per hour if they must wander the files to find randomly dispersed documents).

By contrast, digital technologies radically expand access along two dimensions. First, technology expands the number of people who can access an archive across distance -- a remote researcher can have full access, including access to documents in use by other archivists. A low cost of copying documents means a wealth of information. Second, search tools provide prodigious access to the files -- searching/accessing/reading thousands or millions of documents per second.

To say we face a dark age is to presume that paper documents provided far more enlightenment and comprehensiveness of documentation than paper ever actually did.

Re:Not a dark age... was the past so bright? (1)

Nasarius (593729) | more than 8 years ago | (#12916674)

I think you're missing the point, which is that all that data is now much easier to lose, especially in the short term, if it's not taken care of properly.

Cost-of-copy and modes of failure (2, Interesting)

G4from128k (686170) | more than 8 years ago | (#12916780)

I think you're missing the point, which is that all that data is now much easier to lose, especially in the short term, if it's not taken care of properly.

Perhaps, perhaps not. Sure, digital data can be lost easily, but it can also be copied and backed up more easily. Assuming $0.01/page for a paper copy (a gross underestimate of the cost of paper, toner, and labor) and 10 kB of data per page (an overestimate), against $10/GB for high-end maintained storage, the cost ratio is at least 100:1 in favor of digital (and probably 1000:1). Inaccessible formats are a concern, but an automated batch process at initial archiving time can at least convert the data to some format standard with a longer likely lifespan (e.g., plain ASCII, RTF, PDF, HTML, etc.).

Paper has its own single-point-of-failure concerns, and the huge cost of copying makes those concerns real. Digital does add some new failure modes (e.g., format obsolescence), but I think those are not as burdensome as the physical costs of copies.
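
For what it's worth, the claimed ratio checks out; a quick Python sketch using the post's own assumptions:

    # Paper-vs-digital copy cost, using the numbers above.
    paper_cost_per_page = 0.01           # dollars/page (an underestimate)
    bytes_per_page = 10 * 1024           # 10 kB/page (an overestimate)
    digital_cost_per_gb = 10.0           # dollars/GB, maintained storage

    pages_per_gb = (1024 ** 3) / bytes_per_page             # ~104,858 pages
    paper_cost_per_gb = pages_per_gb * paper_cost_per_page  # ~$1,049
    print(paper_cost_per_gb / digital_cost_per_gb)          # ~105x for digital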

Answer is not compression, it's less data. (2, Insightful)

gus goose (306978) | more than 8 years ago | (#12916217)

People should think outside the box.

The answer to archiving the required volume is producing less volume. Case in point... we recently spent a week or so at work optimising a process that was I/O bound. The bugger took 10 hours to run. Although purchasing faster disks, converting to RAID0, and other techniques did whittle the execution time down to about 5 hours, the final solution was to redefine the process to reduce the actual I/O (we removed a COBOL sorting stage), and the process is now 2 hours.

Bottom line: with the 100 + 38 million dollars (FTFA) assigned to the project, I am sure I could eliminate a number of redundant positions, optimise some communication channels, retire voluminous individuals, replace inefficient protocols/people, and basically reduce the sources of data. Hell, if the US were to actually have peace instead of demand it, there would be much less need for military intelligence, political rhetoric, and other civil responsibilities. The military could be half the size, and what do you know, we could not only reduce the requirement for archiving but actually save money in the process.

Remember, government is a self-supporting process.

Go ahead, mark me a troll.

gus

Re:Answer is not compression, it's less data. (2, Insightful)

MasterC (70492) | more than 8 years ago | (#12916475)

...other techniques did whittle the execution time down to about 5 hours, the final solution ...is now 2 hours.

That's only a 60% reduction. A 60% reduction of 347 PB is still 138.8 PB...still a huge archival task.

Keeping just 1% of the data still leaves you with 3.47 PB. Not impossible, but still a daunting task.

Re:Answer is not compression, it's less data. (1)

rbarreira (836272) | more than 8 years ago | (#12916486)

The answer to archiving the required volume is producing less volume. Case in point... we recently spent a week or so at work optimising a process that was I/O bound. The bugger took 10 hours to run. Although purchasing faster disks, converting to RAID0, and other techniques did whittle the execution time down to about 5 hours, the final solution was to redefine the process to reduce the actual I/O (we removed a COBOL sorting stage), and the process is now 2 hours.

I'm sure I could do that in about 1 hour...

burn, knowledge, burn (2, Interesting)

Leontes (653331) | more than 8 years ago | (#12916223)

The ancient, esteemed Great Library of Alexandria [wikipedia.org] was burned to the ground as knowledge literally turned to smoke, lost to mankind forever. Was it barbarians? Motivated by political revenge? Demanded by religious zealots? Accidental byproduct of an act of war?

Really, it's only the great works of artistry that need to be retained and remained, sustained and maintained. Historically, it's interesting to catalogue art, but politics? The everyday communications that led up to the horrible decisions our politicians made in the course of daily business? We want records of this?

Perhaps the easiest way of keeping this knowledge at all interesting or inspiring is to burn it regularly: let people imagine what happened to allow such blunders, or let apologists spin tales of delight explaining how stupid people stumbled upon genius decisions. Conspiracy theorists or intellectual artistry can probably generate far greater truths than the truth will ever reveal.

It would save a great deal of money too, just having a delete key. If we care so little for the decisions of the here and now, why preserve the information to be twisted by people in the future with their own biases and projects? We seem to care so little for truth nowadays; why should that change in the future?

Re:burn, knowledge, burn (0)

Anonymous Coward | more than 8 years ago | (#12916234)

Those who forgot history are destined to repeat it.

Re:burn, knowledge, burn (1)

Leontes (653331) | more than 8 years ago | (#12916241)

those that do not look up quotations before posting are destined to misquote them.

Re:burn, knowledge, burn (0)

Anonymous Coward | more than 8 years ago | (#12916630)

Those WHO do not match pronouns are destined to be grammar flamed.

Re:burn, knowledge, burn (2, Interesting)

mrogers (85392) | more than 8 years ago | (#12916408)

Doesn't it diminish the aura of a great work of art if you know that it can always be restored from a backup?

Re:burn, knowledge, burn (0)

Anonymous Coward | more than 8 years ago | (#12916422)

The ancient, esteemed Great Library of Alexandria was burned to the ground as knowledge literally turned to smoke, lost to mankind forever. Was it barbarians? Motivated by political revenge? Demanded by religious zealots? Accidental byproduct of an act of war?

league of shadows.

Re:burn, knowledge, burn (0)

Anonymous Coward | more than 8 years ago | (#12916481)

Reminds me of the Canadian Broadcasting Corporation (CBC), which threw a lot of old footage in dumpsters, and then lost some more in a fire.

And all this time I had been wondering why they weren't coming out with some of the older shows [from my youth] on DVDs :(

Re:burn, knowledge, burn (4, Insightful)

mcrbids (148650) | more than 8 years ago | (#12916591)

Really, it's only the great works of artistry that need to be retained and remained, sustained and maintained. Historically, it's interesting to catalogue art, but politics? The everyday communications that led up to the horrible decisions our politicians made in the course of daily business? We want records of this?

Absolutely, yes!

History is often taught as "Charlemagne took over Constantinople in the year 12xx," as though military feats really mattered to the average Joe. But the truth is, America was colonized by people who thought that, however bad it might be in a virgin land, it was BETTER than their lives in Europe.

One of the key failures of public education today is failing to communicate that history consists mostly of PEOPLE doing ORDINARY things in their time to make life better for themselves and their families. They loved, worked, got bored, and cracked jokes at the expense of their leaders, just like we do today.

History doesn't consist of battles, any more than history consists of artworks. Capturing more detail of the average, everyday lives of people gives a much better understanding of the cultural norms and the ideals to which people aspired.

The pyramids of ancient Egypt provide a clear, artistic monument to their culture, yet we have only a modest understanding of Egyptians' day-to-day lives. Similarly, we have Stonehenge as a clear monument to the grooved-ware people of the English isles, but almost NO understanding of who they were and what they felt was important. How much would a true historian give to understand the day-to-day culture of these mysterious "grooved-ware" people of old?

Those memos and IMs make up that understanding of the people of today.

So? (3, Insightful)

ArchAngel21x (678202) | more than 8 years ago | (#12916238)

By the time the government comes up with a half-assed solution, archive.org will already have it all organized, online, indexed, and backed up.

contract for archiving system (1)

1nv4d3r (642775) | more than 8 years ago | (#12916259)

Anybody know what the government has spec'ed TFA's archiving system to do? It says it will need to read 16,000 file formats and be impervious to terrorist attack (?), but not much else...

I wonder what kind of searches and cross-linking will be done, for instance. What kinds of access control there will be? I'd also just like to see what the 16,000 formats are, out of curiosity. Sounds like a project waaaay larger than the $136 million they've allotted for it so far.

Stupid name.... i'm guessing they were chuckling about 'the ERA of NARA'...

Have a look at the Fedora Project (3, Funny)

pangloss (25315) | more than 8 years ago | (#12916273)

http://www.fedora.info/ [fedora.info]
(Not to be confused with the Linux distribution)

From the website, Fedora is "a general purpose repository service...devoted to...providing open-source repository software that can serve as the foundation for many types of information management systems".

Problem for some is that Fedora can be a little hard to grok. It's not an out-of-the-box repository to install and run, like the repository application mentioned in the article (DSpace). It's an architecture for building repository software. Once you understand the potential for building applications on top of Fedora, you start to see some light at the end of the tunnel for just the sort of issues the article raises.

Relevant, interesting post (4, Funny)

Council (514577) | more than 8 years ago | (#12916290)

Here is a relevant post by Ralph Spoilsport [slashdot.org] on an earlier article, which can be found here [slashdot.org]. I am reproducing it here in full because it is very interesting and highly relevant.

this is actually a BIG question

And one that I have railed about for many years.
I have been in the same position the Author discussed, and I have come to ONLY negative conclusions. In a few words, and I hate to say this, but buddy:

WE'RE FUCKED.

Digital is a loser's proposition. Backing up to analogue media, or even to digital data on analogue substrates (such as DV tape), fails. Simply and purely.

The *only* thing that comes close is some kind of RAID, and those, even with the plummeting price of storage, are still too expensive given the needs.

Also, a RAID assumes a continuity of several things that are not likely to be continuous:

With Video:
Framerate, number of lines, colour depth, aspect ratio, file format, compression format, Operating system compatibility, etc etc etc. All of these things are variables.

With Audio:
sample rate, compression format, bit depth, file format, etc.

Basically all of it points to very bad places.

I am fairly well convinced that our age will simply disappear. They will find our garbage, the few books not pressed on acidic paper, our paintings (fat lot of good the abstract stuff will mean to them) and drawings, and that's about it. The rest will just be shiny little bits of crap in the landfill.

Since we will have used up all the dense energy forms, they will be appalled at the energy requirements just to get the few remaining museum-piece devices to work. Archiving the 21st century will be impossible. To the 25th century, the 21st will be seen as a dark age - not only for the holocaust of the die-off caused by the failure of the petroleum-based economy, but from the simple fact that very little of the information formats we are totally geared into will survive, including this note on /.

His problem of saving personal video is just the tip of the iceberg. His problem is the problem of our very civilisation, writ small.

That's why I am abandoning video and going back to painting. In 500 years, my painting CAN survive. The video simply won't.

RS


And don't give me shit about my karma or whatever. My karma's fine, I don't care about it. I'm copying this because it's interesting and contributes to the discussion.

What do you think about Ralph's thoughts?

Re:Relevant, interesting post (2, Insightful)

fmaxwell (249001) | more than 8 years ago | (#12916691)

I think that he's being absurdly pessimistic. Those future historians will have no more difficulty reading our media than we have playing the sounds from a wax cylinder for an Edison phonograph. Running archived computer software will be no different than any of us using a software emulator to run a game for a long-dead gaming console.

Sure, optical and magnetic media decay, but there's nothing stopping people from "refreshing" the media before it decays too far. If you have a stack of CD-R discs that are starting to show an increase in correctable errors, then you back them up to new CD-R discs or to DVD-R. You don't have to sit idly by and watch them decay. I've got a CP/M computer that's over 20 years old, and it can still boot from its 10 MB (yes, megabyte) hard drive. So it's not as if data just disappears five years after it's recorded.

It's also a problem the industry is addressing. There are companies offering long-life CD-R media designed for archival use [hi-space-pro.com]. Other companies offer storage for archival data, much like the climate-controlled vaults where countless audio master tapes and films have been stored for decades.

In closing, I think that 95%+ of archived data will still be able to be accessed in a century -- provided that it is properly stored and cared for.
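
The "refreshing" step is straightforward to automate; a minimal Python sketch (the archive path is hypothetical) that records checksums at archive time so rising error rates show up as mismatches on a later pass, before the data becomes unrecoverable:

    import hashlib, json, pathlib

    def checksum(path, algo="sha256", chunk=1 << 20):
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    # Record checksums now; re-verify on a schedule. Any mismatch flags
    # a file to restore from a redundant copy onto fresh media.
    archive = pathlib.Path("/mnt/archive")    # hypothetical mount point
    manifest = {str(p): checksum(p)
                for p in archive.rglob("*") if p.is_file()}
    pathlib.Path("manifest.json").write_text(json.dumps(manifest, indent=2))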

Re:Relevant, interesting post (0)

Anonymous Coward | more than 8 years ago | (#12916773)

Sure, optical and magnetic media decay, but there's nothing stopping people from "refreshing" the media before it decays too far. If you have a stack of CD-R discs that are starting to show an increase in correctable errors, then you back them up to new CD-R discs or to DVD-R.

Yes, but with current OSes, correctable errors are completely hidden from me. Only when an error is uncorrectable do I get a warning. There is no equivalent of S.M.A.R.T. for CDs.

Oh, give me a break... (0)

Anonymous Coward | more than 8 years ago | (#12916783)

The guy conflates integrity preservation solutions (RAID) with data format issues.

Major formats will be figured out after the apocalypse, don't worry about that. (Sir, we found over 100 million 4 3/4" plastic discs with digital data on them! Should we try to decode them?) Data will be lost, that's true. But some of it will be figured out, just as when we look back at past histories.

In the past, when societies used paper or papyrus instead of parchment, the recoverability of their information went down because those media didn't survive as well. At other points, changes in inks (due to convenience of manufacture or cost) also led to lower data survivability.

So this isn't a new thing at all.

But most importantly, don't get too excited about it. You should no more worry about whether your pr0n collection will survive than the average Greek or Roman did about their inventory/accounting records, or indeed their pr0n collections.

Slightly overdramatic? (2, Insightful)

mrogers (85392) | more than 8 years ago | (#12916298)

Are we currently experiencing a dark age because we don't have access to every letter, memo, bank statement and laundry ticket created in the 20th century? Archiving everything is an attractively simple approach, but if it turns out to be impractical we can always fall back on common sense and restrict ourselves to archiving the maybe 10% of things that have even a remote chance of being interesting in 100 years' time.

Tanks for the Memories (2, Interesting)

Doc Ruby (173196) | more than 8 years ago | (#12916329)

We need to imprint holographic storage in synthetic diamonds. Even if they're slow and expensive, they'll last far longer than the paper records they replace. We'll have to spend a fortune redigitizing all the polymer (CD/DVD, floppy, tape), celluloid (microfilm/fiche) and rotating (disc) media that will age to illegibility within our lifetimes. Until we get holographic gems, we need to archive everything on paper, including those expiring media, in a format easily digitized to a more permanent medium. But of course the government, and barely accountable bosses, want the public record to disappear down the memory hole. If they could accelerate the process, newspapers included, they'd spend everything we've got (and more) to make it happen.

Records (2, Informative)

Big Sean O (317186) | more than 8 years ago | (#12916330)

NARA makes a distinction between a document and a record. Any old piece of paper or email is a document, but a record is something which shows how the US government did business.

For example, the email to my supervisor asking when I can take a week's vacation isn't a record. The leave request form I get him to sign is a record. An email about lunch plans: not a record. An email to a coworker about a grant application probably is.

Besides obvious records (eg: financial and legal records), there are many documents that may or may not be records. For the most part, it's up to each program to decide which documents are records and archive them appropriately.

the more I think about it... (2, Insightful)

1nv4d3r (642775) | more than 8 years ago | (#12916397)

I'm not sure most of this stuff is worth preserving digitally enough to justify the cost. Just print 'em out and put them in a Raiders of the Lost Ark-style warehouse. The few people who want to see all of the Clinton administration's emails can travel to it and search.

I'd much rather see those hundreds of millions of dollars invested in, for instance, making all out-of-print recordings and books available online. It's a smaller problem (it sounds like), but it would benefit the world much more than online copies of every government employee's timecard records.


strip MS HTML from Outlook mails (4, Funny)

rduke15 (721841) | more than 8 years ago | (#12916418)

I don't know about the NASA data sets, but they could certainly save a few petabytes by stripping the stupid HTML part of all Outlook emails...
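
That stripping is nearly a one-liner with Python's standard library; a sketch (the .eml filename is hypothetical) that keeps only the text/plain part of a multipart message:

    from email import policy
    from email.parser import BytesParser

    def plain_text_only(raw_bytes):
        msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
        body = msg.get_body(preferencelist=("plain",))  # skip the HTML part
        return body.get_content() if body is not None else None

    with open("message.eml", "rb") as f:    # hypothetical input file
        print(plain_text_only(f.read()))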

Moore's Law saves the day (2, Interesting)

G4from128k (686170) | more than 8 years ago | (#12916477)

In 1987, a Mac II came with a 40 MB drive. Seventeen years later, a PowerMac G5 came with a 160 GB drive. That is at least a 4000x improvement in capacity per dollar (and 1987's drive was both physically larger and more expensive than 2004's).

Assuming the current rate of advance in storage density and price continues, a future archivist should be able to buy a 0.64 PB drive for under $500 in 2021. A mere quarter of a million dollars will provide enough space for a copy of all that stuff.
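
The extrapolation is simple enough to check; a quick Python sketch:

    # Project the 1987->2004 trend forward another 17 years, as above.
    growth_per_17yr = 160e9 / 40e6               # 40 MB -> 160 GB: 4000x
    drive_2021_bytes = 160e9 * growth_per_17yr   # ~6.4e14 B = 0.64 PB
    drives = 347e15 / drive_2021_bytes
    print(round(drives), round(drives) * 500)    # ~542 drives, ~$271,000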

Re:Moore's Law saves the day (1)

Nasarius (593729) | more than 8 years ago | (#12916749)

First, Moore's Law is about transistor density, which has nothing to do with hard drives. Second, hard drives haven't been getting any more reliable, which means all those drives would have to be replaced every few years. It's a nightmare for long-term storage.

I'm guessing... steady state. (3, Interesting)

dpbsmith (263124) | more than 8 years ago | (#12916484)

The Zapruder film was the beginning. In recent years, I've been dumbfounded by the vast expansion in the recording and documentation of things like crimes in progress, natural disasters, America's Funniest Home Videos, you name it. A plane crashes, and the next day there are ten different home videos from people in the vicinity who had camcorders.

I believe the cost of traditional photography, in constant dollars, dropped enormously between my parents' time and mine. I know we took about ten times as many silver-on-paper and Kodacolor dye-on-paper snapshots as my parents did. Then we got a camcorder. My parents captured about three hours total of 8 mm silent home movies. I have about forty hours of 8mm and Digital8 camcorder tape.

And since my wife and I got digital cameras, we've been taking five to ten times as many pictures as we did when we used film cameras.

Now, YES, I'm on the format treadmill. Got most of the old 8mm movies transferred to VHS. Got most of the VHS transferred to DVD. Got a lot of the old slides scanned. Got most of my digital images burned to CD. In the last five years, I've probably spent a hundred hours, or 0.2% of my life, on nothing but struggling to copy from old formats to new. I've spent a small fortune getting Shutterfly to print pictures because, to tell the truth, I have much more faith in the prints surviving than the CDs.

So, I don't see a digital dark age. I see a bizarre situation in which the quantity of material recorded in digital form continues to increase exponentially for quite some time. _Most_ of it will get lost, and the percentage that survives, say, a hundred years will keep going DOWN exponentially with time.

But I'm guessing the total quantity of 21st century material available to historians of the 23rd century will, in absolute numbers, be just about the same as the total quantity of 20th century material.

It's one of those mind-boggling things, like personal death, that one can never quite come to grips with. The future is unknown, and we can accept that. But the fact that most of the past is unknown is equally true -- and very hard to accept.

Yeah, but that's 17 years away. (1)

MacDork (560499) | more than 8 years ago | (#12916504)

In 2022, we'll probably have terabyte capacity in our mobile phones. Seriously. In the early 90s, 80 GB of drive space ran about $80,000, according to this archived historical document. [wired.com] Nowadays, I can get an 80 GB drive for about $65, according to Froogle, [google.com] and that's without considering inflation. Sure, at a conservative $1/GB we're looking at $347 million today, but in 17 years that'll probably look more like two or three hundred thousand bucks. No biggie for our bloated government.

The Solution: (2, Funny)

DarkEdgeX (212110) | more than 8 years ago | (#12916705)

NARA needs to open up tons and tons of GMail accounts. Where do I send my invites so I can contribute?

Elementary My Dear Dewey (1)

chadpnet (627771) | more than 8 years ago | (#12916774)

Who says you have to archive all data digitally? The system that's been working for years at our local public and university libraries is storing meta-information digitally that references a tangible location.