Archiving Digital Data an Unsolved Problem

kdawson posted more than 7 years ago | from the digital-ice-age dept.

mattnyc99 writes, "It's a huge challenge: how to store digital files so future generations can access them, from engineering plans to family photos. The documents of our time are being recorded as bits and bytes with no guarantee of readability down the line. And as technologies change, we may find our files frozen in forgotten formats. Popular Mechanics asks: Will an entire era of human history be lost?" From the article: "[US national archivist] Thibodeau hopes to develop a system that preserves any type of document — created on any application and any computing platform, and delivered on any digital media — for as long as the United States remains a republic. Complicating matters further, the archive needs to be searchable. When Thibodeau told the head of a government research lab about his mission, the man replied, 'Your problem is so big, it's probably stupid to try and solve it.'"

405 comments

Microsoft to help! (5, Funny)

UbuntuDupe (970646) | more than 7 years ago | (#16921476)

I can't wait to hear Microsoft's explanation why the project should use one of their proprietary formats.

Re:Microsoft to help! (0)

Anonymous Coward | more than 7 years ago | (#16921744)

Using a proprietary format will keep your grandkids from reading private stuff you may have written using your favorite word processor. Easier to use than encryption... Hmm... well, slightly.

Open Office is the Answer (1)

fernandoh26 (963204) | more than 7 years ago | (#16921750)

Open Office Docs, FTW!

Re:Microsoft to help! (0)

blindd0t (855876) | more than 7 years ago | (#16922080)

Irregardless of what their explanation is, they should not need more than 640K.

Re:Microsoft to help! (4, Funny)

19thNervousBreakdown (768619) | more than 7 years ago | (#16922248)

That's not a word.

Not too long... (5, Funny)

Electrode (255874) | more than 7 years ago | (#16921482)

"for as long as the United States remains a republic."

So, they're shooting for about 10 years then?

Re:Not too long... (2, Funny)

MECC (8478) | more than 7 years ago | (#16921572)

"for as long as the United States remains a republic."

So, they're shooting for about 10 years then?

10 years or the next presidential election - whichever comes first

Re:Not too long... (5, Interesting)

eln (21727) | more than 7 years ago | (#16921580)

Your timeline may be a little off (at least I hope so), but you're right that it's a silly goal. Whether the US has 10 or 1000 years left, history shows us it will most likely fall at some point, and that point will be fairly soon when compared to the entirety of human history.

Making a format that will survive a thousand years so long as our advanced civilization is still around and still cares is pointless, because as long as there is a continuous line of people that care, they will be willing to transfer at least the more important stuff to new media. The trick is coming up with something that will still be readable when archaeologists dig it up 10, 50, or 100 thousand years from now.

Re:Not too long... (3, Insightful)

Anonymous Coward | more than 7 years ago | (#16921816)

I've been wondering: with our global nature now, will we need archeologists in the future? While I believe civilizations will surely 'collapse', won't we all be around to immediately take note of it, and update Wikipedia? Seriously, I don't think we're going to be digging for stuff from this time; the global nature of our society leads me to that conclusion. It's not like when Greek society fell.

Re:Not too long... (4, Insightful)

thelost (808451) | more than 7 years ago | (#16921840)

The trick is... hoping that in a hundred thousand years people still care at all about their past. The slow realization as I read Isaac Asimov's Foundation saga about the origins of the Galactic Empire chilled me, mostly because the people of the empire had become so numb to their past as to have made it vanish entirely.

Re:Not too long... (1, Interesting)

Anonymous Coward | more than 7 years ago | (#16922036)

Using a book of fiction for anything isn't usually a good idea.

Granted, it's not like most people care nowadays. Look at any Slashdot discussion on education: it's rather sad how people complain about having to take history (heck, or any subject they're not "interested in deeply") in school. People want to be ignorant sheep.

Hell, look at Xena or the dozens of other "historical" TV shows out there. I shudder to think of how many people's knowledge of history is probably based on such crap alone. In 20 thousand years they'll have Princess Diana was running around with a lightsaber killing communists or something.

Re:Not too long... (5, Funny)

eln (21727) | more than 7 years ago | (#16922152)

In 20 thousand years they'll have Princess Diana was running around with a lightsaber killing communists or something.

Are you trying to say she didn't do that?

Crap, I am so getting an F on my history paper.

Re:Not too long... (2, Insightful)

nine-times (778537) | more than 7 years ago | (#16922236)

As much as anything, it seems like we might worry about people rewriting the past. It'd be hard to edit part of one of the original copies of the US Constitution without anyone being able to tell the difference, because we actually have a really old piece of paper that someone would have to get access to, somehow erase some ink, and write over top with identical ink.

But a historical document in the form of a text file on someone's hard drive? That can be edited without a trace.
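A minimal sketch of one common mitigation, assuming a cryptographic digest of the file was recorded somewhere the editor cannot reach (the sample text and variable names are purely illustrative). It does not stop the edit, but it makes the edit detectable:

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a document's bytes."""
    return hashlib.sha256(data).hexdigest()


# Illustrative document contents, not a real archival record.
original = b"We the People of the United States..."

# Assumption: this value is kept out of the editor's reach
# (printed, notarized, or held by several independent parties).
recorded = digest(original)

# A silent edit changes the bytes, and therefore the digest.
tampered = original.replace(b"People", b"Rulers")

print(digest(original) == recorded)  # the untouched copy still matches
print(digest(tampered) == recorded)  # the edited copy no longer matches
```

The weak point moves from the file to the recorded digest: whoever can rewrite both can still rewrite history.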

How is this different (5, Insightful)

zappepcs (820751) | more than 7 years ago | (#16921550)

than the previous ages where all information was kept on paper or in spoken words? The problem isn't so much how to invent something that will always be readable, but some way to always have the applications to read it. If it were not for the Rosetta Stone, much of what we know about the ancient world might still be a mystery.

Re:How is this different (3, Interesting)

quanticle (843097) | more than 7 years ago | (#16921736)

It's different because of the sheer volume of information being created today. Ancient cultures were not creating millions of pages of information every day.

Your Rosetta Stone analogy is inappropriate. We have not discovered any sort of Rosetta Stone for the ancient Maya hieroglyphs, but we have had success in deciphering them because we can apply linguistic analysis techniques to figure out what words correspond to what actions/things. It's a little more complicated for abstract concepts, but you can figure out a surprising amount from basic language knowledge.

Re:How is this different (4, Insightful)

ThosLives (686517) | more than 7 years ago | (#16921964)

It's not so much the Rosetta stone, but the fact that a "Rosetta stone" has a built-in context - it's obviously communication or artwork of some kind. If you have a big pile of digital data, what is it? An image? Compressed text? Audio? Just a sequence of numbers? The thing "printed" information gives you is that the presentation of the data gives you an idea of what it is - we don't yet have any digital data formats for which the presentation of the data gives an idea of the content; in fact, most digital storage mechanisms present all types of information in an identical manner.

That's the real challenge - devising a digital storage format in which presentation can be used to apply context to the data.

Re:How is this different (1)

Threni (635302) | more than 7 years ago | (#16921990)

> An image? Compressed text? Audio? Just a sequence of numbers?

Stick the info in a .nfo file. Keep the data on an array of hard drives, using RAID or similar. Keep backups of the data. If a drive fails, replace it.
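As a rough sketch of the sidecar idea, assuming a plain `key: value` text layout and a `.nfo` suffix appended to the data file's name (both conventions are made up here for illustration, as is the `write_sidecar` helper):

```python
from pathlib import Path


def write_sidecar(data_file: Path, description: dict) -> Path:
    """Write a plain-text description next to a data file and return its path.

    ASCII is used on the assumption that it is the encoding most likely
    to stay readable; non-ASCII values would raise an error here.
    """
    sidecar = data_file.with_name(data_file.name + ".nfo")
    lines = [f"{key}: {value}" for key, value in description.items()]
    sidecar.write_text("\n".join(lines) + "\n", encoding="ascii")
    return sidecar
```

Calling `write_sidecar(Path("scan0001.bin"), {"format": "PNG image", "created": "2006"})` would leave a `scan0001.bin.nfo` file beside the data. This only answers the "what is it?" question if a future reader can still make sense of the description itself.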

Next problem?

It's probably... (-1, Offtopic)

csoto (220540) | more than 7 years ago | (#16922114)

pr0n. At least the stuff worth decoding. Just like with cave writing...

Re:How is this different (1)

profplump (309017) | more than 7 years ago | (#16922140)

Just include a copy of `file` on the disk.
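For anyone wondering what `file` does under the hood, here is a toy sketch of the idea: compare the first bytes of the data against known format signatures. The table below is a tiny hand-picked subset chosen for illustration, and `guess_type` is a hypothetical helper; the real utility consults a far larger database.

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"fLaC", "FLAC audio"),
    (b"RIFF", "RIFF container (WAV, AVI, ...)"),
]


def guess_type(head: bytes) -> str:
    """Guess a format from the leading bytes of a file."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown data"
```

Something like `guess_type(open(path, "rb").read(16))` is enough to tell an image from a PDF, but it says nothing about how to decode the bytes that follow.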

Re:How is this different (1)

hclyff (925743) | more than 7 years ago | (#16922220)

Plus, if we know (and care) that the information will be wanted in the future, why not try making it as easily retrievable as possible? How much information has been lost since the first written works due to materials that endure badly or to linguistic problems? We know how much hassle it costs us to get 4000-year-old information, yet we are not making it any easier for people who will live 4000 years from now.

Re:How is this different (1)

7macaw (933316) | more than 7 years ago | (#16921814)

So we just have to include a language textbook with every pack of archive DVDs!

Re:How is this different (2, Insightful)

nine-times (778537) | more than 7 years ago | (#16921900)

How is this different than the previous ages where all information was kept on paper or in spoken words?

Paper actually holds up rather well as an archival medium. Plus, you don't need specialized technology to read it.

Re:How is this different (4, Interesting)

s20451 (410424) | more than 7 years ago | (#16921918)

Say western civilization is disrupted for a period of time that is short by historical standards -- 40-50 years would be enough. Electrical power is only sporadically available, and as a result the Internet collapses and PCs become useless. With much more important issues to deal with, such as finding food, people ignore digital data storage.

The era of restoration comes. However, when people blow the dust off those old DVDs and players, they discover that the DVDs have decayed to the point of unreadability. Massive quantities of archived data and knowledge are irretrievably lost.

The main problem in our age is thermodynamics -- information is stored so densely that it tends to decay naturally, on its own. By contrast, ancient stone carvings (as well as their keys, such as the Rosetta stone) are sufficiently durable to last (basically) forever.

Aw crap! (1)

csoto (220540) | more than 7 years ago | (#16922144)

The era of restoration comes. However, when people blow the dust off those old DVDs and players, they discover that the DVDs have decayed to the point of unreadability. Massive quantities of archived data and knowledge are irretrievably lost.

There goes my copy of Just Like Heaven [imdb.com] ! Oh the humanity!

Re:How is this different (1)

hairpinblue (1026846) | more than 7 years ago | (#16922024)

Use a holographic image embedded in another ceramic of sufficient strength to last the test of time which is placed to use the Hoover Dam wall (or similar large flat surface) as both the display screen and the surface with which to reflect and focus radiation (from somewhere) through the hologram. It is possible that the ceramic used to encase the holographic image will interact with the radiation as it passes through. The most obvious implementation is to polish a section of a large rock wall to use the sun's rays and focus them through a clear ceramic cube containing a clear hologram of standard colors in such a way that the light coming through the hologram is displayed on another large rock wall. A more complicated implementation would capture more obscure radiation, e.g. cosmic rays, and, implementing different absorption and emission properties and the refractive indices of the requisite ceramic materials necessary to have the correct properties, focus them through the ceramic structure of the required shape, through the hologram of the proper composition, and reemit that radiation onto a surface composed of a material which, when excited with the radiation coming through the hologram, would relax by emitting photons in the proper color spectrum of human visible light. This idea is mine. I claim the IP on it.

Easter Island, Stonehenge, Woodhenge, that sort of thing, but a little bit more high-tech similar to Stargate and SG-1.

You're right. There's no way the computer platform and the infrastructure necessary to support it are going to stabilize, any time soon, to be nearly as secure or robust as a one thousand pound obelisk of impenetrable rock anchored onto a slab of granite. At the same time the one thousand pound obelisk isn't going to be able to store and actively display nearly as much information, or be as readily updateable, as a stack of DVDs.

Decisions decisions...

Re:How is this different (1)

Edzor (744072) | more than 7 years ago | (#16922234)

Gentlemen, I have your answer: punch cards.

hieroglyphics (4, Funny)

IWantMoreSpamPlease (571972) | more than 7 years ago | (#16921558)

Worked for the Egyptians didn't it?

Re:hieroglyphics (1)

Shoeler (180797) | more than 7 years ago | (#16921754)

Isn't the solution to at least the format readability problem pretty simple? Print out schematics for a reading device on a format that will last the longest. Store said format with all media.

Of course, that doesn't fix the problem of archive stability. Tapes are supposed to be relatively long-lived compared to, say, a simple CD-R, but haven't we all had one or many more fail on us?

Re:hieroglyphics (1)

MysticOne (142751) | more than 7 years ago | (#16921926)

You're also assuming that somebody will know how to read the schematic in the future. Schematics may not look anything like that in the future, or the parts with which to build such a device may be completely unavailable. It's not a bad idea, but I don't think it's any better than many of the others.

Re:hieroglyphics (1)

l0b0 (803611) | more than 7 years ago | (#16922004)

Tonk, tonk, tonk, tonk, tonk, tonk, tonk!

"Aw dammit, I forgot to catch a systemOutOfMemory error!"

*Fetches another rock*

And that, ladies and gentlemen, is why Java will be forgotten by history.

I've heard this problem over and over (5, Interesting)

csoto (220540) | more than 7 years ago | (#16921568)

Working at a University, this is not a subject I'm not unfamiliar with. We've had lots of discussions about this. Everyone always talks about how many zillions of "pieces of information" are out there. The number of web pages in existence is always bandied about. My point in these discussions is that most of what's out there is crap. Humanity is not lessened by its loss. Good stuff gets reproduced, reviewed, studied, dissected, etc. and survives. It *is* stupid to try to solve this problem, because the problem doesn't need solving.

Re:I've heard this problem over and over (1)

jazman_777 (44742) | more than 7 years ago | (#16921660)

It *is* stupid to try to solve this problem, because the problem doesn't need solving.


For example, a large commercial airplane manufacturer has this problem: not all of their engineering docs are under constant review and update, but their vehicles stay in service for decades. After a certain time they archive on microfiche.

Re:I've heard this problem over and over (3, Insightful)

failedlogic (627314) | more than 7 years ago | (#16921682)

Things like music, TV shows, movies, literature, toys, magazines, etc. are all cultural products. For future generations we need to keep records of these items as much as family trees, great stories, buildings, etc.

Besides, who's to decide what is 'crap' or not? To the untrained eye, a clay pot from Egypt might not look interesting. The color, shape, its condition, etc. might tell someone who used it, why, and what cultural value (symbology, usefulness, etc.) the pot actually had. And culture evolves from culture. Keeping a record of everything we produce allows future generations to inform themselves of who we were and what we did. Quality of the information itself is really unimportant.

Only thing I'd have to add: I wish future generations all the luck in sorting through our garbage piles and recycling/salvaging what they can. If anything, this amount of waste - or crap - is a record of us as much as anything. I can agree with you on this point about crap in our culture!!! ;)

Re:I've heard this problem over and over (2, Insightful)

Trespass (225077) | more than 7 years ago | (#16921848)

Yes, exactly. It's the ephemera that tells you what life was like in any given era, not the palaces, official monuments, etc.

I'll wager you could reconstruct far more about the culture of early 21st century from the contents of a convenience store than that of the White House. There's a big gulf between who a people are and the mask they present to the world.

Re:I've heard this problem over and over (1)

mypalmike (454265) | more than 7 years ago | (#16921912)

I wish future generations all the luck in sorting through our garbage piles and recycling/salvaging what they can. If anything, this amount of waste - or crap - is a record of us as much as anything.

If this is the case, then archaeology will not have changed much. The most useful findings in archaeology are often those found in the waste piles ("middens") of the site.

Speaking of trash... (2, Funny)

csoto (220540) | more than 7 years ago | (#16922066)

I wonder what archaeologists will think of the Zune :)

Re:I've heard this problem over and over (4, Insightful)

kfg (145172) | more than 7 years ago | (#16921722)

Expanding copyright protection to a term equal to two lifetimes means that now even some of the good stuff is being lost because it is not allowed to preserve it.

If preservation is outlawed, only outlaws will be preservationists.

I believe Ray Bradbury had something to say on this subject.

KFG

Re:I've heard this problem over and over (1)

nizo (81281) | more than 7 years ago | (#16921824)

Let me guess; we could go read the Bradbury story, except you can't post it somewhere because it would be a copyright infringement? Oh the irony....

Re:I've heard this problem over and over (1)

kfg (145172) | more than 7 years ago | (#16922182)

For the moment fair use still allows this:

http://en.wikipedia.org/wiki/Fahrenheit_451 [wikipedia.org]

I'd say this line must be held, except the line has already been left far in the dust. The line needs to be rolled back. Do not accept any "concessions" by "industry." They are being made to get you to accept the current position of the line. It's a very old trick.

KFG

Extra irony points. (4, Insightful)

Kadin2048 (468275) | more than 7 years ago | (#16921906)

I believe Ray Bradbury had something to say on this subject.

Perhaps more ironic -- it's a pretty good bet that whatever he wrote on the subject, it's not available online due to copyright restrictions imposed by his publisher or "estate."

Re:Extra irony points. (4, Funny)

kfg (145172) | more than 7 years ago | (#16922090)

Go to the library while you still can and memorize it. Buy camping gear.

KFG

Re:I've heard this problem over and over (2, Insightful)

s20451 (410424) | more than 7 years ago | (#16921994)

Expanding copyright protection to a term equal to two lifetimes means that now even some of the good stuff is being lost because it is not allowed to preserve it.

Huh. So the FSF will win by default. You gotta hand it to somebody who is willing to play the long game.

Re:I've heard this problem over and over (1)

rbegga (662104) | more than 7 years ago | (#16921742)

Amen to the above post. Let's allow a little Darwinism in the selection of what to waste those storage bits on before we create the official book of useless knowledge.

Re:I've heard this problem over and over (0)

Anonymous Coward | more than 7 years ago | (#16921862)

Working at a University, this is not a subject I'm not unfamiliar with

Grammar, however, seems to be terra incognito....

DRM (1, Redundant)

Joe The Dragon (967727) | more than 7 years ago | (#16921946)

This is why DRM is bad: it can make data hard to read many years later.

Re:I've heard this problem over and over (0)

Anonymous Coward | more than 7 years ago | (#16922076)

Working at a University, this is not a subject I'm not unfamiliar with.

It's a bad introduction to say that you are not familiar with the subject.

Re:I've heard this problem over and over (0)

Anonymous Coward | more than 7 years ago | (#16922166)

That's only half of the truth. What is worth it will be copied, if possible. Thinking about DRM etc., we may face a situation where even worthwhile digital information fades away or gets dumped for centuries among other junk, just because some people weigh their short-term monetary interest higher than the long-term cultural value.

The next possible argument would be: a society which allows this deserves to be (partly) forgotten, or to be remembered exactly for this stupidity.

A huge problem, indeed! (2, Interesting)

duh P3rf3ss3r (967183) | more than 7 years ago | (#16921574)

I've seen this very thing happen where I work -- we've lost data over the years because of incompatibility issues. On the other hand, as with many things, it's a huge problem but not an insurmountable one. The key is in planning an anti-obsolescence strategy into every IT decision. Store data files in open formats on robust media and put someone in charge of ensuring the archives are maintained and accessible.

It's not easy, sure, but neither are many of the other tasks we take on as humans.

Republic? (0, Flamebait)

Anonymous Coward | more than 7 years ago | (#16921584)

Thibodeau hopes to develop a system that preserves any type of document... for as long as the United States remains a republic

So he only needs to archive up to November 7th, 2000? That should help him with managing the scope.

Biggest hurdle is legislative (0)

Anonymous Coward | more than 7 years ago | (#16921592)

Basically, with the draconian virtual ban on reverse engineering of formats, this sort of thing can be expected. Especially since copyrights for even abandoned works will be extended indefinitely.

Huh (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#16921594)

for as long as the United States remains a republic.
So if it turns into a theocracy then everything is lost? Damn!

Keep missing my opportunity (1)

JoeyLemur (10451) | more than 7 years ago | (#16921596)

I've been trying to develop software to do it... unfortunately, my amazing abilities at procrastination and wanting to constantly redesign the project have left it languishing for nine years.

Then I keep seeing articles on archiving projects and think I really should get back to work on it...

My solution for digital photos? (4, Informative)

OfNoAccount (906368) | more than 7 years ago | (#16921598)

Since I shoot RAW, I also burn a copy of dcraw.c [cybercom.net] onto every disc - so even if the current platforms get lost by the wayside, there will be code to convert them still.

Storage itself? Currently burning onto Delkin Archival Gold [delkin.com], storing cool and dark, and in two physically distant locations.

They're also stored on my hard disk, and the best are backed up onto a USB drive.

If it looks like the DVD-ROM drive is becoming obsolete I'll burn them on to whatever comes along next.

If you're truly paranoid you can always print them on archival quality paper using pigment based inks ;)
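With copies spread over discs, a hard disk and a USB drive, it helps to be able to tell when one of them has quietly rotted. A minimal sketch, assuming each copy is mounted as an ordinary directory (the function names are illustrative, not part of any existing tool): build a checksum manifest per copy and compare them.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            # Reads the whole file into memory; fine for a sketch.
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def differing_files(a: dict, b: dict) -> list:
    """Return names that are missing from one copy or differ between copies."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Running `differing_files(build_manifest(copy_a), build_manifest(copy_b))` on two copies gives the list of files to re-burn from whichever copy is still good. Deciding which one is good needs a third copy or a manifest saved at burn time.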

Re:My solution for digital photos? (2, Insightful)

tomjen (839882) | more than 7 years ago | (#16922026)

Wonderful plan, but what if you can't find a working C compiler?

Store the files in Notes/Domino! (1)

LibertineR (591918) | more than 7 years ago | (#16921600)

IBM will NEVER shoot that baby in the head, so there will be Notes databases around when my grandkids are long dead.

wrong scope (0, Redundant)

WickedLogic (314155) | more than 7 years ago | (#16921602)

... for as long as the United States remains a republic.

So, like, what, the next decade at most... no problem.

Open, well-used, file formats. (4, Insightful)

Daniel_Staal (609844) | more than 7 years ago | (#16921606)

There are only two ways of doing this: keeping a copy of every program used to create these files (and a system to run them on) or converting them to some open and well-supported format.

For text documents, HTML is probably the best bet. It is so widely used and supported that readers are almost guaranteed to exist as long as computers do in their current form. (And if something ever truly supersedes it, a mass-conversion program will be written anyway.) HTML probably works for basic spreadsheets too. Graphics support for GIF, JPEG, and PNG is probably at that level as well, and MP3 for music.

As a bonus, most of the native programs for the documents to be preserved have translators to these formats already.

Beyond that I have no idea.

Re:Open, well-used, file formats. (2, Interesting)

John.P.Jones (601028) | more than 7 years ago | (#16921726)

Keeping 'a copy of every program' is tractable, 'and a system to run them on' however is not. Data (programs) can be easily copied to new media and thus live forever (as long as people are around to order new media, install it and copy the data anyway, but that's just a staffing problem). But hardware is not so easily ported, unless you have an open, easy-to-port emulator that will run your programs. Preferably this emulator should require very little, say just a functional C compiler for future hardware. So there you have used a common CS solution: you have REDUCED the problem of saving all your data to the problem of maintaining hardware for which you have a functional C compiler, a much easier task. If you can't find such a machine, your solution would then be to implement a C compiler, again a tractable problem.

I have simplified for the sake of being lazy, but the essence of portable emulators + extensive software and data backup and storage is sound; you don't even have to concern yourself with speed if you are willing to accept that future hardware will be fast enough.

Re:Open, well-used, file formats. (1)

Doctor Memory (6336) | more than 7 years ago | (#16922074)

Keeping 'a copy of every program' is tractable, 'and a system to run them on' however is not.

Depends on how deep your pockets are. There's a warehouse in eastern PA that has a MicroVAX, a couple of VT240s and an extensive collection of TK50s holding scads of MOL files and pre-clinical trials data "just in case". Not sure if they'll revive everything and re-package it now that there are VAX emulators available, but if you've got data worth (potentially) several hundred million dollars, you'll go to extensive lengths to keep it available. I strongly suspect that these guys have not only the MV, but enough schematics to enable them to recreate some form of VAX, even if they have to cobble up something from FPGAs.

Re:Open, well-used, file formats. (1)

nine-times (778537) | more than 7 years ago | (#16922134)

If future systems are so wildly different from those we have today that they can't have a PNG viewer written for them, how easy will it be to write an emulator now that will run on such wildly different systems, yet faithfully emulate our existing environments?

Provide an example source reader (1)

Kadin2048 (468275) | more than 7 years ago | (#16922258)

I think the best solution currently available is to include with each copy of your data (or on each backup volume) some source-code implementation of a document reader or parser, in a commonly understood and well-documented language, probably ANSI C (although Ada has all of its documentation in the public domain, so you could include it as well).

This wouldn't help you if you expect people to lose the ability to read the media that you're storing the data and source code on, but that's a much more complicated problem. At that point, you're really talking about stone tablets or metal engravings, rather than backup tapes or CDs.

In terms of practical solutions, ensuring that there are source-available readers, written without external dependencies (besides a compiler), for various document formats, is probably the best way to ensure that they'll be readable. Somewhere else in this thread, someone gives an example: storing a source copy of a GPLed RAW-file processor, on each CD containing RAW images. This seems like a very good idea: assuming that your eventual user can read the media, even if their machine architecture is different and readers don't exist, they have a solvable problem: either find a compiler for their architecture and build the program from the provided source, or use the source code as documentation, to build a compiler in a 'modern' language that can be compiled. The only weakness here is that the language might become a 'lost art,' but that's difficult to avoid. (You could provide documentation on the computer language in a natural/human language, but then you have the same problem of indecipherability of the human language; and ultimately I think a computer language is probably easier to puzzle out than a natural one is.)

Re:Open, well-used, file formats. (0)

Anonymous Coward | more than 7 years ago | (#16921796)

>There are only two ways of doing this: keeping a copy of every program used to create these files (and a system to run them on) or converting them to some open and well-supported format.

This option might become more difficult as more systems (e.g. XP, Vista, Office) require a third party to validate them. What if your system relies on WinXP and Office 2003 to work, but MS by then no longer exists, or no longer supports and validates your copy?

Re:Open, well-used, file formats. (1)

nine-times (778537) | more than 7 years ago | (#16922030)

I think that's why he said "open and well-supported format". The "open" part might preclude the use of many Microsoft formats.

Re:Open, well-used, file formats. (1)

nine-times (778537) | more than 7 years ago | (#16922006)

I think your general sentiment is worthwhile, but HTML for word processing documents, JPEGs for pictures, MP3s for audio? Geez, let's at least be thinking more along the lines of ODF/PDF, PNG24, and FLAC.

However, that doesn't really address the question of medium. It'd be nice to have some sort of nearly indestructible medium to store all this.

Re:Open, well-used, file formats. (1)

tomjen (839882) | more than 7 years ago | (#16922088)

Hmm - the problem with ODF/PDF is that it cannot be changed by hand; LaTeX source code, however, can.

As for music, I agree: either FLAC or WAV, depending on what you want.

Media? Kodak used to make some gold CDs that they claimed lasted 12 times as long as the average CD. Other than that, you should look at something like a good old-fashioned stone. They last a real long time.

The problem is semi-solved already (1)

Oddster (628633) | more than 7 years ago | (#16921620)

There are several companies out there which specialize in Document Imaging Software [google.com], specifically for searchable archive purposes. The primary problem is simply the manpower to write the number of conversion filters necessary to import external data formats into the database's internal format; the storage and search/retrieval problems are mostly solved already.

Disclaimer: I used to be an engineering intern at Laserfiche [laserfiche.com]

That's stupid. (1, Insightful)

CDPatten (907182) | more than 7 years ago | (#16921636)

This isn't the '80s, and almost any file being saved in archives is in a format that many programs can open, meaning that the specifications for those formats are known... regardless of whether or not it is legal. Even Word files are viewable by a number of applications, and nobody is archiving historical information with advanced macros, so don't even post with that macro crap.

Also, to assume that future generations won't have the sense or ability to figure out how to open files we write is silly.

Just because "some" businesses (or the military, as the article suggests) find opening archived information ON THE FLY difficult doesn't mean a (more technologically advanced) society wanting to learn its past will have the same limitations. This article is just another example of entry-level "tech writers" and of how low journalistic standards are.

PS
I am not a journalist... so save your grammar and spelling corrections for someone who is.

Popular Mechanics asks... (3, Insightful)

susano_otter (123650) | more than 7 years ago | (#16921640)

From TSA: "Popular Mechanics asks: Will an entire era of human history be lost?"

Obviously not; Popular Mechanics itself has preserved much of the era in traditional hardcopy formats, making it no less lossy than previous printed-word eras.

Of course, understanding the era from such incomplete and unreliable records will be a challenge to archaeologists and historians; again, not much different from previous eras.

In conclusion: doesn't matter, hardly news.

Government Area of Expertise (5, Funny)

ThatsNotFunny (775189) | more than 7 years ago | (#16921644)

When Thibodeau told the head of a government research lab about his mission, the man replied, 'Your problem is so big, it's probably stupid to try and solve it.'"


I'd trust that guy. If there's one thing our government knows, it's stupidity.

HD-DVD - dark data (2, Funny)

openright (968536) | more than 7 years ago | (#16921648)

Interestingly, this Slashdot article is shown to me with an advertisement for HD-DVD, which has a data format "forgotten" by design.

Come now... (1)

Cauchy (61097) | more than 7 years ago | (#16921662)

We are unraveling history using models of mitochondrial DNA genetic drift, using data collected across the planet, and archivists are concerned about future generations not having Office 2003-compatible software? OK, so making it broadly available and searchable to current generations may be a challenge, but they can't seriously be concerned about future researchers not being able to read our data formats. I suppose we should be concerned as to whether the physical media will survive, but I doubt we need to worry about our computer-illiterate progeny being able to figure these things out.

The solution (3, Interesting)

alexwcovington (855979) | more than 7 years ago | (#16921672)

In this era of virtualization, the solution for x86 software is as easy as retaining a copy of the primary partition of a computer originally used to work with the desired files. Searchability could be a problem for proprietary data formats, but the move to open standards in the future will mitigate that.

The real problem is 60 years of archives of antiquated, proprietary, task-specific and mainframe computer data cards and tapes whose original programmers are halfway to cedar boxes; if the government can't get their support in time it may as well call all the early stuff a loss and hand it over to archaeologists.

IT people. (1)

justkarl (775856) | more than 7 years ago | (#16921676)

'Your problem is so big, it's probably stupid to try and solve it.'"

Sounds like general end-user hate crime to me. Hey, I've been guilty many times of shunning a user because I didn't feel like fixing his stupid problem.

Re:IT people. (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#16922092)

Users are stupid. Would RTFM really kill them?

Easy To Do (1)

Conception (212279) | more than 7 years ago | (#16921688)

"how to store digital files so future generations can access them"

Quite simply, you don't store them in one format. Just move everything every 10 years or so. In fact, with Moore's Law and all, you will probably be able to store everything you had before in 1 of whatever is new 10 years later. Hire some part timers to move it or something. It's not a hard problem. It's just an inconvenient one.

It's whether it's WORTH it (4, Insightful)

pclminion (145572) | more than 7 years ago | (#16921704)

It really isn't a question WHETHER we will be able to read old digital data in the future. After all, humans invented these formats, flawed as they may be, and humans can decipher them with enough effort. We can crack cryptography -- a deliberate attempt to make it as difficult as possible to decipher certain information. So it's hard to imagine any data format that could not be deciphered in the future with some honest effort.

Instead it is a question of whether the data is WORTH the effort. From an anthropological standpoint, this is valuable historical data, and its value is not decreased by our inability to interpret it. The benefit of digital data is that it can be copied even if we don't know what it means. It will not erode or decay like other historical artifacts, if we put in the small effort required to preserve it. Assuming humanity doesn't self-destruct, there will be plenty of time in the future for historians to decipher and interpret the data when a need arises for it.

There is software available to do it! (1)

Nautica (681171) | more than 7 years ago | (#16921712)

NuParadigm [nuparadigm.com] is a company that has software that does scanning, indexing and workflow. I used it at WashU and it is terrific. They call the software DataFlow.

Open formats (0)

Anonymous Coward | more than 7 years ago | (#16921752)

I think using open formats is good, as are standardized and well-documented formats.
Plain text files (.txt) are very safe. :)

Luxury problem... (1)

Kjella (173770) | more than 7 years ago | (#16921934)

There's an infinite amount of trivia that could be recorded. We could all go around recording "my life in HDTV" at 900GB/hr uncompressed, but it just wouldn't be meaningful. Sure, a certain sample of "everyday life in $foo" is useful, but on the whole, who cares? And with digital media, this should be simpler than ever, since with proper redundancy you should never experience data loss. Obscure image format? Find a decoder, store it as PNG. Yes, it'll be a lot bigger, but you'll never have to worry about lost data from the original or keeping support for a kazillion old formats. You just have to be slightly critical, and not stretch yourself so thin you could lose something actually important. I refuse to believe that everything we do now is so much more "important" than what people did 50 years ago or 100 years ago. You should be able to do more than ever before, and perfection can never be reached. For example, I could say "show me a part of the $foo forest untouched by human hands". Biologists and whatever would love it. Suddenly you're no longer talking about 900GB*8billion; you've blown right off the scale. What's "important"? News media, encyclopedias, Wikipedia etc. all screen stuff (well, you can put *almost* any trivia on Wikipedia). If a bear shits in the woods, and no one gives a crap, why keep a record of it?

Re:Luxury problem... (1)

HBI (604924) | more than 7 years ago | (#16922040)

I think "The Collected Bowel Movements of HBI" would be a worthwhile addition to any library.

simple (1)

insertwackynamehere (891357) | more than 7 years ago | (#16921966)

Create or choose a lossless, unencrypted format that fits each type of file. Make sure they are always supported with free libraries and utilities. Also, find a type of format that can shrink the size of files (like zip or something).
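
A minimal sketch of the "lossless plus shrinking" idea above, assuming Python's standard zlib module (DEFLATE, the same algorithm used inside zip and gzip). The sample data and the pack/unpack names are made up for illustration; the point is only that the restored bytes are identical to the original.

```python
import hashlib
import zlib


def pack(data: bytes) -> bytes:
    """Compress with DEFLATE at the highest compression level."""
    return zlib.compress(data, 9)


def unpack(blob: bytes) -> bytes:
    """Reverse pack(); raises zlib.error if the blob is damaged."""
    return zlib.decompress(blob)


# Hypothetical sample payload: highly repetitive, so it shrinks a lot.
original = b"family photo metadata, engineering plan notes\n" * 1000
packed = pack(original)
restored = unpack(packed)

# Lossless means the restored bytes hash identically to the original.
same = hashlib.sha256(original).digest() == hashlib.sha256(restored).digest()
print(len(original), len(packed), same)
```

Keeping a hash of the uncompressed data next to the archive gives a later reader a way to check that whatever decoder they dug up really reproduced the original.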

CDs (1)

anshil (302405) | more than 7 years ago | (#16921984)

Can somebody explain to me how much CDs decay? I thought they were pretty much sealed... except for that exotic mushroom that eats the silicon layer... (and even then, although *we* can't read it now with a laser, the information should still be there in the plastic...)

Ummmm... (0)

Anonymous Coward | more than 7 years ago | (#16922110)

Did you happen to eat any of that exotic mushroom? Your thoughts are barely coherent.

Re:CDs (2, Interesting)

lethalwp (583503) | more than 7 years ago | (#16922246)

AFAIK, CDs are the worst media to 'backup' your precious data.

The first burnable CDs you could buy (in the '90s) were of a decent quality; I still have some burned ones around, and they are still readable (older than 10 years).
But some newer ones (cheaper, mass-market 'mode') are of an awful quality: I have plenty that "died" when reading them. It begins with some bad CRCs, and then more & more & more, till nothing valuable can be read off them. This happened in LESS THAN 2 YEARS.

The problem with cds:
  - They hate sunlight
  - they hate being in a too hot, or too cold place
  - they hate being in a place with too much/not enough humidity
  - and the worst: they react with air (oxygen).

They're built with 2mm of plastic; the dye is on top, with some 'protective' layer over it. Some are better than others.

Now with DVDs, they seem to be of a much better quality already. The explanation is simple: the dye isn't on the surface anymore, but between 2 slices of plastic glued together. The reaction with air seems to be insignificant. At the moment, I don't have a single failing DVD-R that I know of.
But some brands are of better quality than others.

And btw:
"Real men don't use backups, they post their stuff on a public ftp server and let the rest of the world make copies." - Linus Torvalds

Slax to the rescue (1)

dotancohen (1015143) | more than 7 years ago | (#16921992)

Just archive a disk of slax with all the 'forgotten file formats'. It's saved my ass more than once.

To Be Honest... (1)

swatward (956094) | more than 7 years ago | (#16922010)

How much of this stuff really has such high priority? I'm pretty sure I won't want people looking back and finding old MySpace blogs and thinking... "Wow, everyone 1000 years ago deserved to die."

The good stuff will get saved, the bad stuff, who cares?

But... (0)

Anonymous Coward | more than 7 years ago | (#16922044)

...XML of course. XML solves all of the world's ills!

Stuff I can't read (2, Interesting)

Animats (122034) | more than 7 years ago | (#16922056)

Media I actually have useful data on:
  • MacOS floppies. (Maybe on an older Mac.)
  • MacOS-only CD-ROMs. (Could be read on a Mac, if I still had one.)
  • 4mm DAT-II tapes from NT systems compressed with HP's hardware compression. (I still have a drive for this.)
  • 1600BPI 9 track open reel magnetic tape, UNIX TAR format. (I managed to get that copied before the last 9 track drives at Stanford died.)
  • 8" floppies for the IBM Series/1 minicomputer controller for the IBM RS-1 industrial robot. (Not really very useful at this point, but it would be nice to look at that work again.)
  • IBM PC/AT 5.25" high-density floppies in compressed Fastback backup format for DOS. (Years of DOS work, now obsolete)
  • 8" floppies for the Marinchip 9900 (A small theorem prover, in Pascal)
  • UNIVAC UNISERVO steel tape, 8 tracks, 200bpi, written on an UNIVAC UNISERVO IIA on a UNIVAC 1107. (A compiler I wrote as an undergraduate, plus some very early 3D graphics software.)

Re:Stuff I can't read (0)

Anonymous Coward | more than 7 years ago | (#16922154)


# MacOS floppies. (Maybe on an older Mac.)
# MacOS-only CD-ROMs. (Could be read on a Mac, if I still had one.)


The former, if 1.44 MB, can be read on a PC using the right software. ARDI Executor (68K Mac emulator for x86) was able to do it at some point.

MacOS CD-ROM could be read under Linux, I believe.

isn't XML supposed to address this problem? (0)

Anonymous Coward | more than 7 years ago | (#16922060)

I thought XML was the interim cure for this problem.

Also, think of the, gee... what do I say, "exabytes" worth of new data that will be created in the next 100 years. Does anyone really think there will be a strong interest in dredging up and analyzing all of today's mostly circularly repetitive drivel? Some of it, perhaps a snapshot from the past, could be preserved, but why everything??

How often do you think about accessing your great, great, great, great grandfather's love letters?

UK/BBC Domesday book (2, Interesting)

bLanark (123342) | more than 7 years ago | (#16922106)

It happened recently. When I was a lad, the BBC and UK schools composed a "domesday book", which was supposed to be a parallel to the original Domesday book [wikipedia.org], which was a bit more than a census from the UK made in 1086. The modern one used the popular home PC, the BBC Micro (made by Acorn). It was made on laserdisc, and distributed around the UK to the schools that had compiled the information.

Well, 15 years on, it was useless. The then-proprietary format was not readable on anything modern, and there was not much of the old hardware around either. You can google for it ("UK domesday bbc data" should do it); the first link I saw was on the Guardian Online [guardian.co.uk].

I've still got stuff on floppies, but no-one builds PCs with them anymore. I've got two old laptops with floppy drives, the other three computers have none. (OK, I also have two corpses with floppy drives, and the controllers on two of the new PCs will accept floppy drives, but, please take my point - they're going out of fashion.)

In 20 years time, there will probably be no CD/DVD drives, we'll all be using a new more portable, more backupable, lighter, faster, probably online-only storage medium. Kids won't recognize laserdisks, floppies, or USB ports. They might not recognise keyboards either - who knows?

Coming Soon... (0)

Anonymous Coward | more than 7 years ago | (#16922124)

XML on paper tape.

Doesn't seem that hard, but scale is massive (1)

Burnin' Bush (871724) | more than 7 years ago | (#16922146)

I have a HD specifically allocated for "stuff I plan on keeping forever". I limit it to one of: pdf, tiff, jpg, gif, html, wav, mp3, and plain txt files. The HD is FAT-32 formatted and reads and writes nicely both from my OS-X Mac and Windows-XP PC. On the mac I have a program (graphicConverter) which will, among other things, do batch converts. In a single command, I can convert *.xxx to *.yyy. For example, convert every single tiff on the hard drive to a pdf.

While it might be many days of crunching, it would seem that should some format be on its way out, or some new format prove itself to be the "way of the future", there will be programs to convert *.one to *.theOther. It might take a lot of CPU time, but that is not a big deal. (For example, I just recently converted 300+ GB of .wav files to circa 30 GB of poor quality MP3s, so that I could take my ENTIRE music library on vacation with me and not lug around the big hard drive, and this took about 4 days of background CPU time!)

Nevertheless, I cannot imagine there will not be a simple capability to convert *.one to *.theOther, on a giant scale if necessary.

This is not like the project I did a couple of years ago, where I converted my reel-to-reel tapes to digital format. That required a massive PHYSICAL effort, mounting reels, monitoring the conversion, etc. Once in digital format, converting to new formats, copying to new kinds of storage mediums, whatever I can imagine in the future, will now be as simple as dragging from one icon to another.

So why are we worried? Is this just FUD?
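
The "convert *.one to *.theOther" batch job described above could be sketched roughly like this. This is not how GraphicConverter works; batch_convert and latin1_to_utf8 are hypothetical names, and the converter shown (re-encoding legacy Latin-1 text as UTF-8) is only a stand-in for whatever real format decoder you would plug in.

```python
from pathlib import Path
from typing import Callable, List


def batch_convert(root: Path, src_ext: str, dst_ext: str,
                  convert: Callable[[bytes], bytes]) -> List[Path]:
    """Convert every *src_ext file under root, writing a *dst_ext sibling.

    Originals are left in place so a bad converter cannot destroy data.
    """
    written = []
    # sorted() materializes the listing before any new files are written.
    for src in sorted(root.rglob(f"*{src_ext}")):
        if not src.is_file():
            continue
        dst = src.with_suffix(dst_ext)
        dst.write_bytes(convert(src.read_bytes()))
        written.append(dst)
    return written


def latin1_to_utf8(raw: bytes) -> bytes:
    """Hypothetical converter: re-encode legacy Latin-1 text as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")
```

Something like batch_convert(Path("keep_forever"), ".lat1", ".txt", latin1_to_utf8) would then walk the whole drive in one go. The hard part is still having a correct converter for each dying format, which is exactly what the article is worried about.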

persistence (1)

dgmrdt (1029898) | more than 7 years ago | (#16922150)

I know this doesn't answer the format question, but the media problem can be solved by having multiple copies "in the ether".

I reproduce all my (and several other people's) data on several different machines in different geographic locations, doing it efficiently with "rsync" (and other free tools).

Hard disks come and go, optical and magnetic media fail with time, but the strategy of multiple copies keeps things safe. When was the last time you had 4 machines fail simultaneously in 4 different parts of the country?
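
A rough sketch of how the multiple-copies strategy can also detect silent corruption, assuming you can read the same file from each location. rsync itself is not involved here; audit_replicas and the site names are made up for illustration. The idea is to hash every copy and flag any that disagree with the majority.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def audit_replicas(replicas: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Return (majority SHA-256 digest, names of replicas that disagree)."""
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in replicas.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no strict majority; cannot tell which copy is good")
    bad = sorted(name for name, d in digests.items() if d != majority)
    return majority, bad


# Hypothetical example: four sites, one of which has a flipped byte.
copies = {
    "site_a": b"wedding photos",
    "site_b": b"wedding photos",
    "site_c": b"wedding phot0s",
    "site_d": b"wedding photos",
}
digest, damaged = audit_replicas(copies)
print(damaged)
```

With four copies, one bad replica is outvoted three to one and can be re-synced from a good site; a two-against-two split raises an error rather than guessing.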

CVS? (0)

Anonymous Coward | more than 7 years ago | (#16922160)

Posting anonymously from Redmond. I think CVS is a good answer. Set up a server, do regular backups, and you're done. Sure, the DB grows as documents change (esp. those not in a text format for diff to work), but all your data is there. You will have to buy a new hard drive every now and then for backups, but your data is safe. If you need security as well, use ssh for the CVS connections, and use some partition encryption program. I wouldn't encrypt the whole partition though, as it might be seen as trying to hide something suspicious from the authorities.

Broadcast it! (1)

calzones (890942) | more than 7 years ago | (#16922172)

Why not just broadcast all data out into space? Maybe we can set up a relayer way far away and bounce it back to earth and back again indefinitely.

Reverse engineering (2, Insightful)

wkitchen (581276) | more than 7 years ago | (#16922228)

Open and widely published formats are good, of course. But if you're looking for a really long term solution (as in multiple millennia), then I think the prime requirement other than physical durability should be easy reverse engineering. This way the data has some hope of recovery even if the knowledge of the format has been lost. This generally means that simpler is better. Things like plain ASCII text. Uncompressed and unencrypted image and/or audio data. Verbose ASCII-based vector graphics. Things like that. Put it all on a durable, low density, and simply formatted medium that will easily give up its secrets to relatively low-tech and completely non-specialized tools like a microscope. It's not the most efficient way to store data, but it's much more likely to be usable by future archaeologists than things like MS-Word files, WMA files, JPGs, MP3s, etc.
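
As one illustration of "simpler is better", here is a sketch that stores a grayscale image in the plain-text variant of PGM (the Netpbm "P2" format): a magic word, the width and height, the maximum value, then every pixel as a decimal number. The function names are made up, and the parser deliberately ignores comment lines, which real PGM files may contain.

```python
from typing import List


def to_plain_pgm(pixels: List[List[int]], maxval: int = 255) -> str:
    """Serialize a grayscale image as plain-text PGM (Netpbm "P2")."""
    height = len(pixels)
    width = len(pixels[0])
    if any(len(row) != width for row in pixels):
        raise ValueError("all rows must have the same width")
    lines = ["P2", f"{width} {height}", str(maxval)]
    lines += [" ".join(str(v) for v in row) for row in pixels]
    return "\n".join(lines) + "\n"


def from_plain_pgm(text: str) -> List[List[int]]:
    """Parse the output of to_plain_pgm (no comment lines handled)."""
    tokens = text.split()
    if tokens[0] != "P2":
        raise ValueError("not a plain PGM file")
    width, height = int(tokens[1]), int(tokens[2])
    values = [int(t) for t in tokens[4:]]  # tokens[3] is maxval
    return [values[r * width:(r + 1) * width] for r in range(height)]


image = [[0, 255], [128, 64]]
text = to_plain_pgm(image)
print(text)
```

Someone who has never heard of PGM could plausibly guess the layout from the file alone, which is the property being argued for. The cost is size: each pixel takes several bytes of text instead of one.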

Obligatory quote ;) (2, Funny)

kosmosik (654958) | more than 7 years ago | (#16922244)

Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it. -- Linus Torvalds

Open Standards (1)

RAMMS+EIN (578166) | more than 7 years ago | (#16922254)

This is one of the reasons open standards are important [sytes.net]. Not that open formats last forever, but at least they are documented, which means there's some hope of deciphering them after the software that does so is no longer maintained. Of course, that doesn't solve the problem of how to make the actual data survive... hard disks and tapes demagnetize, optical disks become translucent or otherwise unreadable, etc.