Avoiding a Digital Dark Age

kdawson posted more than 4 years ago | from the you-must-remember-this dept.

Data Storage 287

al0ha writes to recommend a worthwhile piece up at American Scientist on the problems of archiving and data preservation in an age where all data are stored digitally. "It seems unavoidable that most of the data in our future will be digital, so it behooves us to understand how to manage and preserve digital data so we can avoid what some have called the 'digital dark age.' This is the idea — or fear! — that if we cannot learn to explicitly save our digital data, we will lose that data and, with it, the record that future generations might use to remember and understand us. ... Unlike the many venerable institutions that have for centuries refined their techniques for preserving analog data on clay, stone, ceramic or paper, we have no corresponding reservoir of historical wisdom to teach us how to save our digital data. That does not mean there is nothing to learn from the past, only that we must work a little harder to find it."

287 comments

Won't matter (4, Insightful)

countertrolling (1585477) | more than 4 years ago | (#31252808)

Our landfills will provide all the info they need.

Re:Won't matter (1)

Tubal-Cain (1289912) | more than 4 years ago | (#31252874)

Newspaper is not exactly the most durable parchment we've ever used.

Re:Won't matter (1)

biryokumaru (822262) | more than 4 years ago | (#31253006)

Tubal-Cain (1289912)

You know, I said that to a judge once and he threw out my case and went to go paddle himself...

Re:Won't matter (1)

HBoar (1642149) | more than 4 years ago | (#31253070)

And just as well too. Most journalism these days is best forgotten.

Re:Won't matter (2, Informative)

Smallpond (221300) | more than 4 years ago | (#31253616)

Newspaper buried in a landfill will easily outlast unmaintained digital data. I'll send you some 8" floppies if you don't believe me.

Re:Won't matter (1)

Anonymous Coward | more than 4 years ago | (#31252902)

But what about my digital pictures of my family? It's kinda cool to dust off the old photo albums and check out my family history. Plus I can't imagine how P.O.'d my Mom would be if I lost all the family pics (digitized from slides before the slides fade away to nothing).

Every time I upgrade hw or sw, no matter how careful I am, I seem to lose a file or two. I've taken to planting a couple of cheap NAS boxes at my friends' places, and we all share the cost and benefits of off-site storage.

Re:Won't matter (2, Insightful)

rubycodez (864176) | more than 4 years ago | (#31252924)

right on, and for that matter it is silly to say we don't have paper records any more. We have even more of them than ever before. Receipts, leases, mortgages, contracts, invoices, manifests, packing slips, explanation of benefits (EOB), licenses, warranties, guarantees, manuals....fuck, if anyone thinks the digital age means less paper, just order a single piece of software on Amazon, and by the time you take everything out of the box you'll have generated at least eight items on the list I just mentioned. God damn!

Re:Won't matter (4, Interesting)

Third Position (1725934) | more than 4 years ago | (#31253230)

Our landfills will provide all the info they need.

Well, I'm not entirely sure of that. If you pick up a stone or a paper with characters on it, you at least have an idea what its purpose was. But 5000 years from now, how does someone interpret a shiny little disk? It might be a long, long time before someone is able to discern its purpose, let alone figure out how it's encoded and how to decode it. And that's even before getting a look at the language, and learning how to translate that.

That's one advantage of paper, stone and parchment - they don't assume a technical infrastructure in order to use them.

I have heard that some of the knotted cords (quipu) left by the Incas might actually be a "written" language. But consider that it's taken us over 500 years to suspect these cords are a form of media, let alone learn to read them, and you can imagine what a future civilization might be confronting trying to figure out our digital media.

Re:Won't matter (4, Funny)

JustOK (667959) | more than 4 years ago | (#31253352)

They wouldn't be able to use that stuff because of copyrights and DRM

Re:Won't matter (1)

y4ku (1681156) | more than 4 years ago | (#31253376)

That reminds me just a little bit of Wall-E. Here's to hoping it never comes to that. I've spent plenty of time working in archives for a museum in Chicago, and this is an interesting point that I've never really thought of. Here I am spending hours a day archiving manuscripts and scanning letters and signatures into digital form for the future and for safekeeping, but who will maintain it in the future? Looking into it, this doesn't seem like that big of an issue. Digital media is so easy to transfer and copy at such high speeds... It's almost a wonder how we manage to know so much about those before us, especially given the way they used to transfer information (in candle-lit monasteries on flimsy paper). The way things are now, the future will know way more about us than we will ever know about those before us.

Re:Won't matter (0)

Anonymous Coward | more than 4 years ago | (#31254006)

"(in candle-lit monasteries on flimsy paper)"

If you really think so,

"I've spent plenty of time working in archives for a museum in Chicago"

then you didn't get much out of the time you spent there.

The question remains (2, Interesting)

Anonymous Coward | more than 4 years ago | (#31252842)

How much of it is really worth saving? Except The Goatse image and a good RickRoll video I mean...

Re:The question remains (1)

Tubal-Cain (1289912) | more than 4 years ago | (#31252908)

"Those that fail to learn from history are doomed to repeat it."

The more context you have, the easier it is to learn.

Re:The question remains (1)

giampy (592646) | more than 4 years ago | (#31253110)

actually ... not at all.
After a certain point, too much information is even worse than too little.

Re:The question remains (0)

Anonymous Coward | more than 4 years ago | (#31253356)

We'll soon find out. If it ain't worth paying a tiny yearly fee to preserve a file on an off-site storage, it ain't worth saving.

The only problem is what happens when the off-site company goes bankrupt. New legislation needed?

Re:The question remains (1)

YrWrstNtmr (564987) | more than 4 years ago | (#31253518)

How much of it is really worth saving?

Pictures of my grandmother from the '20s? Priceless. Mostly useless to anyone but our family, but there ya go.

The fight is lost (1)

MeNeXT (200840) | more than 4 years ago | (#31252854)

How many of you have digital files from 15 years ago that you can read today? 20 years? There was no DMCA back then, now just imagine the future....

Re:The fight is lost (1)

Bluesman (104513) | more than 4 years ago | (#31252922)

I do. The cool thing is that they all fit in a tiny portion of my portable hard drive.

It's amazing that the entirety of everything I've produced since high school can fit on a single $100 device, with plenty of room to spare.

Re:The fight is lost (1)

NotQuiteReal (608241) | more than 4 years ago | (#31252968)

I have files from at least as far back as 1982 (Z80 source code). Of course they are not on their original media (which might have been 8" floppies; I really don't recall).

Re:The fight is lost (0)

Anonymous Coward | more than 4 years ago | (#31253194)

I still have my programs from the C64 era and I can still run them, thanks to emulation. Most games from that time still exist only due to crackers who removed the copy protection which would otherwise be a challenge for emulators.

Re:The fight is lost (1)

pookemon (909195) | more than 4 years ago | (#31253204)

I still have all the floppies and HDDs from my Amiga (and the ones I've bothered to look at in the last couple of years do still work). The question is more how many of us have files from 15 or 20 years ago that we want/need to read? I undoubtedly have assembler code on the Amiga, and my C code from my Mac development at uni. I might one day look at it and revisit the "good ol' days" - which is exactly why I still have my Amiga tucked away in a box. But I doubt it.

Of course I am probably in the excessively rare situation where code that I wrote 10+ years ago is still code that I work with. Granted, it was originally written in VB 4/5 and it's now written in .NET 2k8 - and that specific code is rarely modified - but I could probably go back and look at the original code if I wanted (I've certainly got enough copies of it lying around between backups to CD/DVD/Subversion/VSS HDDs etc.).

But for now I will keep my copy of my uni code on its floppy, secured to the filing cabinet with the Neodymium magnet I bought specifically for the purpose. That way I'll know exactly where it is when I find a 15 year old Mac that'll read it...

Re:The fight is lost (1)

peragrin (659227) | more than 4 years ago | (#31253244)

I do. While it's only 12 years, I have tax forms from 1998 in PDF format that open just fine. I e-filed those taxes too.

Of course, I worried about that 8 years ago and switched all my files out of Excel and Word formats to text, RTF, and, at the time, OpenOffice. Now they are in ODF. ODF and PDF are open formats that many programs can read. Emails are stored in mbox. All of it is stored in multiple locations, with encryption (and decryption software) used as needed.

I moved all my data because I was tired of being tied to any one platform. I can access everything I have electronically on all three platforms now.
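mbox is a good example of why that works: it is plain text, with messages simply concatenated and each one starting on a line that begins with `From `. As a minimal sketch (assuming Python 3 and its standard `mailbox` module; the file name, addresses and helper names are made up for illustration), a mailbox written today can be read back with nothing beyond the standard library:

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def write_sample_mbox(path: str) -> None:
    """Append one illustrative message to an mbox file (created if missing)."""
    box = mailbox.mbox(path)
    try:
        msg = EmailMessage()
        msg["From"] = "me@example.org"          # hypothetical addresses
        msg["To"] = "future-me@example.org"
        msg["Subject"] = "1998 tax return"
        msg.set_content("Filed electronically.")
        box.add(msg)
        box.flush()
    finally:
        box.close()


def list_subjects(path: str) -> list:
    """Return the Subject header of every message in the mbox file."""
    box = mailbox.mbox(path)
    try:
        return [message["Subject"] for message in box]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive.mbox")
        write_sample_mbox(path)
        print(list_subjects(path))
        with open(path, "rb") as f:
            # Each message starts with a "From " separator line.
            print(f.readline()[:5])
```

Because the on-disk file is just text, even without Python a future reader could split it on those `From ` lines by hand.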

Re:The fight is lost (1)

iggymanz (596061) | more than 4 years ago | (#31253290)

And what parts of those digital records would be *important* information? c'mon, you are talking about personal crap. Important records (birth certificate, medical records, academic records, insurance, account statements) will be on paper.

Re:The fight is lost (1)

hldn (1085833) | more than 4 years ago | (#31253574)

"personal crap" could be just as interesting and as important to future researchers. think of all the things we could know now if we had more "personal crap" from the peoples of times past.

Re:The fight is lost (1)

Synthaxx (1138473) | more than 4 years ago | (#31253302)

My Atari ST disks are all archived on my raid5 with backups.

And guess which ones made the cut: it's not the ones with codewheels and junky color swatches. It's the ones with the catchy intro tunes and the scroller texts that read "elite" or "automation".
The guys liberating these media are gonna be remembered for a damn long time.
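The RAID 5 mentioned above survives a single failed disk by storing, for each stripe, the XOR of the data blocks as a parity block. The sketch below (Python 3; the function names are hypothetical, and it models one stripe with a fixed parity position, whereas real RAID 5 rotates parity across the disks) shows why any one lost block can be rebuilt from the others:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same length")
    # zip(*blocks) walks the blocks column by column; each column is a tuple of ints.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by their XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover the block at lost_index by XOR-ing every surviving block."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"ATAR", b"I ST", b"DISK"])
    print(rebuild(stripe, 1))  # b'I ST'
```

Parity only covers a dead drive; a deleted or corrupted file is faithfully reflected in the parity too, which is why the separate backups still matter.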

Re:The fight is lost (1)

Kenshin (43036) | more than 4 years ago | (#31253382)

Not much of note. A few old websites I made, a few 3D renders (can't find the models, though.) I didn't produce much worth holding onto back then.

But pretty much all my stuff from 10~12 years back until now stays on my hard disk, and moves from new disk to new disk as I upgrade. All of my music and photos are managed by library apps, and I have automatic backups at least weekly. (Backing up is more convenient now, since I recently moved to a Mac and have Time Machine set to do it when I plug in my external drive.)

Re:The fight is lost (1)

jtownatpunk.net (245670) | more than 4 years ago | (#31253524)

/raises hand.

I still have the "utils" directory from my '286 even though the programs have been obsolete for a 'coon's age. (A 'coon in captivity, that is.) Every company I've worked at has had a "data refresh" plan of some sort where we move old archived data to new media. And only one of those companies ever mined that data for a useful purpose. The rest kept it "just in case".

I have a feeling the problem isn't going to be that we retain very little important information but that, of the vast mountain of crap we retain, a shockingly small percentage will have any real value. 2 tons of chaff for every grain of wheat.

sadly lawyers are working tirelessly (2, Insightful)

Anonymous Coward | more than 4 years ago | (#31252880)

to ensure this never happens. This is the same reason why DVDs and Blu-rays will never work in 100 years' time.

DRM will destroy any record of our current culture, but looking around at the abyss, I really have to say it's for the better.

But I already feel bad for the eventual people that will spend far too much time trying to recover "scary movie part 15" or some other 'gem' from our time. But much like 'abandonware' and other areas of trying to preserve machine code, lawyers will always race in to make sure all copies are lost forever.

Support things like SIMH while you still can!

The Middle Ages didn't have the DMCA (4, Insightful)

MagikSlinger (259969) | more than 4 years ago | (#31252900)

The main way ancient writing reached us is because someone copied it. Lots of copies. Sometimes translated into another language and back, for example, a lot of Greek learning went into Arabic and came back out into Latin or Greek. With all the copy protection and encryption on our media today, can we ever copy the data and be able to decipher it again?

Re:The Middle Ages didn't have the DMCA (1)

a whoabot (706122) | more than 4 years ago | (#31253192)

"a lot of Greek learning went into Arabic and came back out into Latin or Greek."

Can you give me an example of a significant text? I'm pretty sure it is a myth that lots of Greek learning has gone through this process. I see the claim made a lot, but I've never come across a text which has done this (the philosophers, the dramatists, the historians, the lyric poets, all seem to come from the original Greek).

Re:The Middle Ages didn't have the DMCA (0)

Anonymous Coward | more than 4 years ago | (#31253336)

How about the Rosetta stone? That's a pretty significant use of multiple languages to extract meaning from an even older one.

Re:The Middle Ages didn't have the DMCA (0)

Anonymous Coward | more than 4 years ago | (#31253486)

Ah, and therein lies your problem, artificial selection by genre. A lot of the science and mathematics work was lost except in Arabic. Hero of Alexandria, for example, had to be recovered. The monasteries weren't interested in that kind of stuff, you see.

Re:The Middle Ages didn't have the DMCA (1)

lennier (44736) | more than 4 years ago | (#31253488)

I don't know about Greek, but it's a good thing that at least the authentication server for the Epic of Gilgamesh is still online.

Re:The Middle Ages didn't have the DMCA (2, Informative)

reverseengineer (580922) | more than 4 years ago | (#31253640)

Probably the most significant texts to undergo this process were Ptolemy's Almagest and Euclid's Elements; both had been lost to Western Europe, and were thus translated in the Middle Ages to Latin from Arabic by Gerard of Cremona and Adelard of Bath, respectively. I believe in both cases the original Greek texts were eventually recovered by the West and used for later direct translations, but for a while Western Europe knew Hipparkhos/Hipparchus as "Abrachir."

Re:The Middle Ages didn't have the DMCA (3, Informative)

Ltap (1572175) | more than 4 years ago | (#31253654)

The problem is that very few identifiably Greek writings survive. In ancient times, copying was a bit like playing telephone - writing at the time was very politicized, so scribes would often alter works while copying them, mostly to give a local slant or simply to change the names. This makes it frustrating to trace things like legends (see: Noah's Ark/Epic of Gilgamesh and its infinite variations with every other culture that existed nearby). A lot of Greek and Roman writings are now quite simply lost for good, but almost certainly inspired works that aren't lost. For instance, the Odyssey and the Iliad were originally just two parts of the epic story of Troy (out of, AFAIK, four or five parts in total), and the work that we derive most of our knowledge of Rome from, Livy's Ab Urbe Condita, is only partially preserved: it chronicled the history of Rome from its founding down to Livy's own lifetime in 142 books. It was only in the Renaissance that anyone tried to assemble a collection, and we've only been able to come up with 35 of them - if we had the full set, we would know a great deal more about Rome than we do now.

Re:The Middle Ages didn't have the DMCA (1)

MagikSlinger (259969) | more than 4 years ago | (#31253824)

I see other people have responded, but a lot of the mathematical texts came that way. For example, Euclid's Elements was the most famous of these. There were a lot of books on the geometry and mathematical knowledge of that age, as well as most of the ancient astronomy. If I still had my text book from the "History of Mathematics" class I took, I could give you specific names and titles.

Several of Archimedes' surviving texts came via Arabic, and a few survive only in Arabic translation: http://en.wikipedia.org/wiki/Archimedes#Writings [wikipedia.org]

From what I gather, we are lucky to have what we do. A lot more of that preserved Ancient learning was lost when the library of Baghdad [wikipedia.org] was sacked and burned [wikipedia.org] .

Re:The Middle Ages didn't have the DMCA (0)

Anonymous Coward | more than 4 years ago | (#31253462)

With all the copy protection and encryption on our media today, can we ever copy the data and be able to decipher it again?

Luckily, the strength of most DRM is shit. If it's so easy to crack today, imagine what the space-historians of tomorrow will be able to do with their positronic computers and whatnot.

Re:The Middle Ages didn't have the DMCA (0)

Anonymous Coward | more than 4 years ago | (#31253888)

The DMCA does have archival exemptions and it doesn't apply to everything. We don't know what the popular songs were in the 1600s that street musicians played but we still know a remarkable amount and there are family histories and such that fill in a lot of those gaps. Plus the data that is covered by the DMCA is usually owned by an entity that has a vested interest in making it playable to paying customers. It's funny, you can get blu-rays with old movies on them but you might have a really hard time playing super8 films your grandfather made of your mom or dad 40 years ago.

It's startling how quickly things can change in technology; my wife and I have a VHS recorder somewhere in the basement or garage but we've not used it this century... The media for storing data, the formats the data is in, those are some hard problems on the generational time scale. I just did an exercise a couple of months ago, copying some CD-Rs I made in 1993 (from data that was on floppies before that, probably dating back to 1983 or so) onto newer media. For reasons I don't know, I used 6 or 7 different DOS-based archivers to save space, some of which are completely dead and gone (but they compressed really well in 1993...). I scoured the web to find copies, then I had to figure out how to make some DOS software work, then I had to figure out how to actually get the data into my DOS virtual machine (no networking in DOS, and I have long since forgotten how to make a CD-ROM work in DOS...), and one of those programs happened to do enough exotic crap that it just wouldn't run in Win2000... This was only 17 years, and fortunately I'm the one who assembled the data, so I gradually started to remember all the stuff as I recovered it; someone else might not have figured it all out. The CD-Rs worked fine, though.

Even with open-source stuff there is a half-life: I had xiafs-based Linux disks at that time, and that's long been yanked out of the kernel. GIFs might not be readable in 15 years; it's completely possible. You might laugh, but take ZIP: it has to be one of the most ubiquitous formats for data out there, yet the classic format (without the ZIP64 extension) won't work with files over 4GB, and files that big are becoming remarkably common... So xar or tar or something else starts to generally replace ZIP. Fast-forward 10 more years and you could have some challenges compiling unzip and then using it on your files.
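Where the 4GB figure comes from: the classic ZIP headers store sizes and offsets in unsigned 32-bit fields, and PKWARE's APPNOTE reserves the all-ones value 0xFFFFFFFF to mean "look in the ZIP64 extra field". A minimal Python 3 sketch (the helper names are hypothetical) that checks when a member would need ZIP64, and round-trips a small archive with the standard `zipfile` module, which emits ZIP64 records when needed since `allowZip64=True` has been the default since Python 3.4:

```python
import io
import zipfile

# Classic ZIP size fields are unsigned 32-bit; 0xFFFFFFFF is the ZIP64 sentinel,
# so any size that large or larger cannot be stored in the classic headers.
ZIP32_SENTINEL = 0xFFFFFFFF


def needs_zip64(size_bytes: int) -> bool:
    """True if a member of this size cannot be described by classic 32-bit headers."""
    return size_bytes >= ZIP32_SENTINEL


def roundtrip(name: str, payload: bytes) -> bytes:
    """Write one member to an in-memory archive and read it back."""
    buf = io.BytesIO()
    # allowZip64=True is already the default; spelled out for clarity.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        zf.writestr(name, payload)
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return zf.read(name)


if __name__ == "__main__":
    print(needs_zip64(4 * 1024**3))    # 4 GiB   -> True
    print(needs_zip64(100 * 1024**2))  # 100 MiB -> False
    print(roundtrip("notes.txt", b"keep your data living"))
```

Whether an old unzip can read a ZIP64 archive is a separate question; tools that predate the extension may not.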

I think the best thing is to have a policy of going back every 5 years and keeping your data living. The media grows, so storing it isn't hard (that's cheap), but the formats and everything need to be updated along the way.
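One way to make a periodic refresh like that checkable is a fixity manifest: record a SHA-256 digest for every file before migrating, then re-hash the copy and compare. A minimal Python 3 sketch; the function names are hypothetical, and the manifest lines mimic the `<digest>  <path>` layout that GNU `sha256sum -c` can also check:

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so big files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (relative POSIX path) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def manifest_lines(manifest: dict[str, str]) -> list[str]:
    """Render '<digest>  <path>' lines, the layout GNU sha256sum prints and checks."""
    return [f"{digest}  {name}" for name, digest in sorted(manifest.items())]


def compare(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """List files that went missing, changed content, or appeared since `old`."""
    return {
        "missing": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "added": sorted(new.keys() - old.keys()),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "photos").mkdir()
        (root / "photos" / "scan-001.tif").write_bytes(b"\x00fake scan")
        before = build_manifest(root)
        print("\n".join(manifest_lines(before)))
        # ...copy to new media here, then re-hash the copy and compare:
        print(compare(before, build_manifest(root)))  # all three lists empty
```

Keeping the manifest outside the tree it describes avoids it listing itself; a clean refresh should come back with all three lists empty.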

Practical example : Classics emulation (3, Insightful)

DrYak (748999) | more than 4 years ago | (#31253898)

The main way ancient writing reached us is because someone copied it. Lots of copies. {...} With all the copy protection and encryption on our media today, can we ever copy the data and be able to decipher it again?

(And as another example of copies being important for preservation: Fritz Lang's Metropolis [wikipedia.org] recently had another 30 minutes of its missing footage recovered from a copy found in Argentina.)

After a long enough time, virtually any DRM measure ends up being broken. All that matters is time, resources and some clever tricks (to avoid waiting until the heat death of the universe while brute-forcing a 4096-bit key).
So DRM has only 2 direct effects:
- it annoys legitimate users everywhere for no practical reason.
- it forces the basement-dwelling teen with too much free time on their hands to wait until 2 weeks before the official launch date, instead of 3 weeks before, because it took the pirates 1 week to find a way to break the DRM.

This implies 2 results:
- 99.99% of pirate users will never interact with the DRM or be affected by it in any way.
- The important part: DRM-protected data will get copied, eventually and a lot. Lots of copies will exist, and virtually 99.99% of these copies will be "pirated" copies, be they legal backups or unlicensed copies.

So in the end, the DRM-protected data will survive; only not the DRM version itself, but the DRM-free version as found on The Pirate Bay and similar sites. Case in point: classics emulation.
Most of the companies which produced the games we played as children have now gone belly up. Of the few remaining, few have kept the assets of their old productions. Few of those are interested in doing anything with these old assets. The few who do generally produce modern re-imaginings and re-interpretations rather than re-issuing the old.

So in short, if you ever want to pull some of your childhood memories back out of the grave, don't count on the original companies.
Sometimes you can still find working vintage equipment and media, but these will eventually break.
Today, the biggest part of these oldies is available... as images of pirated disks. It's practically certain that if in 2010 you want to play the same game as in 1985, you'll see a cracktro at the beginning.

All your favourite Commodore C64, Amiga, etc. games are currently best sourced from download sites containing warez copies carried over from that era, while the companies went belly up and/or let their assets rot.

So, in 25 years, when most of the current media companies have either disappeared or completely forgotten about today's media, your children's best way to find a copy to revisit fond memories will be a copy that is the digital descendant of what's on The Pirate Bay today.
Yes, **AA, today's EVIL pirate might be tomorrow's heroic archivist.

In 25 years, when the current maker of

Quick... (3, Funny)

eegad (588763) | more than 4 years ago | (#31252910)

Everybody print out all their emails!!!

Re:Quick... (4, Funny)

biryokumaru (822262) | more than 4 years ago | (#31253100)

Oh god, why doesn't Gmail have a print all function!?

Does anyone learn from history anymore? (1)

mrbene (1380531) | more than 4 years ago | (#31252912)

Those that forget history are doomed to repeat it - but these days, it seems that there's more and more effort put into actively avoiding learning from history.

Or maybe I've just hit that age when The Kids ought to get off My Lawn.

Re:Does anyone learn from history anymore? (1)

N!k0N (883435) | more than 4 years ago | (#31252984)

guess I'm there too... except i still live in a flat, and have no lawn for the damn kids to get off of...

Re:Does anyone learn from history anymore? (1)

MrEricSir (398214) | more than 4 years ago | (#31252996)

History is like software, it needs maintainers or it's doomed to disappear in the next version.

Re:Does anyone learn from history anymore? (1)

iggymanz (596061) | more than 4 years ago | (#31253246)

you mean "needs rewriters and revisionists."

"History" is written by the winners to appease their benefactors, as they say.....

Perhaps the way we think.. (2, Interesting)

malkavian (9512) | more than 4 years ago | (#31252966)

About storing data will change. Historically, we've stored on paper, stone, or whatever could be inscribed. 'Backups' of data have been more about attempting to 'inscribe' media with the digital info.
Perhaps we're entering an era where we'll be trying to keep information 'live' perpetually, with the internet the first attempt at having an active library (though there are currently lots of cracks for information to be lost).

Many of the laws that overly stymie information flow (DMCA etc.), I think, are just a knee jerk reaction in the way printing presses were suppressed, and controlled until everyone realised the benefits of having them opened up.

Still, having the long term offline stores is no bad thing..

perfect example: Geocities (4, Insightful)

Eravnrekaree (467752) | more than 4 years ago | (#31252976)

It is indeed a big problem. The problem was illustrated recently when Yahoo suddenly pulled the plug on Geocities, wiping out a vast cultural archive that went back to the early days of the internet; a lot of valuable information was lost as a result. Yahoo's blatant arrogance led me to refuse to ever use any of their products again. Geocities was actually a fairly nice service. People often criticised it because of the ads, but how else do you pay for a free service? The loss of Geocities was a perfect example of the need for a permanent store or online archive of information, personal websites and so on that can be maintained as a cultural legacy and informational resource.

Re:perfect example: Geocities (1)

biryokumaru (822262) | more than 4 years ago | (#31253124)

If only there were such a thing [archive.org] ...

Re:perfect example: Geocities (2, Insightful)

Eravnrekaree (467752) | more than 4 years ago | (#31253278)

I checked archive.org backups of geocities. half of the sites are not backed up correctly. Mine was never backed up, it seems, at all. With most sites 90% of the files are missing. Is archive.org the solution? Apparently not.

Re:perfect example: Geocities (2, Interesting)

Ltap (1572175) | more than 4 years ago | (#31253672)

Archive.org is the solution, and this is just one of those problems where throwing money at it actually works - give them more bandwidth, more contributors, and more disk space, and they could work wonders.

Re:perfect example: Geocities (0)

Anonymous Coward | more than 4 years ago | (#31253554)

And someday they'll disappear too, taking their entire archive with them.

The point isn't that someone is backing up the Web sites, the point is that there aren't a bunch of copies of Web sites spread around everywhere. Sure, the Internet Archive would probably try to transfer their archives to some other organization if they folded, but it's still a single point of failure.

In short, who archives the archivers?

Re:perfect example: Geocities (3, Interesting)

jaavaaguru (261551) | more than 4 years ago | (#31253164)

You mean like archive.org? I actually went there recently to look at old Geocities, and was shocked that they don't have it all backed up there. Archive.org has pretty much everything else I've looked for. Any idea why geocities is not there?

Re:perfect example: Geocities (1)

Eravnrekaree (467752) | more than 4 years ago | (#31253320)

Interesting that you mention that. I have a geocities site, so I knew of the Archive.org effort that was supposed to back up the site. I checked archive.org backups of geocities. Half of the sites are not backed up correctly. Mine was never backed up, it seems, at all. With most sites 90% of the files are missing. Is archive.org the solution? Apparently not.

Re:perfect example: Geocities (2, Insightful)

Ltap (1572175) | more than 4 years ago | (#31253712)

Part of the problem is of manpower - geocities was just so massive, and Yahoo gave them very little time to archive anything properly, so most of it was simply a dash to copy as much as they could before it was deleted. When you look at public domain audio, video, and texts, you'll see that things have been done much better.

Re:perfect example: Geocities (4, Informative)

lennier (44736) | more than 4 years ago | (#31253562)

Perhaps because others were doing it. A number of independent projects tried to back up Geocities, and may have between them recovered most of the data.

* http://geociti.es/ [geociti.es]
* http://reocities.com/ [reocities.com]
* http://www.archiveteam.org/ [archiveteam.org]

Re:perfect example: Geocities (0)

Anonymous Coward | more than 4 years ago | (#31253578)

Most likely because of the bandwidth limits. When I tried wgetting my Geocities page it failed about halfway through because I had gone over my allowed bandwidth for the time period (I had quite a few pictures on there).

ffs.. the "zomg how to preserve" story -again-!? (3, Insightful)

Animaether (411575) | more than 4 years ago | (#31252978)

Seriously, Slashdot... until there's a revolutionary insight into this matter, quit posting these stories ad nauseam.

For further commentary, see previous stories... here's one... it's from September 2009 and -nothing has changed-.

http://ask.slashdot.org/story/09/09/29/1646251/Archiving-Digital-Artwork-For-Museum-Purchase [slashdot.org]

Re:ffs.. the "zomg how to preserve" story -again-! (1)

iggymanz (596061) | more than 4 years ago | (#31253198)

especially since the main insight is that 99% of digital records are useless crap. Just like it won't matter if archaeologists never find 99.9999999% of our cities: when you've seen one Starbucks next to a McDonalds next to a Walmart, you've more than seen them all. The ditto mark will be the most used character recording our drivel... don't even get me started on our mostly devoid of talent "music" and "art" (is a frontal lobotomy prerequisite to being a rap star?)

Uncompressed (2)

Singularity42 (1658297) | more than 4 years ago | (#31252990)

You'll have to go to .wav (not FLAC)--just straight bits. This does away with both copy-protection and compression.
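For what "straight bits" means in practice: a PCM .wav file is a short RIFF header followed by the raw samples, so a future reader needs little more than the header layout to recover the audio (FLAC is lossless too, but it needs a decoder). A minimal sketch using Python's standard `wave` module; the tone, sample rate and helper names are illustrative assumptions:

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # Hz; an arbitrary choice for the example


def sine_samples(freq_hz: float, count: int, amplitude: int = 12000) -> list:
    """16-bit signed samples of a sine tone."""
    return [
        int(amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(count)
    ]


def to_wav_bytes(samples: list) -> bytes:
    """Write mono 16-bit little-endian PCM into an in-memory .wav file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # bytes per sample
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


if __name__ == "__main__":
    samples = sine_samples(440, 80)
    data = to_wav_bytes(samples)
    # Bytes 0-43 are the canonical PCM header; everything after is the raw samples.
    print(data[:4], data[8:12], len(data))  # b'RIFF' b'WAVE' 204
```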

One Site to Archive Them All (3, Funny)

enoz (1181117) | more than 4 years ago | (#31253018)

http://archive.org/ [archive.org]

They've already got a copy of your Geocities sites from the first Digital Dark Age.

Re: One Site to Archive Them All (3, Funny)

indeciso (1350357) | more than 4 years ago | (#31253314)

...One Site to find them, One Site to bring them all and in the Darkness bind them...

To forget is good (3, Insightful)

Anonymous Coward | more than 4 years ago | (#31253026)

IMHO we'll find that our problem is that we drown in a sea of useless information because we can't find the islands of relevance. Trying to archive everything will only lead to failing to archive anything. On the other hand I doubt that we'll lose much important information despite failing at organized preservation attempts, because important information is copied all the time, which is the only way for information to survive quickly changing technologies and file formats anyway.

In a more philosophical light, I think that forgetting is good for us. It frees us from the constraints of our past and makes way for new ideas. Archives are backwards-facing, but we all live in the future, all the time.

Forecast: Cloudy forever (5, Insightful)

presidenteloco (659168) | more than 4 years ago | (#31253030)

I think that many people are failing to appreciate the longevity of information preservation that cloud computing (more specifically, redundant, geographically distributed network storage) can bring.

If we get the protocols right, and insist on open standards for data interchange, we can obtain properties such as:

Data bundles that know how to move themselves to more recently commissioned, and/or more reliable hosts.

Data bundles that know how to check in with copies of themselves, to make sure there are enough of them alive, and that they are adequately geographically distributed, at every given moment. If not, then more baby copies of the same data would be produced and stored elsewhere automatically.

There are other issues to longevity, of course, like maintenance of software that understands different versions of data, etc. Not trivial, but very doable.

How long an individual disk or SSD or stone tablet lasts is COMPLETELY IRRELEVANT to the prospects for information longevity, given the network, and new levels of automated distribution that will take place on it going forward.
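The check-in rule for self-replicating data bundles can be sketched as a small policy function. This is a minimal sketch under stated assumptions: the `Replica` record, its `alive` flag, and the thresholds `MIN_COPIES` and `MIN_REGIONS` are hypothetical names invented for illustration, not part of any existing storage protocol.

```python
from dataclasses import dataclass

# Hypothetical policy thresholds; not taken from any real system.
MIN_COPIES = 3   # minimum number of live replicas
MIN_REGIONS = 2  # minimum number of distinct geographic regions


@dataclass(frozen=True)
class Replica:
    host: str    # hypothetical host identifier
    region: str  # geographic region the host sits in
    alive: bool  # whether the replica answered its most recent check-in


def copies_needed(replicas: list[Replica],
                  min_copies: int = MIN_COPIES,
                  min_regions: int = MIN_REGIONS) -> int:
    """Return how many new copies a bundle should spawn after a check-in round.

    Only replicas that answered the check-in count. Each new copy is assumed
    to go to a region not yet covered, so one copy can fix both a missing
    copy and a missing region; the larger shortfall is therefore the answer.
    """
    live = [r for r in replicas if r.alive]
    regions = {r.region for r in live}
    missing_copies = max(0, min_copies - len(live))
    missing_regions = max(0, min_regions - len(regions))
    return max(missing_copies, missing_regions)


if __name__ == "__main__":
    replicas = [
        Replica("a.example", "eu", True),
        Replica("b.example", "eu", True),
        Replica("c.example", "us", False),  # missed its check-in
    ]
    print(copies_needed(replicas))  # 1: one copy in a new region fixes both shortfalls
```

Placing new copies in uncovered regions is an assumed design choice; a real system would also need to decide which hosts to use and how to agree on who spawns the copy.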

Re:Forecast: Cloudy forever (1)

value_added (719364) | more than 4 years ago | (#31253348)

How long an individual disk or SSD or stone tablet lasts is COMPLETELY IRRELEVANT to the prospects for information longevity, given the network, and new levels of automated distribution that will take place on it going forward.

I don't know that I agree with that.

Compare, for example, letters written during the Civil War [dsu.edu] , with email messages sent and received by those involved in either of the Gulf Wars. Which do you think had, at the time they were written, a better chance of being available to future historians?

I'd suggest we are in danger of losing our history. What's odd is how blithe we are about it.

Re:Forecast: Cloudy forever (1)

Ltap (1572175) | more than 4 years ago | (#31253804)

It depends on how the senders and receivers think about the information. I know people who kept every postcard and letter they'd ever received; I doubt you could say the same about email. People still just don't consider email a serious medium.

Re:Forecast: Cloudy forever (1)

lgw (121541) | more than 4 years ago | (#31253536)

Information "in the cloud" will disappear the moment you stop paying for it. Corporate information in the cloud will come with a destruction date (as do paper corporate records in a storage facility, so no real difference there).

Open software, open standards (1)

gwern (1017754) | more than 4 years ago | (#31253032)

"Movable Type is a dead end. In the long run, the utility of all non-Free software approaches zero. All non-Free software is a dead end."

http://diveintomark.org/archives/2004/05/14/freedom-0 [diveintomark.org]

PLASTIC PUNCH CARDS (1)

Marxist Hacker 42 (638312) | more than 4 years ago | (#31253058)

Far outlasts stone, and if you did it right I'll bet you could get nearly 1Mbit per card without running into the problems of Lace Cards [wikipedia.org]

Re:PLASTIC PUNCH CARDS (2, Insightful)

lgw (121541) | more than 4 years ago | (#31253572)

The army had a program to design a means of storing data in case we really get nuked back into the stone age. They chose punched metal tape. Most plastic doesn't last long when exposed to sunlight or weather, and the downside of a card deck is obvious the first time you drop one down the stairs. It's a clever idea really, since you can read punch tape manually if you have to, and it's far faster than cards.

Great! Now I'll have to buy the White Album again (3, Insightful)

BluBrick (1924) | more than 4 years ago | (#31253082)

We will naturally make multiple copies of everything we consider important, continually transcribing important data onto the latest generation data storage media. (Consider what was the very first publication printed on Gutenberg's big invention.) Unfortunately, that's not necessarily what will be considered important many generations into the future.

I have every confidence that, far into the future, we will have or be able to develop the capability to read any media we preserve today. The problem then becomes how to determine what data we should preserve now rather than how to preserve it. What do we know now that will be important and useful to someone 10^n years from today?

Not so hard (2, Funny)

T Murphy (1054674) | more than 4 years ago | (#31253130)

Just put a massive data server in a spaceship and accelerate it near the speed of light. Data loss would be slowed enough that it would be negligible, and if we have to retrieve anything it should have a fast enough processor to respond to a request in a timely fashion and send off a pre-made copy of the needed data (as it may take too long to copy petabytes at near light speed).

This should work out perfectly- by the time we have the technology to do this, today's worthwhile material should finally be coming out of copyright.

924 Years and nothing has changed (5, Interesting)

rudy_wayne (414635) | more than 4 years ago | (#31253232)

The Domesday Book was commissioned in December 1085 by King William (aka William the Conqueror, who invaded England in 1066). The first draft was completed in August 1086 and contained records for 13,418 settlements in the English counties south of the rivers Ribble and Tees (the border with Scotland at the time). It is a detailed statement of lands held by the king and by his tenants and of the resources that went with those lands. It records which manors rightfully belonged to which estates, thus ending years of confusion resulting from the gradual and sometimes violent dispossession of the Anglo-Saxons by their Norman conquerors.

In 1986, at a cost of £2.5 million, the UK compiled the contents of the Domesday Book into electronic form that was stored on laserdiscs. The information stored on the laserdiscs, which is the equivalent of several sets of encyclopedias, is now unreadable because the equipment needed to read the discs is no longer available. Meanwhile the original book is still readable after more than 900 years.

Re:924 Years and nothing has changed (0)

Anonymous Coward | more than 4 years ago | (#31253364)

Really? The last functioning laserdisc reader I saw was in the late 90s... around the time when DVDs were becoming available. There is really no excuse for not transferring this data to DVD. Are you certain there are no functioning laserdisc readers? I have working floppy drives and other media from the mid 80s (although the plastic on the connectors is pretty fragile now).

Re:924 Years and nothing has changed (2, Informative)

tomhudson (43916) | more than 4 years ago | (#31253444)

We can scan the surface pits of the laserdisc at high enough resolution to decode the bit patterns; we no longer need the original readers.

Re:924 Years and nothing has changed (1)

Lucidus (681639) | more than 4 years ago | (#31253732)

Except that laser discs are analog!

Lots of other things to consider (4, Interesting)

syousef (465911) | more than 4 years ago | (#31253282)

In my own quest to preserve my digital photos, I've created multiple backups on hard disk including a remote backup which gets updated every few months. I use different disks created by different manufacturers and buy new disks every couple of years (but do not throw away old copies).

I've recently come across another aspect that isn't addressed by the article. Data that is in use in an online copy can be modified (including corrupted). There is no point in copying/propagating data if the data you are copying is damaged. Typically this has happened when I've tried DAM software like Lightroom, which will modify the original file despite claiming to be non-destructive. I have no proof that photos were re-encoded or quality was reduced, but I do know original files were altered, and I want an original, unaltered file preserved.

Most people when they backup files do very little verification to ensure the files they are copying today are the same files that were created 5 or 10 years ago. They rely too much on backup software to do this for them, with no attention paid to what's happened to the data between copies. To keep this under control I've started putting checksums on all my photo files, which I check when I create a fresh copy.
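A checksum routine like the one described can be sketched with Python's standard `hashlib` module. This is a minimal sketch, not the commenter's actual tooling: it assumes SHA-256 and a plain dictionary "manifest" mapping relative paths to hex digests, and the function names `sha256_of`, `build_manifest` and `verify_copy` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large RAW/TIFF files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root, keyed by its path relative to root."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose contents no longer match."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

The idea is to build the manifest once, when the photos are first imported, store it alongside the archive, and run `verify_copy` against both the source and the fresh copy before retiring any older copy.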

Of course, where my photos are captured in a proprietary format, I copy them to an open or at least well-documented format (typically JPG, sometimes also TIFF). This is done as soon as I transfer the photos, which are not removed from the camera card until I have 2 additional copies. So I shouldn't have the same issues the author had, assuming JPG can still be read throughout my lifetime.

--
Sammy

Re:Lots of other things to consider (0)

Anonymous Coward | more than 4 years ago | (#31253368)

Tip for maintaining files unaltered: Use disk image files and mount them read-only. Create checksum files for the files in the image, then create a checksum file for the whole image and make copies. If you also store your data on CDs or DVDs, use ISO images instead.
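Writing the checksum file in the plain-text layout used by GNU `sha256sum` (a hex digest, two spaces, then the file name) means it can be verified later without any custom script. This is a minimal sketch under stated assumptions: the helper names are hypothetical, every listed file (including the disk image itself) is assumed to live under the checksum file's directory, and file names are assumed to contain no newlines or backslashes, which `sha256sum` escapes specially.

```python
import hashlib
from pathlib import Path


def _sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in 1 MiB chunks so multi-gigabyte disk images need not fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_sums_file(files: list[Path], sums_path: Path) -> None:
    """Write one "<hex digest>  <name>" line per file, names relative to sums_path's directory."""
    lines = []
    for f in files:
        name = f.relative_to(sums_path.parent).as_posix()
        lines.append(f"{_sha256_file(f)}  {name}\n")
    sums_path.write_text("".join(lines))


def check_sums_file(sums_path: Path) -> list[str]:
    """Re-hash each listed file; return names that are missing or changed."""
    failed = []
    for line in sums_path.read_text().splitlines():
        expected, name = line.split("  ", 1)
        target = sums_path.parent / name
        if not target.is_file() or _sha256_file(target) != expected:
            failed.append(name)
    return failed
```

Because the layout matches GNU coreutils, the same file can also be checked by running `sha256sum -c` on it from that directory.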

Re:Lots of other things to consider (1)

syousef (465911) | more than 4 years ago | (#31253612)

Tip for maintaining files unaltered: Use disk image files and mount them read-only.

Try doing this on Windows XP for a USB hard disk. Official method is to make all USB drives read only. There are clunky kludges out there for getting past this but nothing usable long term.

Also I do make edits of some files, so I need a "working copy", and once I've made these edits I have to keep them. Checksums on the working copy on individual files seem to be the best way to go.

Re:Lots of other things to consider (1)

rubycodez (864176) | more than 4 years ago | (#31253534)

bad choice! jpeg is a lossy format; information is deliberately dropped to make an approximate reproduction!

you're like the guy in the Indiana Jones movie who drinks from a fancy chalice and has the flesh and guts dissolve and burn from his bones: "...he chose....poorly....."

really, if you value your work do a little research; maybe a standard such as "TIFF Revision 6.0 Final" or similar should be used, perhaps with widely known and well-documented lossless compression.

Please don't troll (1)

syousef (465911) | more than 4 years ago | (#31253662)

bad choice! jpeg is a lossy format; information is deliberately dropped to make an approximate reproduction!

Many cameras only capture in a lossy format such as jpg. Even those that have RAW sometimes use lossy RAW. Losses only occur once per save. So to mitigate, you don't modify files repeatedly. If there is a need to do this, go back to the original, save to TIFF and edit from there. So long as you have the original preserved, you can always reapply any edits.

you're like the guy in the Indiana Jones movie who drinks from a fancy chalice and has the flesh and guts dissolve and burn from his bones: "...he chose....poorly....."

I can't tell if you're trolling or just being melodramatic.

really, if you value your work do a little research; maybe a standard such as "TIFF Revision 6.0 Final" or similar should be used, perhaps with widely known and well-documented lossless compression.

TIFFs are a poor choice unless multiple edits are going to be made. They slow down current hardware immensely and aren't widely supported. Try flipping through 12MP TIFFs using Windows Picture and Fax Viewer on XP.

I think you ought to have some idea what you're on about before you criticise so floridly.

Re:Please don't troll (1)

rubycodez (864176) | more than 4 years ago | (#31253810)

was being melodramatic for fun, but I've had photos butchered by moving them from one jpeg program to another.

Yes, Windows XP will choke on your little ~52-megabyte TIFF. Real operating systems won't: with Mac OS X, sufficient RAM and good software, there's no problem at all. Linux does OK too, though the available free software is not as high quality.

Self-correcting problem (3, Insightful)

drDugan (219551) | more than 4 years ago | (#31253288)

We are generating data far, far faster than we can save it. We have been for some time, and while storage trends are catching up, we will always be able to generate more than we store, as a function of how computing and communications work.

So what to save? The Director of the NLM had a unique insight on this exact question: [paraphrasing] "What is used, is saved." Basically, it's the utility of information: information that people find useful and actually use is the best proxy for long-term value. The good thing is that all people are motivated to store and maintain the data they find useful, or that their constituents or customers desire. As long as people keep wanting data, it will be stored and available.

This is a very different situation to real-world archeology. In the digital, connected world we can access data today once it's publicly available, evaluate it and use it if we want. There is no dust that covers old data, it does not get buried...

Silicon Is the New Stone Tablet (1)

rebelscience (1717928) | more than 4 years ago | (#31253304)

Forget CDs, DVDs, magnetic media, etc. All data should be stored in solid state devices. Google knows.

Loss of all technical knowledge (1)

Punctuated_Equilibri (738253) | more than 4 years ago | (#31253312)

The bigger danger is there is some major event like a plague and not enough people are left to maintain a technical society. Who would know how to make a microprocessor, or even refine gasoline? Tan leather? Grow crops? We could be back to the stone age in a single generation.

It's more than just web pages. (1)

OrionSeven (1312747) | more than 4 years ago | (#31253354)

Archive.org isn't enough. The digital problem is more than just web sites. Every state records legal files with various methods, platforms and formats. TIFFs, JPEGs, WAVs, PDFs and more are the real heart of this issue, along with web sites.

The Washington State Digital Archives (http://digitalarchives.wa.gov/) is already taking on this issue and has been for five years. Hopefully more than just a few states will get serious about this issue.

We may have already lost files from 15 years ago, but that doesn't mean we have to lose files from this year.

(Full disclosure, I work there.)

Burn them all (0, Flamebait)

gmuslera (3436) | more than 4 years ago | (#31253366)

At what Fahrenheit temperature do digital records combust? The age won't be so dark, at least while the fire lasts.

A century later, we will still find buried snapshots of Wikipedia on devices like the WikiReader [thewikireader.com]. With paper books, making copies is expensive and takes a lot of space. Digital records, on the other hand, could be copied to a lot of kinds of devices, but what must be preserved is how to decode or interpret them (using open formats could help a bit there).

Cranberry DiamonDisc: 1000 years (1)

Bleek II (878455) | more than 4 years ago | (#31253378)

The Cranberry DiamonDisc is a 1000-year option already on the market. They aren't cheap, but they should come down in price if they're able to get enough customers to bring the supply up.

Um? (1)

drDugan (219551) | more than 4 years ago | (#31253414)

This is what counts for science nowadays?

http://www.americanscientist.org/include/popup_fullImage.aspx?key=vo50G9YwnF6SwlOk2usL5R9EyqRLsNX+YiPzweX/0ZsH0IeSOOXIBip7qwN2/ZRY [americanscientist.org]

Look carefully at the 'digital encoding' of the "simple tone" sine wave. ??? Really? What encoder is that?
cf. http://en.wikipedia.org/wiki/Fourier_Transform [wikipedia.org]

Re:Um? (1)

imsabbel (611519) | more than 4 years ago | (#31253594)

Glasshouse, meet stone.

Or in other words: Are you functionally retarded?

Look up the meaning of "digital". Hint: it has nothing to do with all the stuff happening if you _compress_ digital signals.

Best way to avoid a dark age... (0)

Anonymous Coward | more than 4 years ago | (#31253448)

Best way to avoid a dark age: prevent a fundamentalist cult from exercising its fantasies of domination / cultural suppression.

Includes: Christianity, Islam, Scientific Materialism, among others (these happen to be the most successful to date).

File formats vs physical media (1)

Chuq (8564) | more than 4 years ago | (#31253458)

This topic involves two vastly different things:

- File formats - easy - just make sure everything is stored in an open format, or something so ubiquitous it's as good as an open format (odt, txt, jpg, pdf, csv, ogg, etc.), and it will be readable forever.

- Physical media - this is the risk - most new machines these days can't read 3 1/2" floppies, let alone anything older, but so long as you migrate contents of your old physical media onto new media formats - AND you have multiple copies of important stuff - that shouldn't be a problem.

Terrorism is our savior! (1)

Jah-Wren Ryel (80510) | more than 4 years ago | (#31253532)

With all the pushing by law enforcement for permanent archiving of everybody's web use the problem will solve itself!

Rah! Rah! for terrorists - they hate our freedom but they have saved our culture from fading from history! [livingwithanerd.com]

Ehhh... not so much. (1)

CODiNE (27417) | more than 4 years ago | (#31253624)

Virtual machines really eliminate a lot of those concerns. But what we really have to worry about is silent bit rot. I've found a few old files of mine that are corrupted. Not cool. There's ZFS, and there are Drobos, but I don't really see a good end-user-ready backup system that verifies data integrity.

Old Sierra Games (1)

AnotherAnonymousUser (972204) | more than 4 years ago | (#31253728)

Anyone who remembers some of the old Sierra games knew how they occasionally had glitches or bugs that would cause the game to crash completely. Though Sierra would sometimes drag their asses about support, at one point or another the fans would patch the games themselves and publish the unofficial patch to the web. Quest For Glory IV was a perfect example of this; a fantastic game, but it had a few very memorable glitches that would cause the game to crash. The problem was that these fan patches were most often hosted on Geocities or Angelfire sites, most of which have since vanished from the world, leaving the games unplayable, unable to be completed, and unfixed hosts to whatever problems plagued them at the time of their publishing. Something that would have been a "popular" fix even just a few years ago is now irretrievable; forums point to dead links, users have long-since abandoned the posts, and the files themselves are nowhere to be found because of the death of the hosts. Obviously it's minor in the grand scheme of the history of the nation, but it *is* a good indicator of just how much can be culled within a very short period of time.

Anal Grannies, Weapons of Mass Destruction... (1)

DVD9 (1751726) | more than 4 years ago | (#31253756)

Gonzo, War on Terror, First Person Shooter...

It would indeed be a tragedy for civilization if such data were lost and the mind of the early 21st century American went unrecorded.

VERY interesting topic (1)

adosch (1397357) | more than 4 years ago | (#31253768)

This is something that I've seriously looked into on the personal side of things. I look at all the digital data I've collected (and lost due to drive failure, viruses, corruption, disaster, etc.) over the years and it really makes my head go foggy. I only came to this realization while putting together a wedding anniversary party for my parents in the last few months. My parents brought over bucket loads of photos and keepsakes that I had to rummage through for an overhead slideshow. On top of being (thankfully) highly organized, their personal keepsakes far surpass what I have for my own family. My wife and I looked back, and we literally have a 'digital divide' in the last decade with no tangible photos. Most of our memorable moments were captured with a digital camera, which is great, but we have SO many driblets of photos here and there, on this burnt CD or that external storage device, ALL of which can be lost, broken or misplaced FOREVER much more easily than a big-ass, heavy Rubbermaid tote of pictures like the one my parents have (negatives included, I might add).

So I ask myself, what if my copy of my copy of my copy is corrupt? I'm screwed. What if I have something in an unsupported format that I can't find any support for? I'm screwed. What if I have a photo at 320x240 resolution and I want to make an 8x10 print of it and put it on my wall? I'm screwed. We've successfully stove-piped ourselves into a high rate of non-reproduction of our valued items, along with a staggering failure rate for the media we've chosen for them.

I've come to the conclusion that "tangible" is becoming an obsolete word when it comes to anything I like anymore: music, movies, photography, books, news, conversations, etc. I don't see any way around it.

They will not care (1)

kikito (971480) | more than 4 years ago | (#31253806)

The future generations will just not care about us. Just stop thinking about this already.

Don't worry (1)

GetTragic (21640) | more than 4 years ago | (#31253840)

I've committed the internet to memory

Consumer demand will fix this. (1)

Señor Jojoba (519752) | more than 4 years ago | (#31253844)

I'm not worried. We are pretty soon going to have a bunch of people that are heartbroken about their data from 10 years ago being lost. The travel photos, the e-mailed love letters, the brilliant blog posts. And these people will create demand for longer-term storage and data collection techniques we don't have now. Why should it happen in the near future if it hasn't already? Because we first needed a generation of people that use computers and the internet as the primary way of expressing their life. Nobody was in that boat ten years ago. Now anybody reading this is. So consumer-grade "lifetime" storage options will enjoy a more prominent place on the market. And if you can get some old data to stick around for a half century or more, the value of it bumps up to "time capsule" status. Which means somebody might think to archive your mess of media around the time you die. Maybe some younger cousin of yours will take care of it. Heck, funeral parlors might offer data archival as a service 20 years from now.

Don't trust cloud computing. (2, Insightful)

Lazarian (906722) | more than 4 years ago | (#31253864)

If you want to preserve your data, back up your data yourself, and keep it on its own storage medium. There seems to be a growing impetus where "cloud computing" and "thin clients" are envisioned to replace traditional architectures where data is stored and decoded by the individual who owns/created it. I'd rather store my data myself than ask permission to access it, through the equivalent of a 1980s green-screen dumb terminal, from some corporation whose interests run contrary to mine.