To Purge Or Not To Purge Your Data

CmdrTaco posted more than 6 years ago | from the i-much-prefer-the-binging-part dept.

Security 190

Lucas123 writes "The average company pays from $1 million to $3 million per terabyte of data during legal e-discovery. The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up — so a 5,000-worker company will pay out $1.25 million for five years of storage. So while you need to pay attention to retaining data for business and legal requirements, experts say you also need to be keeping less, according to a story on Computerworld. The problem is, most organizations hang on to more data than they need, for much longer than they should. 'Many people would prefer to throw technology at the problem than address it at a business level by making changes in policies and processes.'"
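For anyone checking the summary's arithmetic, here is a minimal sketch. It assumes the $5 per gigabyte is paid once for each gigabyte generated and that nothing is purged along the way; the variable names are illustrative and not from the article.

```python
# Figures quoted in the summary; names are illustrative only.
employees = 5_000
gb_per_employee_per_year = 10
backup_cost_per_gb = 5  # dollars, assumed to be a one-time cost per GB
years = 5

cost_per_year = employees * gb_per_employee_per_year * backup_cost_per_gb
total_cost = cost_per_year * years

print(f"backup cost per year: ${cost_per_year:,}")
print(f"backup cost over {years} years: ${total_cost:,}")
```

Under those assumptions the product matches the $1.25 million quoted, which works out to $250,000 per year.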

Easier to keep (5, Insightful)

Geoffrey.landis (926948) | more than 6 years ago | (#25054505)

The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping.

Re:Easier to keep (5, Insightful)

Daimanta (1140543) | more than 6 years ago | (#25054573)

True, proper archiving takes huge amounts of time since it adds overhead to your operation.

In an ideal world, everything that you store is automatically labeled and old data will automagically be purged. But storing all kinds of shit is just that much easier. It also doesn't help that data storage is so dirt cheap. 1TB can be bought for around $100 if I am not mistaken. It doesn't pay to kill old useless stuff you have floating on your hard disk.

Re:Easier to keep (4, Insightful)

Sobrique (543255) | more than 6 years ago | (#25054597)

Add to that legal requirements of retention - you'll need to filter your 'customer communications' from your 'shopping lists'. That's what actually makes this a nuisance - the possibility that there will be legal action in 5 years time, that you'll need to fight.

Yes, less data needs to be kept, but first there needs to be a _massive_ re-education of the 'data packrat' culture among users.

Re:Easier to keep (3, Interesting)

BobMcD (601576) | more than 6 years ago | (#25055719)

you'll need to filter your 'customer communications' from your 'shopping lists'

Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.

"They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."

Does anyone know that this is no longer the case?

Re:Easier to keep (5, Interesting)

cmause (903686) | more than 6 years ago | (#25056165)

There used to be a sort of gentlemen's agreement between attorneys to not dig in to electronically stored information (ESI). That was back when everything important ended up on paper anyway, which was discoverable.

As time went on, fewer things ended up on paper, but the rules of discovery didn't evolve. That was the time of backing up a U-Haul full of printed-out copies of every file, e-mail, etc. that a company had. The opposition then had to dig through mounds of trash in the hope of finding that one incriminating document.

Then attorneys got more savvy, and in the so-called Rule 26 (a reference to the Federal Rules of Civil Procedure), the attorneys would agree on the format of ESI to be exchanged. In December 2006, the Federal Rules of Civil Procedure changed to directly address ESI and electronic discovery.

Now, in litigation, parties may still get obnoxious amounts of data, but it's electronic. Once it's processed and converted (usually to TIFFs with extracted text, but sometimes PDF), attorneys can do what amounts to a Google search through the files and find what they want pretty quickly. In fact, paper documents are usually scanned and OCRed so they can be handled and searched in the same manner.

Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.

"They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."

Does anyone know that this is no longer the case?

So no, it's no longer the case. But the first guy who did it must have thought he was pretty funny.

Re:Easier to keep (0)

Anonymous Coward | more than 6 years ago | (#25056163)

1TB can be bought for around $100 if I am not mistaken.

Whoa there! Yeah, you can probably score a 1 TB disk on sale for $100 for your desktop, but not for enterprise-level stuff. Even if I can get a 1 TB disk for a server or storage device, I still need to get another 2 or 3 for RAID and possibly an online spare.

Re:Easier to keep (1)

euri.ca (984408) | more than 6 years ago | (#25056323)

Yeah, I was a little skeptical of the "at $5 a gigabyte" line.

Ignoring any cost savings in the future, if 1T=$100 (which is pretty close in USD) then they are planning on replicating their data 50 times, which is redundantly redundant.

(Not to mention that most of that 10GB figure will be sending the same PowerPoint presentation back and forth 100 times and will compress fantastically. Users aren't actually typing 2 billion words every year in emails.)
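A quick sketch of where the "50 times" comes from, assuming decimal units (1 TB = 1000 GB) and taking the $100 per terabyte at face value; the names are made up for illustration.

```python
# Raw disk price from the comment above vs. the article's backup figure.
disk_price_per_tb = 100.0   # dollars, consumer disk price quoted in the thread
gb_per_tb = 1000            # assumption: decimal terabytes
quoted_cost_per_gb = 5.0    # dollars per GB, from the article summary

raw_cost_per_gb = disk_price_per_tb / gb_per_tb
markup = quoted_cost_per_gb / raw_cost_per_gb

print(f"raw disk cost: ${raw_cost_per_gb:.2f} per GB")
print(f"article figure is about {markup:.0f}x the raw disk cost")
```

The ratio only says the quoted figure is roughly fifty times raw consumer disk; it does not show that the gap is all replication rather than labor, hardware, or off-site storage.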

Re:Easier to keep (4, Insightful)

sunking2 (521698) | more than 6 years ago | (#25054679)

Cheaper to keep. Every hour I waste cleaning house costs more than it does to keep it stored. Storage continues to get cheaper, salaries typically don't. Sure, that $1.25M is a big scary number. But nothing compared to the salaries/benefits at a 5000 person company. Now you can argue the cost of data retrieval goes way up because chances are it'll take a hell of a lot longer to find, but that's a different argument altogether and you can just as easily question what the cost of not being able to recover something that was cleaned by accident is.

Re:Easier to keep (3, Interesting)

COMON$ (806135) | more than 6 years ago | (#25055099)

What I want to know is how these numbers are broken down. $5 per gigabyte to back up? Maybe if you factor in the cost of a robotic library. Considering that tapes currently run about $30 a pop for 800GB and that I am on a 12-month rotation, I still don't come NEAR that price. $1.25 million for a 5000-person company? What kind of company? 10GB average is about 9GB over my average user here. Even when I worked at a larger company, we still weren't even breaching 700MB average INCLUDING e-mail.

Lovely scaremongering, but what did they mean by legal e-discovery? The time it takes to sort through the data or what?

The funny thing is it depends on your MTA (1, Interesting)

brunes69 (86786) | more than 6 years ago | (#25055503)

My 10GB mail box in outlook, when mirrored to my local hard drive in MBOX format, automagically becomes 2 GB - and that's before compression and attachment pruning.

I have no idea what the hell Outlook is doing on the server, if it is just storing things in multiple formats at once or if it is just mis-calculating all the space, but that is one hell of a difference.

Re:Easier to keep (2, Insightful)

TheRaven64 (641858) | more than 6 years ago | (#25055561)

The $5 presumably includes the physical media, the backup operator's time spent configuring the system, the hardware for performing the backup, and the safe, secure, off-site storage costs. 10GB per year is a lot more than I produce - my PhD was only 1.5GB in total, including temporary files (build cruft and so on), with only 210MB needed for the subversion repository (176MB after bzip2) - the bzip2'd repository of my book (including all text and code examples) is only 4.6MB. My mail folder is only 3GB, and that contains over ten years of email messages (and would compress very well).

On the other hand, I don't use Word, which manages to make single-page documents that are more or less plain text take up a few MBs. If you're in a company where everyone sends Word document attachments as emails instead of plain text (I've seen it done[1]) then you could probably generate 10-20MB of data per day from around 5KB of actual content, and backing this up might be cheaper than educating your users. Assuming some other work as well as emails this can easily get to 10GB.

[1] Even worse was my publisher, who sent me a scanned version of a contract as a Word document. A PNG of the same image was around 100KB, while the word document was 5MB and contained nothing other than the image. A lot of people just treat Word documents as a default container format for any content.

Re:Easier to keep (2, Informative)

guruevi (827432) | more than 6 years ago | (#25057131)

1) This is the average. Your company might have 700MB/user, in my organization, it's close to 1TB/user/year that gets added. We're doing medical imaging.

2) It's not just tape libraries. The cost for D2D2T or D2D2D (what we're doing) goes way up compared to a 'simple' backup scheme. Especially if you're like us and require multiple gigabit streams, disk storage can't be just 4 cheap SATA disks in RAID5. We have 2 storage arrays with 14 drives each for general access and another storage array with 10 SATA disks for primary backup, and those things don't come very cheap, especially since you need multiple servers to handle the load.

3) Encryption, tape rotation or multiple locations add to the costs.

4) If you're buying a solution, e.g. from IBM (Tivoli), you need to pay for a consultant and/or another employee to get that stuff running. We're doing what we're doing with open source and it's going well, but if you can't and need to pay for software, it adds up (especially for Windows systems).

Yes--deleting costs money! (4, Insightful)

mkcmkc (197982) | more than 6 years ago | (#25055695)

I did a back-of-the-envelope calculation on just this question in 2004, and estimated that file deletion was not productive unless we could do it at a rate of at least 17MB per minute (of labor). Four years later the threshold is probably at least 45MB per minute.

Generally, this means that if we can blow away whole disks or huge directories of data, it may pay off. Users going through their files one by one is usually an absolute waste.

Re:Easier to keep (0)

Anonymous Coward | more than 6 years ago | (#25057067)

Cheaper to keep. Every hour I waste cleaning house costs more than it does to keep it stored.

The Messiest Home in the Country 2 [brightcove.com]

More videos at Clean House [mystyle.com] .

Re:Easier to keep (2, Insightful)

zappepcs (820751) | more than 6 years ago | (#25054869)

The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping or training staff to organize their data and retain only that which is necessary .

There, fixed that for you. Meta-tags and other efforts might change this in the future, but until there is a generalized understanding of things that should be archived and things that should not, and a better way to store, find, retrieve, and utilize company data, there will be tons of data saved that really should not be. Humans are like that.

Re:Easier to keep (2, Interesting)

Geoffrey.landis (926948) | more than 6 years ago | (#25055959)

The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping or training staff to organize their data and retain only that which is necessary .

There, fixed that for you.

According to the original article, ("The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up ") the cost of backups is fifty dollars a year per employee.

So if an average employee costs the company $100 per hour (including overhead), then if "training staff to organize their data and retain only that which is necessary" takes more than half an hour per year, it's more cost effective to archive the junk than it is to train the employees to sort it.
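The break-even point can be sketched like this, using the article's per-employee figures and the $100 per hour labor cost assumed in this comment; the variable names are illustrative.

```python
# Break-even between paying for backup and paying for sorting time.
gb_per_employee_per_year = 10
backup_cost_per_gb = 5.0          # dollars, from the article summary
loaded_cost_per_hour = 100.0      # dollars, the comment's assumed labor cost

backup_cost_per_employee = gb_per_employee_per_year * backup_cost_per_gb
break_even_hours = backup_cost_per_employee / loaded_cost_per_hour
break_even_minutes = break_even_hours * 60

print(f"backup cost per employee per year: ${backup_cost_per_employee:.0f}")
print(f"break-even sorting/training time: {break_even_minutes:.0f} minutes per year")
```

This compares storage cost against labor only; it ignores retrieval and discovery costs, which other comments argue are the larger item.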

Re:Easier to keep (5, Insightful)

daeg (828071) | more than 6 years ago | (#25055045)

The bigger problem is that you will fight different battles. If you're fighting a sales rep that sold your clients to a competitor, you want as much ammunition as possible. If a client is suing you for incorrect information relayed 8 years ago and you're probably guilty, you want as little information as possible.

Re:Easier to keep (3, Insightful)

vvaduva (859950) | more than 6 years ago | (#25055725)

Well, I did not RTFA in detail but it does not seem to address key regulations like HIPAA and SOX which put hard numbers on data retention. So whether or not it's expensive, you have to do it if you want to be legit. If the issue is discovery, a sound archival system will eliminate expenses related to discovery and would allow one to provide requested information very quickly and efficiently. I say let the legal people fight discovery requests and unless you have something to hide, stick with the requirements for archival and retention. The argument "the less you keep the less they ask for" is simply stupid. In certain SOX-related situations, even the appearance of impropriety will come back to bite you, so I always tell folks to do the right thing by running the business properly, identifying document types correctly and sticking to regulatory requirements as much as possible.

My last job (2, Interesting)

dj245 (732906) | more than 6 years ago | (#25055067)

My last job had some files from the 1890's. The company had moved from New York to New Jersey to Houston in all that time. I can't imagine that material would ever need to be used, or would be called up during a legal investigation. Even if it were, would the authorities penalize a company for files that were that old??? At some point, everything is trashable or museum material.

This company occasionally needed blueprints from the 1930s/1940s (great lakes ships), but none of their ships went back much further than that.

Re:Easier to keep (1)

hesaigo999ca (786966) | more than 6 years ago | (#25055193)

I agree, I used to work for a company who was keeping all their documents for the past 7 years and they had a warehouse full, (importing and exporting) in paper, they wanted to digitize it all.
However they had the usual "archive it" attitude, well they had documents about everything from everyone in double and triple... to say the least, they probably would never have needed to do this had they kept a better handle on organizing what was being kept, even if it was in paper format.

Re:Easier to keep (3, Interesting)

Chrisq (894406) | more than 6 years ago | (#25055511)

We went paperless, and when application forms, etc. arrive they are scanned and stored. Examination of the data showed that very often people would print out all the existing information on a customer and add it to the pile sent for scanning.

Result, look up a customer and you would find some files scanned half a dozen times.

Yes and... (1)

Xest (935314) | more than 6 years ago | (#25055323)

...whilst policies and procedures often solve a lot of things in a cleaner, more common sense manner there are unfortunately far too many people lacking common sense.

Throwing hardware at it guarantees it'll be done, expecting people to follow policies and procedures will likely leave you with a 50% success rate in ensuring the correct data is kept/binned and that's if you're lucky.

The world as a whole would be so much more efficient if we could get people to follow policies and procedures or at least the common sense, good practice ones.

Re:Easier to keep (1)

Lord Ender (156273) | more than 6 years ago | (#25055391)

Exactly. If it takes me two hours per week to sort through every bit of my data and decide what to pitch, that cost has to be compared to the archival cost to decide whether it is a worthwhile endeavor.

Of course, at my office, we just bought a server and a controller with 16 SATA ports, filled the sucker up with off-the-shelf 500GB disks, and built a 7TB RAID6 using Linux software RAID. The whole job only cost about $2k, and we no longer waste any time deciding what to delete and what to keep.
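A rough check of those numbers, assuming all 16 ports hold a 500GB disk and relying on the fact that RAID6 gives up two disks' worth of capacity to parity. The $2k is the comment's "about" figure, so the per-gigabyte result is approximate, and the names are illustrative.

```python
# Usable capacity and rough hardware cost per GB for the array described above.
disks = 16
disk_size_gb = 500
parity_disks = 2            # RAID6 reserves two disks' worth of capacity
total_cost = 2000.0         # dollars, "about $2k" per the comment

usable_gb = (disks - parity_disks) * disk_size_gb
cost_per_usable_gb = total_cost / usable_gb

print(f"usable capacity: {usable_gb / 1000:.0f} TB")
print(f"hardware cost: ${cost_per_usable_gb:.2f} per usable GB")
```

That is hardware only, with no off-site copy, operator time, or power, which is part of why it comes out far below the article's $5 per gigabyte.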

Re:Easier to keep (1)

PietjeJantje (917584) | more than 6 years ago | (#25056553)

I find it surprising that this issue is simplified to cost of storage. As others noted, who cares about the cost/employee for storage. What's much more important is the cost of information retrieval. I'd like to make a comparison with paper storage, because much research has been done there to cut costs. So putting aside physical storage costs, if you store all crap for a while, the storage just becomes a black hole where nothing can be found again. Storing crap is human nature, a "what if I need this document in two years?" even if the chance is very small. However, it turned out it's -much- more cost efficient to indiscriminately nuke stuff that wasn't labeled vital after not being referenced for three months (your mileage may vary). In those few cases where old documents were needed, the combined cost to reproduce them is much lower. In the meantime, you have a relatively clean and light information store where you can easily find things that are relevant to what you're doing and have recently been doing.

hmm (0)

solraith (1203394) | more than 6 years ago | (#25054519)

Seems to me that companies would keep all that data just in case a legal issue came up, in order to have a leg to stand on. Lawsuits are unpredictable that way.

Re:hmm (3, Insightful)

NoisySplatter (847631) | more than 6 years ago | (#25054751)

It's not so much that you want your company to have a leg to stand on, it's that you don't want your legal opposition to get their foot in the door. Innocent until proven guilty, remember?

Re:hmm (1)

Smidge204 (605297) | more than 6 years ago | (#25054829)

"Innocent until proven guilty" only applies in criminal cases. In civil cases - the kind a business is most likely to encounter - the exact opposite is typically true.

=Smidge=

Re:hmm (1)

NoisySplatter (847631) | more than 6 years ago | (#25055205)

It definitely still applies in civil cases unless the plaintiff already has overwhelming evidence to the contrary. However, since we're talking about pretrial discovery here in the case of backups, if the other party can't find the data to support their case they may just drop it.

But then again I'm not a lawyer.

Re:hmm (1)

Smidge207 (1278042) | more than 6 years ago | (#25055577)

Oh my evil doppelganger strikes 'gain. Ummm...stupid much?

=Smidge=

Re:hmm (5, Interesting)

MrMr (219533) | more than 6 years ago | (#25054801)

The top 500 company I worked for did just the opposite: Destroy all data in case a legal issue comes up.
They called it 'desk cleanout day', and unless you were an official dedicated contact on a particular subject you were to wipe all correspondence of more than a year old.
(There were also other grades of information, but erase after a year was the default).

Re:hmm (1)

Kjella (173770) | more than 6 years ago | (#25056051)

The top 500 company I worked for did just the opposite: Destroy all data in case a legal issue comes up. They called it 'desk cleanout day',

Enron?

Re:hmm (0)

Anonymous Coward | more than 6 years ago | (#25057103)

I worked for a year at AT&T's MIS department. The one that handles all the customer T1s to OCx lines for private use (ie, John Smith's Widget company wants a T3 to service their business).

AT&T had a rather strict policy regarding email (what you could and could not do, etc), and they also had the Exchange system automagically delete mail after 60 days.
So every Monday you would come in and all the emails on the threshold would have been moved to a "Pending deletion" folder. If you didn't copy them out to the local PC by the next Monday, POOF.

Re:hmm (1)

FourthLaw (1365279) | more than 6 years ago | (#25054927)

Seems to me that companies would keep all that data just in case a legal issue came up, in order to have a leg to stand on. Lawsuits are unpredictable that way.

Unless they have an electronic data retention policy that states data will be kept for five years. In which case it must be purged.

Problem is, what if some manager sexually harassed an employee six years ago via email. Is it in the company's business interests for that data to be discovered?

On the other hand, if people are exceeding company policy and keeping a personal mail archive on their user volume, and that data can be demonstrated to have a history of over five years (the policy limit), then that company is in violation of their own policy. If that can be demonstrated, then they will be hammered legally for destroying evidence. In other words, they can no longer claim a policy for destroying records in five years.

So it is a two-edged sword. Better have a policy and better be sure that it is being followed.

"Let's keep it all" is not a solution (1)

PotatoSan (1350933) | more than 6 years ago | (#25055227)

Conversely, keeping all of that data also opens you up to legal trouble. Different types of records should be kept for different lengths of time, in accordance with your company's records schedule.

If you have too many records, you may have to turn over information that could be damaging to your case in any litigation against you - information you aren't even required to keep in the first place. Confidential information may be leaked, stolen, or lost, and the probability of that happening only goes up with time. Additionally, if you have a ton of records that you don't need and won't use, your ability to find the information you do need is severely hampered.

While high storage costs may be a factor for disposing of unneeded data, it is not the reason for doing so. You shouldn't be keeping more data just because storage is getting cheaper.

Purging is bad. (1, Funny)

PsyberS (1356021) | more than 6 years ago | (#25054525)

Data bulimia is a serious problem. If you know someone affected, make sure to get them the help they need asap.

Re:Purging is bad. (1)

hcpxvi (773888) | more than 6 years ago | (#25054675)

A well-known IT professional has been advocating this policy for some years now. http://bofh.ntk.net/Bastard.html [ntk.net]

Huh? (4, Insightful)

qoncept (599709) | more than 6 years ago | (#25054575)

$250k a year for a 5000 employee company? To put it in perspective, if the average employee at this company is making $60k a year, this company will be paying $1.5 billion in salaries over the same 5 years. To be fair, I think the estimated cost from the article is very much underestimated. But while corporate storage costs more than you'd think, and companies are definitely storing a whole bunch of data they don't need, what about the costs of reviewing and purging that data? That is straight up time, whether it's reviewing existing data or spending the time to create guidelines for which data to keep. And time costs money. More than storage.
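To make the comparison concrete, a minimal sketch using the $60k average salary assumed in this comment and the article's storage figures; the names are illustrative.

```python
# Five-year salary bill vs. five-year backup bill for the same company.
employees = 5_000
average_salary = 60_000        # dollars per year, the comment's assumption
years = 5
storage_cost_per_year = 250_000  # dollars, from the article's figures

salary_total = employees * average_salary * years
storage_total = storage_cost_per_year * years
storage_share = storage_total / salary_total

print(f"salaries over {years} years: ${salary_total:,}")
print(f"storage over {years} years:  ${storage_total:,}")
print(f"storage as a share of salaries: {storage_share:.3%}")
```

On these inputs storage comes to well under a tenth of a percent of payroll, which is the point being made about where the real cost of purging sits.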

Re:Huh? (1)

fast turtle (1118037) | more than 6 years ago | (#25055005)

and those costs are even higher when done by a law team during a discovery process. Gets quite expensive when law teams are billing $1k per hour to do discovery.

I myself have found it far cheaper as a small business owner to have a written document retention policy along with a written policy that all business docs have a VCS Date and Number. In fact after I discussed the matter with another local small biz owner, I'm damn glad I've got such a policy in place as they're already going through the distraction and resulting loss of business while attempting to archive to CD/DVD several years worth of email archives (Outlook/Outlook Express/AOL).

As far as having a written policy goes, that's not enough. You also have to follow it, otherwise the court will hang your ass for destruction of records along with contempt, possibly costing you the case. So once the policies are in place follow them.

Re:Huh? (1)

John Hasler (414242) | more than 6 years ago | (#25055989)

> and those costs are even higher when done by a law team during a discovery
> process. Gets quite expensive when law teams are billing $1k per hour to do
> discovery.

This is a very good point. The more data you have and the more poorly organized it is the more it costs you to honor discovery requests whether or not anything relevant is found. Thus there exists incentive to index your archive and minimize its size even if you are confident that it contains nothing that could be used against you.

Re:Huh? (1)

TubeSteak (669689) | more than 6 years ago | (#25055177)

what about the costs of reviewing and purging that data? That is straight up time, whether it's reviewing existing data or spending the time to create guidelines for which data to keep.

Right now, the-way-things-are-done is to save it all and pay for it.
You can train employees to change the-way-things-are-done.

The learning curve is expensive, but the general idea (aspirational, as with anything corporate) is that once everyone figures out the policies, time is used more efficiently and the 'cost' goes down.

And time costs money. More than storage.

Can I see the report that verifies your assertions?
You did have someone study the long term costs and give you hard numbers, didn't you?
A company isn't going to fsck around their multi-million IT budget without someone in-house or consultant studying the matter.

Re:Huh? (1)

Archangel Michael (180766) | more than 6 years ago | (#25057095)

To put this into perspective, we have PRA requests for all sorts of "data" that we are supposed to keep. It has become almost a full time job going through all the crap to find what the PRA requests are asking for.

And we're a SMALL school district.

10GB of data!! (0, Offtopic)

PinkyDead (862370) | more than 6 years ago | (#25054579)

Maybe it'd be cheaper for the companies to buy the employees an annual subscription to Penthouse.

Jeez lads, would you lay off the porn?!

big 'purge' occurring as we fail to address issues (-1, Troll)

Anonymous Coward | more than 6 years ago | (#25054615)

it's all part of the creators' wwwildly popular, newclear powered planet/population rescue initiative/mandate.

greed, fear & ego are unprecedented evile's primary weapons. those, along with deception & coercion, helps most of us remain (unwittingly?) dependent on its' life0cidal hired goons' agenda. most of yOUR dwindling resources are being squandered on the 'wars', & continuation of the billionerrors stock markup FraUD/pyramid schemes. nobody ever mentions the real long term costs of those debacles in both life & any notion of prosperity for us, or our children, not to mention the abuse of the consciences of those of us who still have one. see you on the other side of it. the lights are coming up all over now. conspiracy theorists are being vindicated. some might choose a tin umbrella to go with their hats. the fairytail is winding down now. let your conscience be yOUR guide. you can be more helpful than you might have imagined. there are still some choices. if they do not suit you, consider the likely results of continuing to follow the corepirate nazi hypenosys story LIEn, whereas anything of relevance is replaced almost instantly with pr ?firm? scriptdead mindphuking propaganda or 'celebrity' trivia 'foam'. meanwhile; don't forget to get a little more oxygen on yOUR brain, & look up in the sky from time to time, starting early in the day. there's lots going on up there.

http://news.google.com/?ncl=1216734813&hl=en&topic=n
http://www.nytimes.com/2007/12/31/opinion/31mon1.html?em&ex=1199336400&en=c4b5414371631707&ei=5087%0A
http://www.nytimes.com/2008/05/29/world/29amnesty.html?hp
http://www.cnn.com/2008/US/06/02/nasa.global.warming.ap/index.html
http://www.cnn.com/2008/US/weather/06/05/severe.weather.ap/index.html
http://www.cnn.com/2008/US/weather/06/02/honore.preparedness/index.html
http://www.nytimes.com/2008/06/01/opinion/01dowd.html?em&ex=1212638400&en=744b7cebc86723e5&ei=5087%0A
http://www.cnn.com/2008/POLITICS/06/05/senate.iraq/index.html
http://www.nytimes.com/2008/06/17/washington/17contractor.html?hp
http://www.nytimes.com/2008/07/03/world/middleeast/03kurdistan.html?_r=1&hp&oref=slogin
http://biz.yahoo.com/ap/080708/cheney_climate.html
http://news.yahoo.com/s/politico/20080805/pl_politico/12308;_ylt=A0wNcxTPdJhILAYAVQms0NUE
http://news.yahoo.com/s/nm/20080903/ts_nm/environment_arctic_dc;_ylt=A0wNcwhhcb5It3EBoy2s0NUE

is it time to get real yet? A LOT of energy is being squandered in attempts to keep US in the dark. in the end (give or take a few 1000 years), the creators will prevail (world without end, etc...), as it has always been. the process of gaining yOUR release from the current hostage situation may not be what you might think it is. butt of course, most of US don't know, or care what a precarious/fatal situation we're in. for example; the insidious attempts by the felonious corepirate nazi execrable to block the suns' light, interfering with a requirement (sunlight) for us to stay healthy/alive. it's likely not good for yOUR health/memories 'else they'd be bragging about it? we're intending for the whoreabully deceptive (they'll do ANYTHING for a bit more monIE/power) felons to give up/fail even further, in attempting to control the 'weather', as well as a # of other things/events.

http://www.google.com/search?hl=en&q=weather+manipulation&btnG=Search
http://video.google.com/videosearch?hl=en&q=video+cloud+spraying

'The current rate of extinction is around 10 to 100 times the usual background level, and has been elevated above the background level since the Pleistocene. The current extinction rate is more rapid than in any other extinction event in earth history, and 50% of species could be extinct by the end of this century. While the role of humans is unclear in the longer-term extinction pattern, it is clear that factors such as deforestation, habitat destruction, hunting, the introduction of non-native species, pollution and climate change have reduced biodiversity profoundly.' (wiki)

consult with/trust in yOUR creators. providing more than enough of everything for everyone (without any distracting/spiritdead personal gain motives), whilst badtolling unprecedented evile, using an unlimited supply of newclear power, since/until forever. see you there?

"If my people, which are called by my name, shall humble themselves, and pray, and seek my face, and turn from their wicked ways; then will I hear from heaven, and will forgive their sin, and will heal their land."

More data is good (0)

Anonymous Coward | more than 6 years ago | (#25054631)

I work for a storage company, stop messing with my job security.

10 GB user data? Not likely (5, Insightful)

arth1 (260657) | more than 6 years ago | (#25054659)

10 GB of data per user, sure.
10 GB of user data, no way.
If assuming 300 work days per employee, that would mean that the average employee creates 1.2 kB of data per second.

The only way this could be true is if you count data that isn't user generated, and they count the total data storage for the company and divide it by employees.
If so, users deleting their e-mails won't have much of an effect.
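The 1.2 kB per second figure can be reproduced as follows. The comment only states 300 work days; the 8-hour workday and decimal gigabytes are assumptions added here, and the names are illustrative.

```python
# Average data rate implied by 10 GB per employee per year.
bytes_per_year = 10 * 10**9    # assumption: decimal gigabytes
work_days = 300                # from the comment
hours_per_day = 8              # assumption, not stated in the comment

work_seconds = work_days * hours_per_day * 3600
bytes_per_second = bytes_per_year / work_seconds

print(f"working seconds per year: {work_seconds:,}")
print(f"average rate: {bytes_per_second / 1000:.2f} kB per second")
```

With those assumptions the rate lands just under 1.2 kB per second, sustained for every working second of the year.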

Re:10 GB user data? Not likely (1)

cashman73 (855518) | more than 6 years ago | (#25055079)

Only 10 GB?!?! Pfft! Amateurs,...

I've been in my current position almost a year now, and I've already generated about 1/2 a terabyte of data; and that's only the stuff I've decided is worth keeping (I've probably generated several terabytes in reality),... Of course, I'm probably not your average office worker -- my data is mostly Monte Carlo simulations of proteins, on the order of millions (some in the billions) of steps long. Some of the largest trajectories are 45 GB (yes, that's one file).

Re:10 GB user data? Not likely (0)

Anonymous Coward | more than 6 years ago | (#25055165)

Only 10 GB?!?! Pfft! Amateurs,...

I've been in my current position almost a year now, and I've already generated about 1/2 a terabyte of data; and that's only the stuff I've decided is worth keeping (I've probably generated several terabytes in reality),... Of course, I'm probably not your average office worker -- my data is mostly Monte Carlo simulations of proteins, on the order of millions (some in the billions) of steps long. Some of the largest trajectories are 45 GB (yes, that's one file).

At least that's what you tell your boss so he won't find your porn.

Re:10 GB user data? Not likely (1)

BrokenHalo (565198) | more than 6 years ago | (#25057109)

At least that's what you tell your boss so he won't find your porn.

Bullshit. Proteins are MUCH more interesting than porn. ;-)

Actually, I am only half joking - I waste far too much time here on Slashdot, and from time to time I have to give myself a nudge to get on with my job, only to find that the work is more interesting...

Re:10 GB user data? Not likely (0)

Anonymous Coward | more than 6 years ago | (#25056431)

Big whoop.

Re:10 GB user data? Not likely (1)

euri.ca (984408) | more than 6 years ago | (#25056815)

Nope that sounds pretty typical :)

After all, most coders come in every day and re-copy the source tree, libraries and all, to a new folder, in case they make a mistake and need to go back to a previous version.

No? Really? They told me that this was industry standard practice.

Re:10 GB user data? Not likely (1)

BrokenHalo (565198) | more than 6 years ago | (#25057227)

They told me that this was industry standard practice.

No. The industry standard practice is to store the source code for every program you have ever written on punch-cards in a locked filing cabinet.

Or didn't you know that? ;-D

(Just to spell it out for the irony-impaired: if this slips under the radar of your world view, google "Real Programmers Don't Eat Quiche".)

Re:10 GB user data? Not likely (1)

Profane MuthaFucka (574406) | more than 6 years ago | (#25057047)

That's nothing. I work for a computer consultancy and I have half a terabyte of attachments and meeting invites alone.

Re:10 GB user data? Not likely (1)

confused one (671304) | more than 6 years ago | (#25055159)

You're obviously not writing software, doing CAD work, or any kind of computational modeling. It's easy to have that much data -- my source tree alone is 2GB.

Re:10 GB user data? Not likely (2, Funny)

value_added (719364) | more than 6 years ago | (#25055453)

Assuming 300 work days per employee, that would mean that the average employee creates 1.2 kB of data per second.

Top posting and absence of editing by Microsoft Outlook users engaged in a brief inter-departmental discussion could easily account for that volume.

Is that what you meant by "isn't user generated"?

Re:10 GB user data? Not likely (1)

Lord Ender (156273) | more than 6 years ago | (#25055477)

They count more than just the stuff you typed as "user data." For example, Linux admins download ISOs, lawyers download PDFs, Windows admins download patches, service packs, and malware cleaning tools, and sales people download porn. All this data is used by the users and must be archived.

It's not the storage... it's the apps (4, Insightful)

paulhar (652995) | more than 6 years ago | (#25054703)

Apps aren't really designed with this in mind. They come at the problem from a "document creation" perspective rather than a "document lifecycle" one.

This is generally because data has a variable lifespan. Let's take an email that is part of a project as an example. As the author, I may decide that the email isn't needed after a week, so I set an expiry of 1 week. But you, as the recipient, may take that email and turn it into several tasks, so for you the email is much more important and you want to keep it for much longer.

Users aren't really going to be good at making these decisions unless some application continually bombards them with "go check the status of these 1000 documents you've got".

Re:It's not the storage... it's the apps (3, Informative)

ubercam (1025540) | more than 6 years ago | (#25056231)

Users aren't meant to be making those decisions, the Records Management department should be... that is if you even have one! If you leave everything up to the users, you WILL have a cluster fuck of records.

I work in Records Management at a large company with many different divisions in diverse fields. RM is completely left up to us. We manage well over 10,000 boxes and there's only 3 of us. We alone determine when something is to be destroyed (but require authorization from dept heads to be shredded), how long it's kept, etc.

Disclaimer: We work mainly with paper records, but the exact same principles apply to electronic records.

You need a retention schedule. Look at your national, state/provincial and municipal laws to determine the minimum legally required length of time each TYPE of record is to be kept. Employee time cards are different from pension plans, sales invoices and legal files. It's not *always* 7 years either. Some are less, some are more, some are permanent. Also, you don't have to shred when the law says it's time if there's a valid business reason to keep that set of records. I mean, let's get this straight. You don't HAVE TO shred at all, but you're digging yourself a deep hole if you don't... "You can get in just as much trouble by keeping records too long as you can by destroying them too quickly." - Dr. Mark Langemo
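
A retention schedule like the one described can be sketched as a small lookup table. Everything here is hypothetical: the record types, the periods in `RETENTION_YEARS`, and the `business_hold` flag are placeholders for illustration, not legal guidance.

```python
from datetime import date

# Hypothetical schedule: record type -> minimum years to keep (None = permanent).
# The numbers are placeholders; real values come from the laws that apply to you.
RETENTION_YEARS = {
    "time_card": 3,
    "sales_invoice": 7,
    "legal_file": 10,
    "pension_plan": None,
}


def eligible_for_destruction(record_type, closed_on, today, business_hold=False):
    """True once the minimum period has passed and no business reason holds it."""
    years = RETENTION_YEARS[record_type]
    if years is None or business_hold:
        return False
    try:
        cutoff = closed_on.replace(year=closed_on.year + years)
    except ValueError:
        # closed_on was Feb 29 and the target year is not a leap year
        cutoff = closed_on.replace(year=closed_on.year + years, day=28)
    return today >= cutoff
```

Note that the function only reports eligibility. In the setup described above, the actual shredding still needs sign-off from a department head.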

If this was all left up to individuals, they would just keep everything. I've seen what this is like, and it's pathetic, maddening and counterproductive. Things must be properly named and catalogued down to the file level when put in storage, or you will NEVER find ANYTHING without an exhaustive search EVERY time. It might be alright when it's on your desk or in your local filing area and you know what's where, but when you archive it, you can't assume the guy looking for the file you need knows anything about it. We need explicit details or else we can't help you. At my company we require everyone to fill out a nice sheet detailing the contents of their box, the type of records, dates (most remember dates above all else), sender's name, dept, etc.

We are by no means a perfect operation here, but we're far better than 90% of other companies out there.

There is a series of excellent seminars done by Dr. Mark Langemo (sorry no links) to teach you how to deal with records. Also check out ARMA International [arma.org] if you're looking to get in touch with other Records Managers in your area. They have local chapters all over the place.

To summarize, if your company doesn't have a Records Manager, HIRE ONE NOW and give him/her the resources to get your records under control! Check out ARMA, they have jobs posted on their site. There are also many companies out there that will help you clean up your stuff and get you started on the right track.

Re:It's not the storage... it's the apps (1)

paulhar (652995) | more than 6 years ago | (#25056721)

While I agree that for records / data that are structured it may be possible to implement a better regime, a lot of the data that flows around companies isn't structured and is stored either in email that goes back and forth, or in group shared folders in Word/Excel style documents.

I pity the poor Records Manager who would have to go through everyone's email to subjectively decide if an email can be deleted without the context that surrounds it.

Mod parent way up! (3, Interesting)

khasim (1285) | more than 6 years ago | (#25056463)

Congratulations. You're the first person I've seen who understands that.

Accounting understands the need to close one year and open the next. They have processes for what is carried over and how it is identified.

Yet no other department (or application) understands the need to close old data and archive it.

It depends upon business (2, Informative)

William Robinson (875390) | more than 6 years ago | (#25054769)

For example, financial institutions are required to keep data for longer periods for legal purposes as well as traceability (during investigation of fraud or other kinds of crime). The banks I worked for had a legal requirement to keep data in 2 places at least 15 km apart, with all kinds of protection against fire and intrusion.

A good manufacturing company would keep data for a longer period not only to comply with ISO standards, but also to trace manufacturing defects and to have good evidence of past history for the insurance company in case of theft, fire and other kinds of problems.

We used to keep the daily source code changes of only the previous release, and purge those of the rest of the releases (we kept the final source code and patches of all previous releases, but purged the daily changes).

In a nutshell, it depends upon your type of business.

Re:It depends upon business (3, Insightful)

PainKilleR-CE (597083) | more than 6 years ago | (#25054989)

Additionally, there are many businesses that don't understand their data retention requirements beyond 'we need to keep some data for 10 years', so instead of compartmentalizing their data and saying 'keep this for 10 years, that for 5 years, and purge this every year and that every 3 months', they just keep everything. Further, if they have a data retention requirement of 3 years or 10 years, they might wait longer before purging because it's easier to keep the data than it is to go find and remove the 5 or 12 year old data.

I only recently organized some data being maintained by the company I work for that was basically divided into 'archived' and 'live' data, logs generated by a many-user application. The 'archived' data went back 4 or 5 years with no easy distinction between data that was many years old and data that was generated in the most recent archive. Now at least the data is sorted by date (and being archived by date), so that when someone decides on how long we want to keep it (they can't seem to make up their mind, and while everyone seems to agree that we don't need data from 2005 and earlier, no one's willing to say I can delete it, either), it won't be hard to dump the older data at least on an annual or semi-annual basis.
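
Once the data is sorted by date, dropping whole years becomes a small job. A minimal sketch of that idea follows; the function names and the `keep_years` parameter are hypothetical, and the retention window itself is exactly the decision nobody has made yet.

```python
from collections import defaultdict
from datetime import date


def group_by_year(entries):
    """Group (name, created) pairs into {year: [names]} so whole years can go."""
    buckets = defaultdict(list)
    for name, created in entries:
        buckets[created.year].append(name)
    return dict(buckets)


def years_to_purge(buckets, today, keep_years):
    """Years that fall entirely outside the retention window."""
    oldest_kept = today.year - keep_years
    return sorted(year for year in buckets if year < oldest_kept)
```

With archives from 2004 through 2008 and `keep_years=2` evaluated in 2008, this selects 2004 and 2005, which matches the "2005 and earlier" consensus mentioned above. Nothing is deleted here; the list is what you would hand to whoever signs off.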

Too Much Cost? (1)

Kuriomister (1366535) | more than 6 years ago | (#25054791)

So $50 a year is now too much for a large company to tack onto employee costs? If someone is making $30,000 a year, what's another $50? The problem might be in multi-year retention, in which a 2-year employee will require $100 of storage, and so on, but this does not account for the diminishing price of storage or other associated costs. Maintaining a 10-year archive at that price, assuming that employees were putting out 10 GB of data 10 years ago, would cost $500 per employee; scale that up to a larger company and you can see data storage costs in the millions. This assumes that the data is: 1) stored server-side, and 2) not kept only as a physical backup after, let's say, 3 years. It would be cheaper in the long run to move everything to hard storage after some point and keep it offline, to be used only for lawsuits and other archival needs. A model like this allows for nearly unlimited storage time at minimal cost. If a new storage format comes along, the biggest pain might be updating these records, but as for the storage cost of such an operation, look at the advancements in storage space: 10 years ago, people thought 10 GB was large.
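
The figures in this comment and in the summary follow from the same simple model, sketched below. The `price_decline` parameter is an assumption added to illustrate the point about falling prices; neither the article nor the comment gives a rate.

```python
def archive_cost(employees, years, gb_per_year=10, dollars_per_gb=5.0,
                 price_decline=0.0):
    """Total cost of backing up each year's new data once.

    price_decline is a hypothetical yearly drop in the per-GB price
    (0.0 = flat pricing, 0.5 = price halves every year).
    """
    total = 0.0
    for year in range(years):
        price = dollars_per_gb * (1 - price_decline) ** year
        total += employees * gb_per_year * price
    return total


print(archive_cost(1, 1))      # one employee, one year
print(archive_cost(1, 10))     # one employee, ten-year archive
print(archive_cost(5000, 5))   # the 5,000-worker company from the summary
```

With flat pricing this gives $50, $500 and $1.25 million respectively. Any nonzero `price_decline` pulls the multi-year totals down, which is the gap in the article's arithmetic that the comment points at.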

Yeah this whole thing seems a little fishy... (1)

Smeagel (682550) | more than 6 years ago | (#25055065)

On top of what you said - $5 a gigabyte? What is this 1998? Even if you get WD's highest quality consumer hard drives they're about $1 a gigabyte, plus if you buy them in bulk they're probably considerably cheaper. You can use 2 or 3 of them for data redundancy, and it's still significantly cheaper. I question where they got that number.

Re:Yeah this whole thing seems a little fishy... (1)

jimicus (737525) | more than 6 years ago | (#25055373)

On top of what you said - $5 a gigabyte? What is this 1998? Even if you get WD's highest quality consumer hard drives they're about $1 a gigabyte, plus if you buy them in bulk they're probably considerably cheaper. You can use 2 or 3 of them for data redundancy, and it's still significantly cheaper. I question where they got that number.

As soon as you say that I can be reasonably sure that you've never factored in storage costs for anything fancier than a desktop PC.

SAS disks are typically 3-5 times more expensive per drive. Factor in RAID (level 5 if you want capacity, 10 if you want performance, 6 if you want a compromise of both) and you can potentially double the cost per gigabyte. But you can't get 15,000 RPM SATA disks and you can't bond SATA channels together for performance.

Secondly, seeing as the subject is archiving, they're probably talking about tape rather than hard disks. Tapes have the big advantage that you can handle them a lot more roughly and transport them more easily than disks, and they can be archived for longer because they don't suffer from stiction.

Thirdly, I don't think the cost of media is the biggest factor by a long way. They've probably also factored in cost of a contract with Iron Mountain, cost of robotic tape library, licensing costs for TSM (or similar) and a proportion of the wages involved in paying someone to swap the tapes out and hand them over to Iron Mountain every day.

By your interesting math... (1)

Smeagel (682550) | more than 6 years ago | (#25055545)

It should be $10-20 a gigabyte....

Drive is 3-5x more expensive than $1 a gigabyte...raid level 5 means 2+ drives, we're to $6-10 already, then you say the majority of the cost wouldn't be in the media...

From working at a large university, three Fortune 500 companies, and now the small business I work for, I don't think it's even plausible to suggest that most user data is backed up in an outsourced tape data center. That's an absurd suggestion. The vast majority of data never makes it off either a local hard drive or a temporary, lightly backed up network "drive".

No matter how you skew it, the numbers they came up with in the original post are absurd. Be it either how much the data costs to store, or how much data is being stored - one of those is way out of whack with reality.

Re:Yeah this whole thing seems a little fishy... (1)

RyansPrivates (634385) | more than 6 years ago | (#25055451)

We are not talking about consumer hard drives here, we're talking about enterprise storage solutions. You can't just throw 100 one-terabyte consumer SATA disks in a closet and expect to have a "storage solution".

An enterprise solution comprises the MUCH pricier SAS and FC disks inside of a SAN. Just at this first level, you've already spent more than your $1/GB.

Then, throw in the associated SAS and SATA disks for backup, as well as tape for archiving, and all of the infrastructure to support it and labor required to make it work.

There is nothing fishy about it. Enterprise storage solutions are PRICEY beasts...

Re:Yeah this whole thing seems a little fishy... (1)

Smeagel (682550) | more than 6 years ago | (#25055615)

Yes there is something fishy.

One of us is making a wrong assumption:

1) I was assuming they couldn't have been talking about long term storage, because no way an *average* user produces 10GB a year that needs long term storage.

2) You were assuming that somehow the average user produces 10GB a year that requires long-term storage.

There is no way that the average user generates 10GB of data that makes it into long-term storage in a year.

That's approximately 200MB of data a week. Most corporate users generate a few megs in email and a few megs in spreadsheets in a week.

Re:Yeah this whole thing seems a little fishy... (2, Insightful)

RyansPrivates (634385) | more than 6 years ago | (#25055979)

I definitely see where you're coming from, and you SHOULD be right. However, this goes to the heart of the article: most companies are OVER-retaining their data. Backing up things that shouldn't be backed up, and retaining things beyond legal requirements or indefinitely.

Additionally, even though we may not agree on the figures, we definitely agree that storage costs have exponentially decreased. This has led to the trend to just keep adding storage, as opposed to actually going through what is being stored and for how long.

Like I stated in another post, this problem needs to be attacked from a business policy angle, not merely from a technological capacity (pun fully intended).

Choosing your battles (1)

ka9dgx (72702) | more than 6 years ago | (#25054899)

It just doesn't make sense to expend the limited political capital of the IT department to nag people into cleaning up their folders. If you're in a small company, and can more than double your server storage for $1000, instead of pissing off 25 people, you'll spend the money, and so will the CFO. I should know, we've done it more than once over the past 10 years.

It's far better to spend a few $K than to waste literally weeks of time trying to sort things out, especially when you need sales to be selling and not worried about their computers.

--Mike--

Re:Choosing your battles (1)

cowscows (103644) | more than 6 years ago | (#25055527)

Exactly. I've worked at my current company for about three years. It'd take me a few days at least to go through all the documents that I've created since I've been here. The cost of storing all those documents is significantly less than the billable hours that my company would have to give up for me to spend those days sorting paper. Not to mention the fact that I can't imagine having the luxury of a few days without having to worry about projects/clients/etc. and having the time to focus on sorting through stacks of documents and emails.

Coincidentally, I generally feel the same way about all my "life" paperwork that I get at home (bills and receipts and such). My wife is a bit obsessive about filing things and having it all very organized, while I'm perfectly happy to just throw everything in a box and forget about it. Sure, when I have to find something, I'll spend five minutes digging compared to my wife, who could just walk over to the file cabinet and find it in about 30 seconds. But I so rarely need to actually go retrieve something that a few five-minute searching sessions per year add up to significantly less time than I'd require to consistently file everything.

Email Attachments (4, Insightful)

whisper_jeff (680366) | more than 6 years ago | (#25054903)

I don't know what most major companies' policies are regarding backing up emails (just back up the text or back up emails plus attachments) but, as but one example, I'm sure this would be an easy spot for most companies to dramatically reduce the amount of storage space required. Most business communications I see from corporate personnel have various attachments on every email - things like logos, custom backgrounds, etc. Forget getting rid of all the unnecessary attachments - getting rid of the "look at my pretty email that looks like a page from a spiral-bound notebook with my company logo at the bottom" images, and the hundreds and thousands of duplicates of those images, would reduce storage requirements, bandwidth requirements, and probably make corporate communications look more, you know, professional. So many emails are filled with unnecessary garbage and, if that's being backed up, that garbage can get costly.

Then again, I'm biased - I believe email should just be pure text. Perhaps that's a sign that I'm now old...

Re:Email Attachments (1)

xgr3gx (1068984) | more than 6 years ago | (#25055095)

Hmm, maybe I should stop doing my weekly email of the "Monkey Drinking his own pee" video to 200 people in my department.
I guess that might explain all the SAN storage requests for our email archive servers.

Re:Email Attachments (1)

daniel_newby (1335811) | more than 6 years ago | (#25056617)

"... getting rid of the "look at my pretty email that looks like a page from a spiral-bound notebook with my company logo at the bottom" images, and the hundreds and thousands of duplicates of those images, would reduce storage requirements, bandwidth requirements, ...

So parse the MIME headers, separate the files, and store them in a content-addressable filesystem. A content-addressable filesystem hashes each file, then indexes the file under its hash instead of its name. Duplicates are automatically consolidated into a single file.

Folks who are especially aggressive could even diff each email against all recent emails and extract common fragments. It wouldn't even be especially slow if implemented right.
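
The hashing step described above can be sketched with the standard library alone. This is a toy: `ContentStore` is a hypothetical in-memory stand-in for a real content-addressable filesystem, and treating both `attachment` and `inline` parts as files to store is an assumption about where the logos and backgrounds live.

```python
import hashlib
from email import message_from_bytes


class ContentStore:
    """Toy in-memory content-addressable store: blobs indexed by SHA-256."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # a duplicate takes no extra space
        return key


def archive_attachments(raw_message, store):
    """Store each attached file once; return (filename, hash) references."""
    msg = message_from_bytes(raw_message)
    refs = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no file data themselves
        if part.get_content_disposition() in ("attachment", "inline"):
            payload = part.get_payload(decode=True)  # undo base64 etc.
            refs.append((part.get_filename(), store.put(payload)))
    return refs
```

Archiving a thousand messages that all carry the same logo leaves one blob in `store.blobs` and a thousand small references. The fragment-diffing idea in the last paragraph is not covered by this sketch.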

Re:Email Attachments (1)

euri.ca (984408) | more than 6 years ago | (#25056937)

I've been getting stupid emails like that for years and years... you'd think that Outlook would've dealt with that by storing includes by size & hash.

But no. Actually, I think they still store them inline in the .pst file. (fun Microsoft fact/theory: it's not the devs/PMs' fault, they could fix it easily if enough people weren't buying Outlook because of the way it stores files.)

Poor Decision Making and Follow Through... (1)

RyansPrivates (634385) | more than 6 years ago | (#25054951)

"...most organizations hang on to more data than they need, for much longer than they should."

As an infrastructure consultant, I see this EVERYWHERE. At the average client, I find the same INSTALL MEDIA (O/S ISOs) in three or more locations, all of which are being backed up. WHY??? You already have a TRUE backup, it's called the CD the software came on, or the electronic source you downloaded it from. Just gigs upon gigs of wasted space.

And don't even get me started on email limits and policies. I can't think of a single company that actually enforces the mail limits. And even those that do will "extend" or exempt nearly anyone who requests it.

We in the IT field need to create better policies and actually follow through on them...

Re:Poor Decision Making and Follow Through... (1)

Atrox666 (957601) | more than 6 years ago | (#25055927)

I'm guilty as hell here.
The fact is that I do back up certain installs on the network.
My reasons for doing this are:
1) I work with idiots and they lose stuff all the time. The original disks for half the software in the company are probably under someone's coffee mug somewhere.
2) I work with thieves. People "borrow" my disks all the time. I've fixed this by writing "virus infected", "Damaged", "old version do not use" or "Corrupt" on most of my important stuff.
3) The archiving / version control system we bought is too expensive so we aren't allowed to put most things we need to preserve in it.
4) Our archivist doesn't really do anything or know how and until her ass starts to sag she won't get fired.
5) Using drive shares and active directory for file storage is inefficient and results in a lot of duplication of files due to the lack of maintainable granular security control. Even Sharepoint would be a better idea.
6) If I save them money they never share any of it with me so what the hell do I care if they make any profit? I actually get a big smile on my face every time I see massive waste because these are not good people and bad people deserve to have bad things happen to them. If I were to try and fix it I would get all the blame if it went bad and none of the credit if it went well.

Data Discovery Woes (1, Informative)

Anonymous Coward | more than 6 years ago | (#25054955)

I work for a few lawyers and we just began running into issues with "data discovery". Two recent examples:

1. They are a medium sized law firm and they were involved in a lawsuit with another law firm. The other law firm (much smaller) required a copy of all the data from the firm.

Data from encrypted laptops = 80GB x 6 users
2 hours per laptop to decrypt and image (12 hours)
Data from 4 servers and email = 65GB (2 hours)
That's now about 545GB and 14 billable hours of support.

2. The law firm was involved in a lawsuit where they were doing discovery and had to review evidence.
They were going to get data from 10 laptops (800GB total) that will require backups of the data and archival for X years (so far it is 1 year and indefinite).

Quickly the data discovery is getting expensive - and annoying on a technical level.

I'm 500% better than average! (1)

Simonetta (207550) | more than 6 years ago | (#25055001)

average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up...

I cry nonsense in the statement above.

I put a 25 cent blank DVD into the DVDwriter of my PC. Then I copy the entire contents of my 'C:\backup' folder onto this DVD. I start the program, and go do something else. Total dedicated time: 2 minutes

When the DVD write is done, I write a label code on the DVD (date, employee, backup number) and put the disk back on the stack in the file cabinet. Total dedicated time: 2 minutes

My salary and benefits: @ $18/hr time used on backup: 0.067 hrs My cost per gigabyte of backup: $1

So if I'm an average marginally competent employee, why can I do backup 500% more efficiently than the average?
This statistic must be junk.

Re:I'm 500% better than average! (1)

confused one (671304) | more than 6 years ago | (#25055189)

Large corporations back up servers on tape. Good tapes and tape drives are expensive. Including support, maintenance and replacement costs, $5 per gigabyte probably isn't that bad.

Re:I'm 500% better than average! (2, Insightful)

Chris Mattern (191822) | more than 6 years ago | (#25055371)

Unfortunately, writable DVDs are not an acceptable archive medium, and a stack of disks with written labels is not an indexing solution that will scale beyond one person.

Re:I'm 500% better than average! (1)

Vellmont (569020) | more than 6 years ago | (#25055463)


My salary and benefits: @ $18/hr time used on backup: 0.067 hrs My cost per gigabyte of backup: $1

And you backed it up a total of once. The cost of $5 is likely a yearly cost (as the volume is yearly). Backups are usually done 1/day. Your yearly costs would be in the hundreds of dollars per gigabyte.

Re:I'm 500% better than average! (1)

Geoffrey.landis (926948) | more than 6 years ago | (#25056097)

My salary and benefits: @ $18/hr time used on backup: 0.067 hrs My cost per gigabyte of backup: $1

You haven't counted overhead. First, there is your personal overhead. Do you talk to your co-workers in the hall? Get coffee on company time? Go to the bathroom? Fill out time sheets to account for what you do all day? Read memos telling you that you have to fill out time sheets? Read your e-mail? Post comments to slashdot at 10:47AM on a workday? Only robots are 100% efficient in their use of time.

And then there is company overhead-- your computer, pens, paper, copy machine, office, lighting, secretaries, payroll, all that stuff. Even if you don't notice it, even if you don't use it, it's there.

$18/hour?? You'd be lucky if you cost the company less than $100/hour.

Re:I'm 500% better than average! (0)

Anonymous Coward | more than 6 years ago | (#25055493)

I put a 25 cent blank DVD into the DVDwriter of my PC. Then I copy the entire contents of my 'C:\backup' folder onto this DVD. I start the program, and go do something else. Total dedicated time: 2 minutes

When the DVD write is done, I write a label code on the DVD (date, employee, backup number) and put the disk back on the stack in the file cabinet. Total dedicated time: 2 minutes

My salary and benefits: @ $18/hr time used on backup: 0.067 hrs My cost per gigabyte of backup: $1

So if I'm an average marginally competent employee, why can I do backup 500% more efficiently than the average?
This statistic must be junk.

Because your "backup" will go poof in a number of cases that a real backup or archive will survive easily. Just to name the top three:

  • If the DVD with your backup develops more defects than ECC can cope with, your data is gone.
  • If the building where the file cabinet is located burns down, your data is gone.
  • If you need your data in 10 years' time and discover that DVD drives are as obsolete as 8" floppy disks are nowadays, your data will still be there but you won't be able to read it, which is only slightly better than gone.

If you want to avoid only these three common pitfalls for archiving, you'd have to:

  • Write every backup to at least two DVDs
  • Drive to a different location to store the second one every time you make a backup
  • Check all accumulated backups for media defects on a regular basis, and copy those where the defect rate increases to new media.
  • Copy all accumulated backups to different media in time when the current backup media starts going out of fashion.

which will take considerably more time than four minutes a day...

It was easier with paper... (1)

swm (171547) | more than 6 years ago | (#25055009)

Used to be records were kept on paper,
paper was kept in boxes,
and boxes were dated MM/YY.

I came into the office one fine 1998 January 02,
and the hallway was stacked full of boxes dated 01/94,
02/94, 03/94, etc.

Company policy was discard records after three years,
so all records from 1994 were on their way to the dumpster.

Re:It was easier with paper... (1)

cashman73 (855518) | more than 6 years ago | (#25055169)

Used to be records were kept on paper, paper was kept in boxes, and boxes were dated MM/YY.

So THAT explains why they kept moving Milton's desk (image [dereksemmler.com] )! I guess all those TPS reports take up space!

keep, but not on the high-performance disk arrays (1)

petes_PoV (912422) | more than 6 years ago | (#25055035)

The major cost of purging is the manpower and downtime. Therefore it's easier to keep the stuff, possibly with occasional housekeeping if your schema isn't as scalable as it should be. While the legal and tax requirements (which vary from country to country) have a limited lifetime, there are always possibilities, such as legal defences, where old data may be needed. These uses will not require the performance (and cost) of enterprise class storage: speed, redundancy, administration, warranties. So migrate it to a few 1TB drives in someone's desk. That way if subpoenaed you can plausibly have "lost" it, whereas if it's in your interests, it can miraculously be found.

Re:keep, but not on the high-performance disk arra (1)

NoisySplatter (847631) | more than 6 years ago | (#25056439)

Holy Enron Batman! I hope you aren't suggesting perjury is better than accountability.

Communicate less (2, Interesting)

Yvanhoe (564877) | more than 6 years ago | (#25055083)

In a world where backup takes money, a law that tells companies to "keep every communication backed up" is saying essentially the same thing as "communicate less".

Re:Communicate less (1)

BobMcD (601576) | more than 6 years ago | (#25055813)

Or communicate less in writing - I personally have had this policy for a long time. If I worry that a question, comment, concern, etc might not reflect well on me in the future, I walk into my boss's office and ask out loud (with the door shut.) If I want the communication to be recorded for all eternity I use email...

Re:Communicate less (1)

NoisySplatter (847631) | more than 6 years ago | (#25056353)

Little do you know he has a microphone recording everything that goes on in his office to .wav format. Your little conversations are costing millions to back up.

easy solution (2, Funny)

circletimessquare (444983) | more than 6 years ago | (#25055143)

put everything on one disk drive, unRAIDed. when it fails, problem solved. voila, built in obsolescence

Slaw? (0)

Anonymous Coward | more than 6 years ago | (#25055255)

I can't see how wanting "more slaw" is on topic, or why it would be spelled with two Os. Oooh .. Moore's Law [wikipedia.org] . Then there's the sound-alike Mooers' Law [wikipedia.org] which was summarized as "focus[ing] on [the] idea that people may not want information, as it obliges them to study the information, and come to an understanding about it."

Storage vs Study, Moore vs Mooer - fight!

Future BI. (1)

jellomizer (103300) | more than 6 years ago | (#25055471)

Business Intelligence software just may make use of the data. While a lot of businesses are STUPID in their use of BI software, at some point the company either dies or gets a clue and does some BI analysis on its data.
You actually can do some amazing things with BI. Say, for example, you are storing time card data from employees and you want to check the effectiveness of managers. With, say, 20 years of time card data and employee records showing who reported to which manager, you just may find a correlation between different managers and how long people take their breaks or how many sick days they take. Factor out differences in age and experience in the company, then possibly create a correlation with how much value one department makes over the next... And you will have, in nice number form, proof that Manager A sucks while Manager B is effective, even if people may not like Manager B as much, or the people under him like him but his managers don't... (as they may have been found to be bad managers by the same calculations).

Oddly enough, computers are really good at doing a lot of complex math... Imagine that... So they can handle crunching 20 years of data and finding correlations far better than many people's gut feeling.
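
As a rough illustration of the time card idea above, here is a minimal Python sketch that groups records by manager and averages break length and sick days. The record layout, the sample rows, and the names `RECORDS` and `summarize_by_manager` are all made up for the example, and plain group averages are a much cruder stand-in for the correlation analysis a real BI tool would run (nothing here factors out age or experience).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical time card rows:
# (manager, employee, average break minutes, sick days per year)
RECORDS = [
    ("A", "e1", 45, 9),
    ("A", "e2", 50, 11),
    ("B", "e3", 30, 4),
    ("B", "e4", 35, 6),
]


def summarize_by_manager(records):
    """Return {manager: (mean break minutes, mean sick days)}."""
    grouped = defaultdict(list)
    for manager, _employee, break_minutes, sick_days in records:
        grouped[manager].append((break_minutes, sick_days))
    return {
        manager: (
            mean([b for b, _ in rows]),  # average break length
            mean([s for _, s in rows]),  # average sick days
        )
        for manager, rows in grouped.items()
    }


SUMMARY = summarize_by_manager(RECORDS)
for manager, (breaks, sick) in sorted(SUMMARY.items()):
    print(f"Manager {manager}: {breaks} min breaks, {sick} sick days")
```

With 20 years of real data the same grouping would run over millions of rows, but the shape of the calculation stays the same.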

litigation hold (2, Informative)

Benjamin_Wright (1168679) | more than 6 years ago | (#25056129)

Any record destruction policy must include a "litigation hold". A litigation hold means that record destruction must stop when litigation is anticipated or pending. But in a complex enterprise, it is tricky to know what litigation the enterprise anticipates. It was the trickiness of litigation hold that led to the demise of Arthur Andersen. The risks associated with litigation hold give enterprises incentive to store lots more records. --Ben http://hack-igations.blogspot.com/2008/07/document-discovery-litigation-hold.html [blogspot.com]

Who decides what to delete? (1)

HockeyPuck (141947) | more than 6 years ago | (#25056273)

Look at how people deal with email. I've got coworkers that have every single email (including mailing lists they've subscribed to) they've ever sent or received since they started (~8 yrs ago). They've probably got 20GB of email on their laptop. Now we only allow 100MB of server-based email storage, so that helps on the server side, but we're still backing up this guy's laptop.

On the datacenter side, we had a database corruption about 10 years ago so we implemented snapshots, and then snapshots of those snapshots... we actually now carry about seven copies of our database. Why still seven? Because nobody wants to have to recommend that we have fewer copies of data in case we have another problem again. The funny part is that nobody in operations was around at the time of this outage.

At least de-duplication technology is being adopted, which gives us an excuse to hoard even more data. However, from a legal standpoint, telling the judge we don't retain data older than X years is easier than recalling 50k tapes you sent offsite 8 years ago.

Bottom Line: It's just easier to store it than to be the one that "Recommended we delete XYZ files, that's why we don't have the data."

What about History? (1)

jumbomojo (1290828) | more than 6 years ago | (#25056631)

Although I agree that the inertia of keeping records trumps the work of evaluating them, the large financial services company I work for is turning with the tide, starting to focus on deletion and destruction, mainly for potential liability reasons. Not just aged documents, but prior versions, drafts, notes, etc. It makes me wonder what the historians of the future will have left for primary sources--besides the final, signed-off Establishment-sanctioned records of events. Are we on the road to compromising their ability to determine and describe What Really Happened, and thus our own ability to understand our past? Could John M. Blair write "The Control of Oil", or Ron Chernow "Titan: the Life of John D. Rockefeller Sr." fifty years hence?

skipped the article, loved the tag (0)

Anonymous Coward | more than 6 years ago | (#25056709)

mmmmmmmmmmmmmmmmmmmmmmmmmm, mooreslaw.

mod 3o3n (-1, Flamebait)

Anonymous Coward | more than 6 years ago | (#25056861)

This is what Retention Policies are for (1)

Phrogman (80473) | more than 6 years ago | (#25056987)

IANAL. This is why most companies spend some money developing a retention policy and planning its implementation. It requires a bit of time from every employee to decide if a piece of information is something that requires short term, long term or permanent storage but if you get people into the habit of sorting things like email into folders that reflect the company retention policies (which need to be pretty clear and well planned both from an IT and a legal perspective) then you can reduce the cruft you retain considerably.

With clear policies on when the various categories of information can be safely and legally deleted you can reduce the storage costs and simplify the e-discovery phase if it comes up.

Likewise you need good planning and employee training on what to do when a Hold is placed. I.e., if your company enters litigation, you will place a hold on data deletion and *NOTHING* gets deleted, so that the courts can't find you guilty of attempting to hide information from them during litigation.
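
To make the retention categories and the Hold concrete, here is a minimal Python sketch of a purge check. The category names, the retention periods, and the names `RETENTION_DAYS` and `purgeable` are assumptions invented for the example; real periods have to come from your legal and IT people, not from a forum post.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days; None means keep permanently.
RETENTION_DAYS = {"short": 90, "long": 7 * 365, "permanent": None}


def purgeable(records, today, litigation_hold=False):
    """Return the ids of records that are past their retention period.

    records: iterable of (record_id, category, created) tuples.
    While a litigation hold is active, nothing is eligible for deletion.
    """
    if litigation_hold:
        return []
    expired = []
    for record_id, category, created in records:
        days = RETENTION_DAYS[category]
        if days is not None and today - created > timedelta(days=days):
            expired.append(record_id)
    return expired


# Made-up sample records for illustration only.
RECORDS = [
    ("memo-1", "short", date(2008, 1, 2)),
    ("contract-7", "long", date(2005, 3, 1)),
    ("charter", "permanent", date(1990, 1, 1)),
]
TODAY = date(2008, 9, 18)

print(purgeable(RECORDS, TODAY))                        # normal operation
print(purgeable(RECORDS, TODAY, litigation_hold=True))  # hold in place
```

The point of checking the hold first is that it overrides every category: even records that expired long ago stay put until the hold is lifted.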

Any company that doesn't come up with a retention policy that takes everything into consideration, doesn't train its employees on those policies and doesn't practice what it has decided will be its policy is in for a world of hurt when suddenly it's in court and has to produce emails from a specific individual or individuals from 3 years ago etc.

If your employees can generate 10GB of data during the course of a year, then they can learn how to apply retention principles to it while they do so. It's just one more aspect of the job.

Now there are various attempts at software to automatically filter and organize your data - email and documents etc - according to key words and phrases, email addresses etc. I believe some of them are pretty well evolved and take a lot of the burden off your employees - and cover you when those employees can't be bothered to do what they should be doing according to the rules, but I have no experience with how well these work.

Here's an article on email retention (from a quick google search, no idea how well it's written)
http://searchstorage.techtarget.com/tip/0,289483,sid5_gci1212767,00.html [techtarget.com]

Throwing Policies at a Technology Problem? (1)

LittleBigScript (618162) | more than 6 years ago | (#25057229)

What about throwing company policies at a technology problem?

Hypothetically (never happens in the real world of course), what if there was a document management server, a samba dropbox, where all documentation for deliverables is kept in portable excel 2003 format? What if content identification is done by creating folders with "project" and "project"_old naming conventions, hyperlinking is done in excel (because html is complicated), and so on ad nauseam for the automated process called "company policy"?
