Bulk Data Storage For The Common Man?

timothy posted about 10 years ago

Data Storage 483

Vigyaan writes "Lately, I have been looking into different bulk data storage options available to a common man. My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month. I would like to have a storage system which is automated, fast, reliable and, most importantly, does not cost the price of an eye. Right now, I have a 4-node Linux cluster with 10 large hard disks (total capacity 1.6 TB); data storage costs roughly $0.60/GB (excluding the cost of PC hardware). But long-term storage is painful: DVDs cost about $0.10-$0.15/GB but take too much human time, and leaving data on hard disks makes me nervous because of possible failures. RAID is a possibility, but it increases the cost significantly. I was wondering if Slashdot readers have any recommendations for a cheap, automated way to store and retrieve data."

Finally a use for my 1GB Gmail invites... (4, Funny)

anakin357 (69114) | about 10 years ago | (#9616501)

I'll send you a couple.

Re:Finally a use for my 1GB Gmail invites... (1)

(643709) | about 10 years ago | (#9616568)

send it to me please :)

Re:Finally a use for my 1GB Gmail invites... (0, Offtopic)

Usquebaugh (230216) | about 10 years ago | (#9616655)

You'll need more than a couple: 1TB = 1,000GB.

Re:Finally a use for my 1GB Gmail invites... (0, Offtopic)

G-funk (22712) | about 10 years ago | (#9616705)

I know somebody who'd appreciate them ;-)

*whistles innocently*

compression (2, Informative)

Suppafly (179830) | about 10 years ago | (#9616575)

First off, if you aren't already compressing that data, start. You may be able to cut the size down dramatically using compression.
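How much compression buys you depends entirely on the data, so it is worth measuring on a sample before committing. A minimal Python sketch using the standard gzip module; the sample data and the 16 MB sample size are illustrative assumptions:

```python
import gzip
import os


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size / gzip-compressed size for a sample of data.

    Well above 1 means compression is worth it; close to 1 (or below, for
    random or already-compressed data) means it is not.
    """
    if not data:
        raise ValueError("need a non-empty sample")
    compressed = gzip.compress(data, compresslevel=level)
    return len(data) / len(compressed)


def sample_file(path: str, max_bytes: int = 16 * 1024 * 1024) -> bytes:
    """Read up to max_bytes from the start of a file as a cheap sample."""
    with open(path, "rb") as f:
        return f.read(max_bytes)


if __name__ == "__main__":
    # Repetitive, text-like records compress very well...
    text_like = b"timestamp,sensor,value\n" + b"1089000000,7,0.125\n" * 100_000
    print(f"text-like: {compression_ratio(text_like):.1f}:1")
    # ...while random bytes (like already-compressed files) do not.
    print(f"random:    {compression_ratio(os.urandom(1_000_000)):.2f}:1")
```

Pointing `sample_file` at a few representative output files gives a realistic estimate before you choose tapes or disks.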

Then back up using tapes, just like every other place that has to do backups. Generally, do full backups once a week and incremental ones nightly, or whatever is necessary based on the data you are working with.
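The weekly-full/nightly-incremental scheme boils down to "copy everything on full days, otherwise only what changed since the last run". A minimal Python sketch of just the selection logic, assuming file modification times are a good enough change signal (real backup tools keep their own records of what was saved); the Sunday default and the function names are illustrative:

```python
import os
from datetime import date
from typing import Optional


def backup_kind(day: date, full_weekday: int = 6) -> str:
    """Full backup once a week (Sunday by default; Monday is 0), incremental otherwise."""
    return "full" if day.weekday() == full_weekday else "incremental"


def files_to_back_up(root: str, last_backup: Optional[float]) -> list[str]:
    """Relative paths under `root` that this run should copy.

    last_backup=None selects everything (a full backup); otherwise only files
    whose modification time is newer than that timestamp are selected.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if last_backup is None or os.path.getmtime(path) > last_backup:
                selected.append(os.path.relpath(path, root))
    return sorted(selected)
```

Each run would record its start time and pass it as `last_backup` on the next run; using the start rather than the end time avoids missing files modified while the backup was in progress.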

spongedrive is best (5, Funny)

cubyrop (647235) | about 10 years ago | (#9616580)

i am responsible for providing storage solutions for a mid-sized content creation company which, through version archiving, accumulates near 1-200 GB per day. they require access to their media backups on a rolling basis, so tapes are not an option.

i have found that a Teutonium cluster of 6.5 TB Spongedrives (either Cray or SecreTech are fine) fits the bill nicely. housed in a 15-unit rack server, the amoeba-shaped drives utilize BioLas technology to store data on 6-dimensional Moebius Cilia for a slick seek time of 0.00 ms.

a cluster costs about $45,000 USD but the price should come down in 2004 Q4 when SecreTech launches their new 40-platter blackholium SCSI's.

Re:spongedrive is best (0)

Anonymous Coward | about 10 years ago | (#9616701)

Any mod who rises to the bait should be shot.

Drawbacks, what are you willing to put up with? (5, Informative)

Anonymous Coward | about 10 years ago | (#9616581)

All forms of media/backups have their own drawbacks... but some aren't as bad as others, and others are often more accessible.

Tape: Tapes break, they wear, they have dropouts, they take a while to back everything up, and you can't always get at individual files if you just want to restore something (methods vary, folks)... but ultimately, it's cheap when you use DAT because it's a common medium. Swap the tapes twice as often (and throw old ones out) if you're paranoid about tape-related failures.

Hard Drive: The most common form of backup I see now, mainly for the 1:1 size factor. Yeah, drives fail, too. Sometimes you get a pretty good warning when this is going to happen, sometimes you don't. (My 13GB Maxtor and 40GB IBM Deathstar drives both went *pfft* on reboot.) Buy enough identical drives at once and you could swap out the logic board if one does fry. Ultimately, RAID or just simple 1:1 mirroring is probably the most efficient and easiest method. Accessing bits and pieces is also easiest under this method. I personally just use an external USB2 case with a 120GB drive in it. Everything I want to back up goes on that drive, and then eventually... DVDRs. I turn off the drive when I don't need it... hopefully prolonging its life for when I need it most.

DVDR: Not anymore. If we'd had these new-fangled DVDR discs (+ or -) back when 2 to 6GB drives were common... sure... But, as with hard drives, recovering selected files is easy under this method too... unless you use a backup program that crunches everything together on the disc in some spanning format. Burn times can be tedious... but that's not bad if you consider the overall amount of data you're putting on each disc. Cheaper than quality brand-name CDRs, too, in terms of price per megabyte/gigabyte. Only an idiot would trust $0.01-per-disc spindles for long-term backups. Even the longevity of DVDR has yet to be seen...

CDR: I'm not going to bother.

Network: Well, it still relies on hard drives and other components... but it's good if you don't want to saddle one room with a ton of boxes. Simply for space and efficiency, though, an external drive is probably better anyway.

Old-fashioned method: Print everything out and keep it in a filing cabinet somewhere. You could always OCR the stuff later. ;-)

LTO Ultrium 2 Tape Drive (4, Informative)

jeffgeno (737363) | about 10 years ago | (#9616587)

The drive will run about $4000, but the tapes are only around $0.20/GB assuming a 1.5:1 compression ratio. Keeping that assumption, each 200 GB (native) tape holds about 300 GB, so 1 TB of data takes three to four tapes per month, and swapping wouldn't be so bad with a single tape drive. An autoloading library would be significantly more expensive, but if you really need automation, that's the way to go.
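The tape count is a ceiling division: at 1.5:1, each 200 GB tape holds about 300 GB, and 1,000 GB / 300 GB is 3.33, so a month's data needs four tapes. A small Python sketch using the post's assumptions (the $60 tape price is simply what $0.20/GB at 300 GB per tape implies, not a quoted price):

```python
import math


def tapes_per_month(data_gb: float, native_gb: float, compression: float) -> int:
    """Whole tapes needed for one month's data at an assumed compression ratio."""
    effective_gb = native_gb * compression
    return math.ceil(data_gb / effective_gb)


def media_cost_per_gb(tape_price: float, native_gb: float, compression: float) -> float:
    """Media cost per (uncompressed) GB actually stored."""
    return tape_price / (native_gb * compression)


# LTO-2: 200 GB native; 1.5:1 is the compression ratio assumed in the post.
print(tapes_per_month(1000, 200, 1.5))  # 1000 / 300 = 3.33, rounded up to 4
```

If the data compresses worse than 1.5:1, the same function shows how quickly the tape count (and the swapping) grows.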

safest way (1, Funny)

nazsco (695026) | about 10 years ago | (#9616588)

cat 1tb.txt > /dev/lp0

The "Common Man" doesn't need bulk data storage... (2, Insightful)

syberanarchy (683968) | about 10 years ago | (#9616589)

The best idea I could give you is to just create a sister system, where you mirror all your data. Not cheap, but cheaper than getting a pro-grade solution.
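The catch with any mirror is knowing that the copy is actually good. A minimal Python sketch, assuming both trees are mounted locally (the function names are hypothetical), that hashes every file with SHA-256 and reports anything missing from or different in the mirror:

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mirror_problems(primary: str, mirror: str) -> list[str]:
    """Relative paths that are missing from, or differ in, the mirror tree."""
    problems = []
    for dirpath, _dirs, files in os.walk(primary):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, primary)
            dst = os.path.join(mirror, rel)
            if not os.path.isfile(dst):
                problems.append(f"missing: {rel}")
            elif sha256_of(src) != sha256_of(dst):
                problems.append(f"differs: {rel}")
    return sorted(problems)
```

Running a check like this periodically catches a copy that has quietly gone bad on either side, not just a drive that has died outright.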

The reason you won't find such things on the cheap is because the average person with a PC doesn't even know what a GB is. He simply goes into the store, the sly salesman says "oh, what do you need it for," and then says "well 60-80 gb should be all you ever need."

Now, contrast that to me - my friends shit when they hear I have a 250 gb drive and a 120 gb drive, as well as an extra 60 gb on a networked machine. They can't fathom ever needing that much space. I know that's probably a pittance by Slashdot standards, but it's true :(

Theft / Physical Damage etc. (0)

Anonymous Coward | about 10 years ago | (#9616594)

The age-old problem I've had with RAID is that:

- If the machine gets stolen, there's no backup.
- If the RAID controller shits out and takes a couple of drives with it (uncommon, but it has happened), there's no backup.
- If there is a physical disaster and the machines spontaneously combust, then there is no backup.

I don't know if there is a cheap solution for what you want...

If anything, I would say plug hot-swap drives into your machines just for backups and take them offsite when the backup is done, plus RAID if you can afford it...

Duplication Tower (0)

Anonymous Coward | about 10 years ago | (#9616597)

I believe that it would be worth the money to invest in a duplication tower at this point, the ones with the mechanized arms, preferably one you could hook up to your computer.

Why not tape drives? (2, Insightful)

Silent8ob (638046) | about 10 years ago | (#9616599)

Look at what the rest of the corporate world uses for large-scale storage management. It is still ruled by tape drives.

I don't know how much an eye goes for at the moment, but if you can spring for a Super DLT drive you'll get up to 320GB (compressed) on each tape.

It all comes down to the Quality:Cost:Time triangle.

Easy (4, Funny)

Pedrito (94783) | about 10 years ago | (#9616609)

I use bioneural gel packs at a cost of $0.04 per teraquad. What is this hard drive of which you speak?

What you want, for price. (1)

Crasoum (618885) | about 10 years ago | (#9616611)

Well, it isn't going to happen: you -HAVE- to drop change for what you want as a backup solution. There really isn't any way around that.
There are many plausible suggestions, though, that won't totally break the bank. One, of course, is RAID, as has been mentioned and will be a few more times, I imagine. But you may also wish to look into hot-swappable solutions.
USB 1.1/2.0, FireWire and SATA are all relatively cheap storage solutions if you shop around (Pricewatch is a good place if you are willing). You can convert IDE drives to USB with an IDE-to-USB box, and buy a few decent 200 gig hard drives for around $120-$150 each.
Another option is to buy a SATA card and some SATA drives and plug them into the front of your case; SATA 2.0 is hot-swappable, and hard drive prices have come down into a decent range.

Another solution is to buy used SCSI drives and RAID those together: reliable, fast and not overly expensive if you don't need 15k RPM.

Another idea is to buy another box, put a few hard drives in it, and use that box as your backup. It's more of a hassle than the rest, but as a plus you can put it somewhere else as an offsite backup, and all you have to do is plug it in and your work is ready to go (from the point you most recently backed up).

With incremental back-ups it may not be too bad.

Then again you are moving terabytes.

Of course... (1)

etnoy (664495) | about 10 years ago | (#9616615)

I was wondering, if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data.

Remember the data in your brain, that is much cheaper than buying disks IMO.

P2P Backup (0, Redundant)

billstr78 (535271) | about 10 years ago | (#9616616)

That's easy. Name your files "Hot Lesbian 4-some[0000-1024].mpg" and make them available to the P2P sharing community. Every horny male in the country will then help to download and distribute your data across the country, and when you want file number 5 back, just search for Hot Lesbian 4-some0005.mpg and bingo, 10 good copies available and ready for re-use...

Tape (1)

NitsujTPU (19263) | about 10 years ago | (#9616625)

Buy a tape drive.

If you have money burning a hole in your pocket, buy a tape changer, so you don't have to change the tapes.

How much is an eye? (1)

AllNicksWereTaken (741964) | about 10 years ago | (#9616629)

How much does an eye cost?

Hijack Cassini (5, Funny)

Anonymous Coward | about 10 years ago | (#9616635)

... and program it as a repeater.

It's about 90 minutes away, so at 250 Kbps that's over one gigabit in storage on the way out there, and another gigabit on the way back.

Worst-case access latency is about three hours, though. Maybe the hard disks are a better idea.

If you send your probe^H^H^H^H^H repeater to Alpha Centauri, you'll get more than 20,000 times the storage capacity.
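The joke rests on the bandwidth-delay product: a link holds (data rate) × (one-way delay) bits in transit. A quick Python check with the post's own figures (250 Kbps, about 90 minutes) gives roughly 1.35 gigabits each way, only about 170 MB. The 4.37 light-year distance to Alpha Centauri is an approximate value added here as an assumption:

```python
# Bits "stored" in a link = data rate x one-way delay (bandwidth-delay product).
RATE_BPS = 250_000            # 250 Kbps, the rate assumed in the post
ONE_WAY_DELAY_S = 90 * 60     # ~90 minutes to Cassini, as in the post

MINUTES_PER_YEAR = 365.25 * 24 * 60
ALPHA_CENTAURI_LY = 4.37      # assumption: approximate distance in light-years


def bits_in_flight(rate_bps: float, delay_s: float) -> float:
    """Data in transit on a link with the given rate and one-way delay."""
    return rate_bps * delay_s


outbound = bits_in_flight(RATE_BPS, ONE_WAY_DELAY_S)
print(f"{outbound / 1e9:.2f} Gbit each way")  # 1.35 Gbit, i.e. about 170 MB

# Same data rate, so capacity scales with the one-way delay in minutes.
ratio = ALPHA_CENTAURI_LY * MINUTES_PER_YEAR / 90
print(f"Alpha Centauri holds ~{ratio:,.0f}x as much")
```

The Alpha Centauri ratio comes out around 25,000, consistent with the post's "more than 20,000 times".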

i know you know the smart ass (1)

nazsco (695026) | about 10 years ago | (#9616636)

backups? I upload to an ftp server and let the world mirror it for me

Use those HDDs! (2, Interesting)

jsm008us (774007) | about 10 years ago | (#9616643)

You can get cheap computers from the trash, donations, bulk purchases, etc. You can use that cluster to mirror your data once or twice. I don't know what data you have, but if you have the same data on more than one hard drive, you can rest assured it will be fine. Or you can just print it all!

The stock market is backed up to three (or more?) separate locations. Look into NVRAM (e.g. flash media) or a cluster with all those hard drives linked together, with a constant backup. With the built-in IDE controller on most motherboards, you can hook up to 4 hard disk drives. If you add SATA, RAID, SCSI and extra IDE controllers, you can have lots of hard drives on one machine!

You could also rotate hard drives so they aren't constantly in use (making the whole system last a LOT longer!), or replace drives that are about to fail (which shouldn't happen for at least 3 years!). Most hard drives could probably handle 5 to 10 years, no problem (maybe even 20 if they are rotated!).

It all depends on what you have and what you want to do!

Storage solution (0)

Anonymous Coward | about 10 years ago | (#9616645)

Compaq's StorageWorks SSL2020 AIT library is a single or dual drive library that offers 2 terabytes (2:1 compression) of storage in a 4U tabletop or rack configuration. Library modules can be stacked five high for up to 10 terabytes of storage within a 20U space. The SSL2020 is qualified with Windows NT and Windows 2000, NetWare, Tru64 UNIX and OpenVMS operating systems, as well as Compaq ProLiant and AlphaServer product lines.

Do what Google does (5, Insightful)

glinden (56181) | about 10 years ago | (#9616654)

Build yourself a cluster of cheap boxes with cheap IDE disks and replicate your data across them. Because the data is replicated across your cluster, there's no need for backups or RAID.
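Replicating across boxes needs some rule for which boxes hold which file. One well-known approach is rendezvous (highest-random-weight) hashing; the sketch below uses it purely as an illustration and is not a description of how Google's system actually places data. Node names and the three-copy default are hypothetical:

```python
import hashlib


def replica_nodes(key: str, nodes: list[str], copies: int = 3) -> list[str]:
    """Pick `copies` distinct nodes for a file using rendezvous hashing.

    Every node gets a pseudo-random score for this key; the highest-scoring
    nodes hold the replicas. The choice is deterministic, so any machine can
    work out where a file lives, and removing a node only moves the
    replicas that were on it.
    """
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")

    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(nodes, key=score, reverse=True)[:copies]


cluster = ["box1", "box2", "box3", "box4", "box5"]  # hypothetical host names
print(replica_nodes("run-2004-07/sample-0005.dat", cluster))
```

With three copies on five boxes, any two boxes can die without losing a file, although all five still share one room, one power supply and one burglar.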

Vital information left out (3, Insightful)

dfghjk (711126) | about 10 years ago | (#9616656)

How many months of data at 1TB/month do you need to keep online? After you are done with the data, can you discard it, or do you need it archived? What is the cost of losing your data set at any given time? In what manner do you expect to access it (read/write mixture and sizes, plus aggregate throughput and number of client connections)? The answers to these questions could change the cost of a solution by a couple of orders of magnitude.

options options, what is your time and data worth? (5, Insightful)

segfaultcoredump (226031) | about 10 years ago | (#9616660)

Let's see... hard drives are running about $0.50 per GB, DVDs are running about $0.06 per GB (100-pack, "house brand", not something I'd put my data on, but this is Slashdot, and there are idiots out there who think that it is a good idea), and tapes are running about $0.20-$0.50 per GB (for the DLT/AIT/LTO types, the ones that have enough capacity not to drive you nuts).

So, you can put your data on 4-5 HDs, 10 tapes or 232 DVDs per month. The cost of doing so will be about $200-$500 per month for the tapes, about $500 for the HDs and about $60 for the DVDs (assuming your time costs $0).
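The per-month numbers are just data volume times $/GB, plus a ceiling division for the number of pieces of media. A rough Python sketch using the post's $/GB figures; the per-unit capacities (250 GB drive, 100 GB native tape, 4.7 GB DVD) are illustrative assumptions. With decimal units the DVD count comes out near 213 rather than the post's 232; differences of that size come down to decimal versus binary gigabytes and real usable disc capacity:

```python
import math

MONTHLY_GB = 1000  # ~1 TB per month, from the original question

# $/GB figures are the ones quoted in the post; per-unit capacities are
# assumptions for illustration only.
MEDIA = {
    "hard drive": (250, 0.50),
    "tape (low)": (100, 0.20),
    "tape (high)": (100, 0.50),
    "DVD-R": (4.7, 0.06),
}


def monthly_plan(capacity_gb: float, dollars_per_gb: float, gb: float = MONTHLY_GB):
    """Return (units of media needed, media cost in dollars) for one month."""
    return math.ceil(gb / capacity_gb), gb * dollars_per_gb


for name, (cap, price) in MEDIA.items():
    units, cost = monthly_plan(cap, price)
    print(f"{name:12s} {units:4d} units  ${cost:,.0f}")
```

None of this counts the drive, the burner, or the time spent swapping media, which is exactly where DVDs lose.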

At work, we needed to keep a few TB of data online permanently, so we purchased a few NexSAN ATABeasts. At $50,000 for 10TB of usable storage ($5/GB), they may be a bit out of your price range. The advantage is that one holds almost a year's worth of data, and it is protected by RAID 5. It also makes management a lot easier: it packs 42 300GB drives into a single chassis, something that is very difficult to do yourself, in only 4U of rack space.

On the low end, NexSAN has the ATABoy2 or ATABaby (2TB or 1TB) in the $8-$15K range. This will let you hold a month's worth of data.

On the high end, you have EMC disk arrays (think upwards of $20/GB for the 'cheap' stuff from them).

Overall, if you have 1TB per month, you need to either a) get a grant to fund your work, b) hire somebody to swap DVDs for you, or c) seriously rethink your data generation.

Any of the "cheap" storage methods have serious drawbacks, and the low cost ones are, well, not so low cost if $15,000 sounds like a lot of money to you.

otherwise, good luck

If its volume you want (5, Informative)

TheUncleBob (791234) | about 10 years ago | (#9616663)

If you are more interested in volume than speed, then the emphasis should be on the 'ID' part of RAID: Inexpensive Disks. Use 160GB drives, which appear to give the best bang for the buck at the moment, and put six (yes, six!) of them in one PC. Any old cheap PC will do (I use 200-400MHz PIIs).

Run the disks as RAID 5 and you will get about 800GB of storage (one drive's worth of space goes to parity) for about $600. Get two cheap ATA100 cards so that, together with the two onboard channels, you have a total of 6 channels, and put each drive on its own channel as master. Build a 2GB root partition on the first disk (mirror it if you want) and then set the rest of the space up as one huge RAID 5 array.
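The 800 GB figure follows from RAID 5 giving up one disk's worth of space to parity, which is what lets the array survive a single failed drive. A simplified Python illustration of both points; real RAID 5 rotates the parity block across all the disks stripe by stripe, which this sketch ignores:

```python
from functools import reduce


def raid5_usable_gb(disks: int, disk_gb: float) -> float:
    """RAID 5 spends one disk's worth of space on parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across 6 disks: 5 data blocks plus 1 parity block.
data = [bytes([d] * 4) for d in (1, 2, 3, 4, 5)]
parity = xor_blocks(data)

# If the third disk dies, XOR of the survivors and the parity recreates its block.
survivors = data[:2] + data[3:]
assert xor_blocks(survivors + [parity]) == data[2]

print(raid5_usable_gb(6, 160))  # 800 GB from six 160 GB drives
```

The same arithmetic shows the limit: lose a second drive before the rebuild finishes and the XOR no longer has enough pieces to recover anything.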

Et Voila cheap, big server. To archive data, turn off pc, and throw into attic :-)

my solution (1)

stonetemple (97639) | about 10 years ago | (#9616666)

- Several 80-gig drives
- 1 removable IDE hard drive enclosure
- 1 fireproof safe, preferably bolted to the ground or kept off-site (for the particularly paranoid)

Network Appliance (1, Interesting)

Anonymous Coward | about 10 years ago | (#9616673)

No, I don't work for them, but I work with their hardware, and their support is second to none. You can get a reconditioned R100 for reasonable money. New, they cost ~$100k for 12TB.

They aren't file servers, as they aren't designed for lots of clients. But they are perfect for storing a 'live' backup of data! They call the technology NearStore; it's designed to sit between your file servers and your tape backups.

CD Changer (2, Interesting)

andrebsd (685491) | about 10 years ago | (#9616677)

Well, I have a CD changer for computers made by NSM... It's SCSI (it originally came with a 2x reader), so all you've got to do is find a SCSI DVD burner (or a long enough IDE cable and convert it, since the motors are all powered by a COM port anyway) and replace the drive (or, like in my case, use a CD-RW: I've had the drive for a while, and at the time a DVD burner would have cost too much). Then you have 100 DVDs you can burn data to automatically, and when those are full, just swap them out for new ones.

Now, the problem is you can only get about 430-470 GB out of one changer using single-layer DVDs... Dual-layer discs would bring you to roughly 800-850 GB per changer.

Assuming you can get the unit for 100 bucks or so, and the DVD drive costs about 100 (69 bucks at Fry's), you have a 200 dollar backup unit that can store 430 GB of information on DVDs.
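Disc capacities are quoted in decimal gigabytes, while operating systems usually report binary ones, which is why a full load of single-layer discs shows up nearer 430 than 470. A quick Python check, assuming 100 slots, 4.7 GB single-layer and 8.5 GB dual-layer discs, and the post's ~$200 hardware estimate:

```python
GB = 10**9   # disc capacities are quoted in decimal gigabytes
GIB = 2**30  # most operating systems report binary gigabytes

SLOTS = 100
SINGLE_LAYER = 4.7 * GB
DUAL_LAYER = 8.5 * GB   # assumption: nominal dual-layer capacity
HARDWARE_COST = 200     # the post's estimate: ~$100 changer + ~$100 drive


def changer_capacity(disc_bytes: float, slots: int = SLOTS) -> tuple[float, float]:
    """Total capacity of a full changer in (decimal GB, binary GiB)."""
    total = disc_bytes * slots
    return total / GB, total / GIB


for label, disc in (("single layer", SINGLE_LAYER), ("dual layer", DUAL_LAYER)):
    gb, gib = changer_capacity(disc)
    print(f"{label}: {gb:.0f} GB ({gib:.0f} GiB)")

gb, _ = changer_capacity(SINGLE_LAYER)
print(f"hardware cost per GB of one full load: ${HARDWARE_COST / gb:.2f}")
```

At 1 TB a month, even the dual-layer figure means reloading the changer at least once a month.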

Doesn't your company back-up anything else? (1)

Larry Lightbulb (781175) | about 10 years ago | (#9616678)

That amount of data has to be a company, not a common man.

All depends on retention... (1)

HockeyPuck (141947) | about 10 years ago | (#9616685)

What are your retention requirements for this data? 2 months? 1 year? Forever? How often do you need to access this backed-up data?

If your requirements are rare restores, then I'd go with tape. You can back up to a tape drive about as fast as to an ATA disk, and you can move the tapes offsite. Restores with tape are a bit more painful, but then a tape isn't as delicate as a disk drive.

DLT is the way to go (3, Informative)

pastpolls (585509) | about 10 years ago | (#9616696)

I actually use a DLT with an autoloader I got off eBay for under $200. I then bought a lot of used DLT tapes (100) and use them to back up my video and DVD projects. It is great because when I fill my offline storage (about 1TB), I just fire up the backup software and get the old DLT going overnight. It is done by morning, and the shelf life for those tapes is about 20 years.

Ultrium (3, Informative)

7vEn_T_7vEn (794241) | about 10 years ago | (#9616698)

I'm not sure what your budget is, but if you're like me, you want something that complies with standards (so it will be around for a while), is cheap and is effective. For this I would have to recommend an Ultrium tape backup drive. The drive is standards-based (google it) and the tapes are dirt cheap: a 200/400 GB tape goes for about $55. If you figure 250 GB of storage per tape (with hardware compression), it will cost just $0.22/GB. The problem is that the drive itself lists for about $2600. That's not exactly cheap, but future LTO drives are meant to stay backward compatible with it, and the media is as cheap as you could possibly ask for. One more thought: look into an LTO Gen 1 (100/200 GB) solution for a cheaper drive. The cost per gigabyte is roughly the same; it will just take more swapping.
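The $0.22/GB figure covers media only. Spreading the drive's list price over the data stored shows how the total cost per GB falls the longer the drive stays in service. A rough Python sketch using the post's prices and the original question's 1 TB/month (all figures are the thread's estimates, not current quotes):

```python
import math

DRIVE_PRICE = 2600      # list price quoted in the post
TAPE_PRICE = 55         # 200/400 GB cartridge, as quoted
GB_PER_TAPE = 250       # the post's assumed real-world (compressed) capacity
MONTHLY_GB = 1000       # ~1 TB/month from the original question


def total_cost(months: int) -> int:
    """Drive plus the whole tapes needed to hold `months` of data."""
    tapes = math.ceil(months * MONTHLY_GB / GB_PER_TAPE)
    return DRIVE_PRICE + tapes * TAPE_PRICE


for months in (1, 6, 12, 24):
    cost = total_cost(months)
    per_gb = cost / (months * MONTHLY_GB)
    print(f"{months:2d} months: ${cost:,} total, ${per_gb:.2f}/GB")
```

After a year the drive and tapes cost about the same in total, and the all-in figure keeps sinking toward the media-only $0.22/GB.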

Consider Online Backup (2, Interesting)

jp10558 (748604) | about 10 years ago | (#9616708)

One company that provides massive online backup and storage at reasonable prices is Streamload. You might want to check them out.
