
Costs Associated with the Storage of Terabytes?

Cliff posted more than 11 years ago | from the massive-online-storage-in-$s-per-10e12 dept.

Hardware 161

NetworkAttached asks: "I know of a company that has large online storage requirements - on the order of 50TB - for a new data-warehousing oriented application they are developing. I was astonished to hear that the pricing for this storage (disk, frames, management software, etc...) was nearly $20 million. I've tried to research the actual costs myself, but that information seems all but impossible to find online. For those of you out there with real-world experience in this area, is $20 million really accurate? What viable alternatives are out there for storage requirements of this size?"


320G Maxtor Drives? (1)

Exotabe (601787) | more than 11 years ago | (#4228481)

At about $1/GB of storage, 50TB comes out to around $50,000.

Not sure if this is what you're looking for, but I couldn't help noticing the timing after the last article.

Re:320G Maxtor Drives? (2)

MountainLogic (92466) | more than 11 years ago | (#4228648)

To blow that out in more detail: 50 TB / 320 GB ≈ 156 drives; at 4 IDE drives per box, that's about 40 boxes.

Add to that switches, routers, T1s, the building, cooling, etc. And if you need it to be robust, you would build RAID units, which could double the cost.
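
For anyone who wants to check that arithmetic, here is a quick sketch (Python; the 320 GB drive size and the 4-drives-per-box figure are the post's own assumptions, and it uses decimal gigabytes):

```python
import math

TOTAL_GB = 50 * 1000      # 50 TB in decimal GB, as the posts use
DRIVE_GB = 320            # Maxtor 320 GB IDE drive
DRIVES_PER_BOX = 4        # one drive per IDE channel position

drives = math.ceil(TOTAL_GB / DRIVE_GB)     # 157 (the post rounds to 156)
boxes = math.ceil(drives / DRIVES_PER_BOX)  # 40
print(drives, "drives in", boxes, "boxes")
```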

Re:320G Maxtor Drives? (2)

Odinson (4523) | more than 11 years ago | (#4230417)

50 TB / 320 GB ≈ 156 drives; at 4 IDE drives per box, that's about 40 boxes.

I recently picked up a Promise SuperTrak 6000. I dropped an email to Promise and found out that one machine should be able to run two SuperTrak (hardware RAID) cards, with 6 IDE buses (12 drives) per card and two cards per machine. You could set up each machine with linear RAID and 24 320 GB hard drives. If you did that you would only need seven machines!

Realistically, though, you would need redundancy in this kind of system, which means some lost storage. With RAID 5 it would be more like 14 machines for 40+ terabytes.

Now figure about $1,000 for a motherboard, case, processor, a gig of RAM, etc., plus $300 for each card and $350 for each drive × 12. That comes to $5,800 apiece; $5,800 × 14 = $81,200. If you want hot swap, it's another $130 for the hot-swap card package, plus 9 more drives at $80 a pop, which is $720 per machine; multiply that by 14 and hot-swap capability will cost you an additional $10,080. That is a hot-swap, RAID 5 total of $91,280.

With a few top-notch megabit switches with channel bonding, you're looking at $100,000. Can you hire me for the remaining $19,900,000? I would like to continue to work in NY.
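
A sketch of that per-machine dollar math as written (note the post prices 12 drives per machine even though the configuration described holds 24; the figures below follow the post's own numbers):

```python
BASE_BOX = 1000          # motherboard, case, processor, a gig of RAM
RAID_CARD = 300          # Promise SuperTrak 6000
DRIVE = 350              # 320 GB IDE drive
CARDS, DRIVES = 2, 12    # the post's dollar math counts 12 drives

per_machine = BASE_BOX + CARDS * RAID_CARD + DRIVES * DRIVE
print(per_machine)                          # 5800
print(14 * per_machine)                     # 81200, the RAID 5 build
print(14 * (per_machine + 130 + 9 * 80))    # 93100: the post's $91,280
                                            # omits the $130 card packages
```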

Re:320G Maxtor Drives? (2)

Odinson (4523) | more than 11 years ago | (#4230661)

My bad, you said 50 TB, so it would be $112,500 and $125,000 respectively. Also, you could set up an exact duplicate network (let's say hot swap) and fail over based on a heartbeat to a copy rsynced to within seconds on a separate network (add another $20,000 for network equipment), and you would be looking at $270,000 for the twice-redundant 50 TB badass disk array from hell.

I'll gladly take the $170,000 pay cut for my foolish exuberance, down to $19,730,000. :)

I love Linux. :)

Re:320G Maxtor Drives? (3, Interesting)

highcaffeine (83298) | more than 11 years ago | (#4228818)

In raw disk storage, maybe. But you're forgetting about actually putting those drives into a usable state, and about disaster recovery plans.

In other words, someone dealing with 50TB and who wants backups of that data will be spending many, many times the amount it would cost to just purchase enough hard drives to get the bragging rights of 50TB. And a backup located in the same room/floor/rackspace/whatever as the source data will be pointless in the event of fire, floods, nuclear fallout, etc. So, they would also need a way to transfer all that data to offsite backups in a timely manner (waiting five weeks for a full backup to transfer over a 100Mb/s pipe would probably not be acceptable).
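
That transfer-time claim is easy to sanity-check (a sketch assuming a fully saturated 100 Mb/s link with no protocol overhead, which is generous):

```python
data_bits = 50e12 * 8        # 50 TB, decimal, in bits
link_bps = 100e6             # 100 Mb/s, fully saturated
days = data_bits / link_bps / 86_400
print(round(days), "days")   # ~46 days, i.e. six and a half weeks at line rate
```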

Aside from backups, how would the drives be accessible? Even as JBOD, you're talking 40 IDE/ATA controllers (assuming 320GB drives and 4 ports per controller), or 20 SCSI channels (assuming 160GB per drive and 15 non-host devices per channel) to support that many disks. You could also use Fibre Channel and get away with only a couple arbitrated loops. Physically, you're talking about hundreds of disks that need to be mounted somewhere, so you would also need dozens of chassis to hold the drives.

But, hundreds of disks in a JBOD configuration means you'll have hundreds of partitions, each separate from the others. Hell, if the clients are Windows machines, they won't even be able to access more than a couple dozen at a time. And even for operating systems with better partition/mount-point addressing, it would be unmanageable.

So, now you get in to needing a RAID solution that can tie hundreds of disks together. If you're talking about hooking these up to standard servers through PCI RAID cards, you'll need several of those machines to be able to host all the controllers necessary (especially if all the disks are not 160GB or larger each).

The only realistic solution for this much storage, at least until we have 5TB hard drives, is a SAN-like setup. Specialized hardware designed to house hundreds of disks in stand-alone cabinets and provide advanced RAID and partitioning features. SANs don't come cheap.

Add to the SAN the various service plans, installation, freight, configuration, management and the occasional drive swapping as individual disks fail and you've already multiplied that $50K several times, as a bare minimum (and you still haven't priced out the backup solution).

There's a lot more to it than just having a pile of hard drives on the floor. I wouldn't even be surprised if the drives are the cheapest component.

Metacomment (4, Insightful)

twoflower (24166) | more than 11 years ago | (#4228536)

Why is it that 90% of "Ask Slashdot" pieces seem to boil down to "I have no real world experience, and I'm just wondering how I can solve problem X for Y dollars when twenty different vendors all sell solutions for 100 * Y dollars?"?

Meta-Metacomment (2)

fm6 (162816) | more than 11 years ago | (#4228728)

Because it makes a nice change from the "I'm stuck, somebody tell me what to do!" pieces.

Re:Metacomment (2)

anthony_dipierro (543308) | more than 11 years ago | (#4228792)

Why is it that 90% of "Ask Slashdot" pieces seem to boil down to "I have no real world experience, and I'm just wondering how I can solve problem X for Y dollars when twenty different vendors all sell solutions for 100 * Y dollars?"?

It makes readers feel superior, and keeps them coming back.

Look at the quantities (2)

0x0d0a (568518) | more than 11 years ago | (#4228803)

Because in this case, it's pretty obvious that the prices are overly inflated. He's paying roughly 400 times what the raw drives go for.

I think it's pretty reasonable to feel that you could put something like this together for under $100K.

Re:Look at the quantities (1)

Hieronymous Cowherd (11195) | more than 11 years ago | (#4229930)

I'd call you mighty confused. Hell, at 50TB, you're in the range requiring director-class SAN switching, and a pair of those by themselves blow your $100K budget. Please get some real-world experience before holding forth on things you really don't understand. You'll pay more than $100K for the software to manage that size of a storage farm. Remember, we're talking HA design for your SAN, and a *serious* backup system. Oh, you were going to skip that? Keep your resume ready. "Uh, Mr. CEO, it'll just be a month while we rebuild and restore your data warehouse..."

Re:Look at the quantities (4, Informative)

Neck_of_the_Woods (305788) | more than 11 years ago | (#4230748)

That $100K was a joke, right? We have four 2TB SANs where I am, and I can tell you that any two of them would eclipse your guess. Let's not get into the shelf disks, the extra fabrics, or the RAID eating some of your space. Oops, did I forget the support contract, the UPS the size of a cubicle, and a Liebert air conditioner to cool this room full of spinning drives? Wait a minute, you're going to need full redundant backups for all this stuff, the GBIC switches to control access, the rack space, and all the fiber HBA cards for the servers (unless you go copper).

Then you want to back this up? Break out your checkbook again for a Compaq mini-library if you're lucky; that is only 10 tapes × 80 GB a tape... 800 GB... and that is if you're really doing well. So put that on top of it all: 10 × 10 × 80 gives you 8 TB of backup at around $30K each for the mini-libraries. The price just keeps on jumping!

No way, no how, not today or tomorrow. $100K will get you a floor full of 120 GB Maxtor drives and that is about it.
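
Taking those mini-library numbers at face value (800 GB of online capacity per $30K unit, and ignoring the option of swapping tapes by hand), covering a single 50 TB backup looks like this; the $30K and 10 × 80 GB figures are from the post above:

```python
import math

GB_PER_LIB = 10 * 80          # 10 tapes x 80 GB per mini-library
LIB_COST = 30_000             # rough cost per mini-library

libs = math.ceil(50_000 / GB_PER_LIB)
print(libs, "libraries,", libs * LIB_COST)   # 63 libraries, $1,890,000
```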

I'm not saying that this is the standard... (4, Insightful)

0x0d0a (568518) | more than 11 years ago | (#4231484)

...I realize that accepted pricing is well above the price I mentioned. And yes, obviously I left out the maintenance.

The problem is that I find that corporate spending on IT purchases has gotten ridiculous. Let's buy a TEMPEST array! Let's buy something with a Sun nametag because the name sounds good! Let's buy a $2k piece of software for each workstation even though there's a free alternative!

I'm not saying that anyone *provides* something in the price range I was talking about. No one is crazy enough to do so, if companies are willing to pay much, much more. I'm saying that, if you're asking whether it's possible to *build* something like this for the price range I mentioned, off the cuff it doesn't sound so unreasonable.

Yes, a seasoned IT person who works with high-end systems like this will laugh. Why? Because they're used to paying huge amounts of money. Because it's an accepted part of the culture to throw down this much cash. What I want to know is -- how often do people question these basics? How often has someone said "Wait a minute...this is wrong."

Are you telling me that if you were in a third-world country without the exorbitant amount of funding that we USians enjoy, and someone asked you to put together a 50TB storage system for under $1M, you'd simply say "It can't be done"? No consideration, nothing?

I mean, when I look at the fact that the *case* on, say, a Sun high end system costs more than a whole cluster of workstations, I start to wonder just how much excess is going on here.

Say we take the bare-metal, dirt-cheap approach. Grab a bunch of Linux boxes. Throw RAID on them, configured so that 1/3 of your raw capacity is overhead for reliability, and put a 100Mbps Ethernet card in each. The figure used earlier was $1 per gig. Put six 200 GB drives in each. Throw down $250 for the non-drive cost of each system. You have 800GB of data on each system, 400GB of overhead. That's 63 systems. $16K for the systems, $75K for the drives, and we come in at $91K. I left out switches -- you'd need a couple, but certainly not $9K worth.
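
A sketch of that back-of-the-envelope build, using only the assumptions stated above ($1/GB drives, six 200 GB drives per box, one third of raw capacity burned on redundancy, $250 per diskless box):

```python
import math

DRIVE_GB, DRIVES_PER_BOX = 200, 6
PER_GB, PER_BOX = 1, 250

usable_gb = DRIVE_GB * DRIVES_PER_BOX * 2 // 3   # 800 GB usable per box
boxes = math.ceil(50_000 / usable_gb)            # 63 boxes
total = boxes * (PER_BOX + DRIVES_PER_BOX * DRIVE_GB * PER_GB)
print(boxes, total)                              # 63 boxes, $91,350
```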

You'd need some software work done -- an efficient, hierarchical distributed filesystem. I didn't factor this in, which you could consider not fair, but there may be something like this already, and if not, it's a one-time cost for the whole world.

Maybe another few systems up near the head of the array to do caching and speed things up, and you still aren't even up to $150K, and you have failover (at least for each one-drive-in-three group).

I haven't looked at this -- it might be smarter, since you'd want to do this hierarchically, to have caches existing within the hierarchy, or maybe Gbit Ethernet at the top level of the hierarchy. And obviously, this may not meet your needs. But as for whether it's possible to build something like this for that much money? Sure, I'd say so.

Finally, existing SANs, or any sort of network-attached storage, are overpriced, no two ways about it. Very, very healthy profit margins there. Sooner or later, someone is going to start underselling the big IT "corporate solution providers" and is going to kill them unless they trim margins by quite a bit.

Re:Look at the quantities (0)

Anonymous Coward | more than 11 years ago | (#4230986)

Sure, if you want your IDE drives to fail in less than a year.

No geek is an island. (2)

mellon (7048) | more than 11 years ago | (#4229949)

I've read a lot of books in my day, but quite frankly, most of what little knowledge I have comes from the kindness of people who have helped me to learn.

I don't think there's any excuse for asking a question without first doing a little basic research, but here we have somebody who has legitimately never had any experience with terabyte storage asking if there's a cheaper way. It's a legitimate question, and one that probably could not be answered by looking in a book. So the person here is right to ask, and has already gotten some very good answers.

I have a somewhat similar problem: how do I make sure that on the order of a terabyte of audio and video data survives the next hundred years? This given that the disk on which the first 80 gig of this data were delivered to me has two errors that have corrupted two of the files already, and the data isn't even a year old.

What I've been doing is asking other people how they've solved the problem, and also thinking about it on my own. It's how problems get solved. I've gotten some very good and thoughtful answers to my questions already.

Re:No geek is an island. (0)

Anonymous Coward | more than 11 years ago | (#4230981)

1) stop buying IBM drives.
2) I have no clue...

Datawarehouse (1)

jodio (569370) | more than 11 years ago | (#4228542)

We are looking at something similar but smaller (20TB), and the price we are looking at is around $2,000,000 Canadian. The price sounds a little steep to me.

Re:Datawarehouse (2)

ericman31 (596268) | more than 11 years ago | (#4231148)

Personally, I think your $2 million price tag is low for 20 TB. I have architected several data marts and data warehouses. The price for small to medium SANs (say, up to 5 TB) is about $150/GB, giving a price for a 1 TB SAN of $153,600, or just under $800,000 for 5 TB. Once you get over 5 TB the technology changes dramatically. Things that are part of the SAN cost:
  • Disk Arrays
  • Fiber Channel infrastructure (i.e. switches, HBA's, etc.)
  • Tape Libraries
  • Tapes
  • Storage Management Software
  • Backup/Recovery Software
  • Disaster Recovery
  • Ethernet and Fiber network management tools
  • Raised Floor space, power, air
  • People costs, including consultants
If you use Tivoli Storage Manager for your backup/recovery solution (it uses the fewest tapes per GB backed up of any solution), you will need about 500 LTO tapes, at an average cost per tape of $110. That is $55,000. A tape library that can handle that many tapes online will cost you about $400,000. The software will cost you over $100,000. You see how the numbers start adding up? Throw in consultants at $300/hour (this isn't a skill set you pick up overnight). 16-port fiber switches with GBICs will cost you $25,000 each; how many of those will you need? Or do you need director-class switches (likely)? Better quadruple that price for the switches. HBAs are $1,500 a pop, and you need two in every server, minimum, for redundancy. Your disk arrays have to be extremely fast to keep up with the demand for data from the servers, or you will be I/O bound. We aren't talking about MaxAttach NAS here.

You get the point, I hope. $20 million is probably reasonable, actually.
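
Adding up just the line items the post actually prices gives a feel for how fast it climbs, before disks, DR, floor space, or people (a sketch; the switch and server counts below are illustrative assumptions, not the post's):

```python
budget = {
    "500 LTO tapes @ $110":            500 * 110,
    "tape library":                    400_000,
    "backup software (TSM)":           100_000,
    "4 x 16-port fiber switches":      4 * 25_000,
    "HBAs, 2 x $1,500 x 10 servers":   2 * 1_500 * 10,
}
for item, cost in budget.items():
    print(f"{item:32s} ${cost:>9,}")
print(f"{'subtotal':32s} ${sum(budget.values()):>9,}")   # $685,000
```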

Re:Datawarehouse (1)

thempstead (30898) | more than 11 years ago | (#4231237)

Nah, a tape library for the above would be a bit cheaper than that (not including, of course, the necessary maintenance contracts)... and 500 tapes is a bit low once you allow for on- and offsite storage pools for the situations where you need quick restores, and for DR scenarios...

Other than that I agree totally ...

If you take (4, Funny)

Apreche (239272) | more than 11 years ago | (#4228607)

the new 320 gigabyte hard drives previously mentioned and divide 50,000 gigs (50TB) by 320, you get the approximate cost of 50TB by multiplying that by $350, the approximate cost of the drive. However, with that much data a RAID is certainly in order, so multiply the number of drives by 1.5 or 1.75 to get the number of drives needed for a RAID, then multiply that by $350. This comes out to a little over $80,000. The only cost left is the cost of all the RAID controllers (expensive) and networking all the drives together. So the raw storage for 50 terabytes costs about $80,000. If you were to buy ultra-fast SCSI drives instead of the 320GB drives, the price would be multiplied by about 3, since a 100MB super-fast SCSI drive is also about $300 with 1/3 of the space. That brings it to $240,000. Add to that the cost of labor and all the other hardware, and I don't see how it could come out to more than 1 million dollars. I'm not an expert, but just doing the math, anything more than that seems too much.
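
Running the intended math (a sketch; it reads the SCSI drive as one third the capacity at the same price, since, as later replies point out, the "100MB" is a slip for GB):

```python
drives = 50_000 / 320              # ~156 drives for 50 TB raw
raid_drives = drives * 1.5         # low end of the post's RAID overhead
ide_cost = raid_drives * 350
print(round(ide_cost))             # ~82,031: "a little over $80,000"
print(round(ide_cost * 3))         # ~246,094 with the 3x SCSI premium
```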

Re:If you take (2)

battjt (9342) | more than 11 years ago | (#4228674)

OK, smarty pants. Why aren't you out there selling these systems? You apparently would be making $19M a pop; unless you have no clue what you are talking about, in which case it might be a bit risky.

Joe

Re:If you take (2)

anthony_dipierro (543308) | more than 11 years ago | (#4228759)

OK smarty pants. Why aren't you out there selling these systems?

Because of managers who think like you.

Re:If you take (3, Insightful)

battjt (9342) | more than 11 years ago | (#4229031)

What does this have to do with managers? Why don't you sell these systems? I don't, because I don't know what it takes to build them.

How do you even strap 50 TB together? Is it one huge array, or arrays of arrays?

What do you use at the head end that can handle this sort of throughput? How do you back it up? How do you search it?

What filesystems do you use that support 50TB?

How do you manage the hot swap aspects?

There are so many questions that you leave unanswered, that you might spend $19 mil to answer before you spend $1 mil on hardware.

Joe

Re:If you take (2)

anthony_dipierro (543308) | more than 11 years ago | (#4229295)

What's this have to do with managers?

Managers are the ones who make the purchase decisions. They tend to buy from large name companies with big marketing budgets regardless of the quality or cost of the solution.

Why don't you sell these systems?

I don't have enough money to market them.

How do you even strap 50 TB together? Is it one huge array, or arrays of arrays?

As with all your questions, depends on the needs of the customer. If you're interested in buying a solution from me, let me know, and we'll talk further.

There are so many questions that you leave unanswered, that you might spend $19 mil to answer before you spend $1 mil on hardware.

No, I won't spend $19 million answering a few simple questions.

Re:If you take (2)

battjt (9342) | more than 11 years ago | (#4230372)

Managers are the ones who make the purchase decisions. They tend to buy from large name companies with big marketing budgets regardless of the quality or cost of the solution.

Sorry, I assumed that you meant people managers. If people managers are building systems for you, then your company needs fixing. What do you call the people that manage people? (Unless I am mistaken and we are both talking about people managers, in which case, what do the rest of you do if you aren't doing the work?)

As with all your questions, depends on the needs of the customer. If you're interested in buying a solution from me, let me know, and we'll talk further.

Actually, you have already stated that you could build a 50TB system for $1M, so what more information would you need?

On a more serious note, I am interested in building a dual-processor Linux workstation. I do Java/web programming, run VMware with an instance of W2K connecting to clients via VPN software, and possibly other VMware instances with Linux as test clients. I constantly have Mozilla, StarOffice, emacs and a couple of xterms running. I want to use video conferencing and instant messaging. Can you help me spec a system? Last time I tried I ended up with junk hardware.

Lack of capital (1)

hackwrench (573697) | more than 11 years ago | (#4229640)

Lack of capital, plain and simple... that's my answer to this sort of question

Re:Lack of capital (2)

battjt (9342) | more than 11 years ago | (#4230441)

I have access to the capital. What do you need?

Explain your customer needs and how you are going to satisfy them and why you need the money now. If it all adds up, I can find you the money.

Capital is never a problem, it is an excuse.

Joe

Re:If you take (1)

Exotabe (601787) | more than 11 years ago | (#4228750)

Your math was a bit off on one point. 100MB SCSI drives are 1/3000th the size of 300G IDE drives, not 1/3.

That would leave you with a price tag of... $240 million, yikes. Maybe you could get some sort of savings for buying in bulk :)

Re:If you take (0)

Anonymous Coward | more than 11 years ago | (#4230939)

He meant GB not MB

I mean Jebus, who the hell gets a 100MB drive these days? Why not just throw in another stick of RAM and get 5 times that? 100 MB is 3 floppies (standard floppies, not the expensive ones) if you have an LS-240 drive.

Re:If you take (1, Offtopic)

jhines (82154) | more than 11 years ago | (#4230376)

You then have the problem of housing and cabling up over 400 drives.

I can't think of any individual system that can put many dozens of IDE drives into a single RAID volume. You might get several dozen, but not several gross.

At that level, one is going to have to pay the bucks for real iron.

Re:If you take (3, Insightful)

secret_squirrel_99 (530958) | more than 11 years ago | (#4230965)

You've made a number of assumptions, none of them good. One assumption is that the performance of a 5,400 RPM IDE drive (that's all the 320GB drives are) would be acceptable for an application like this. It won't be. You'd want 15,000 RPM SCSI-3 drives at a minimum, and you'd want them hot-swappable. Figure a grand each for 140GB drives, in bulk. Then there are a large number of other factors mentioned by others here: RAID controllers, servers to house it all, switching, cabling, racks, etc.

What about power and cooling? Ever cost out one of those huge Liebert internal cooling systems? Don't forget you need two of them. And the power: you'll need huge UPSes for something like this.

How about backups? You'll need to be able to back this all up and transport the data offsite in a timely manner. That's a LOT of DLT tapes, not to mention the costs of the tape libraries, drives, off-site storage facilities (perhaps you'd like to keep all of those tapes in a locker at the same place?), etc.

Now, how are you going to access this? With 500 partitions? Or perhaps you want some more sophisticated storage management software?

What about support? Are you going to accept responsibility for maintaining this thing, or are you, like most businesses, going to want 24x7x4 support? Since support on products like this often involves flying an engineer in from out of state on almost no notice, it's not cheap.

The reality is that for that kind of storage you need a SAN, and that means big dollars. The two most common SANs are EMC (which I'd bet is what this estimate was for) and Compaq StorageWorks. EMC is the more mature solution, but also MUCH more expensive; they often outpace Compaq and the other vendors who make similar products by 300% or more.

Is $20M too much? Probably. Is any solution involving a room full of servers loaded with commodity IDE drives acceptable? Absolutely not. Better to shop other EMC vendors and other SAN solutions and make the best deal on the right product.

Sounds reasonable (1)

DeadSea (69598) | more than 11 years ago | (#4228615)

1 terabyte = 1,024 gigabytes, so you need roughly 50,000 GB.

From an earlier Slashdot story, you can get 300GB hard drives for around $1 a GB, so you are looking at spending $50,000 on hard drives. Figure 4 IDE drives per computer and you need about 50 computers. Those would run you maybe $15,000 at around $300 per computer.

I'd say it would take 10 employees to set it up, including a couple of programmers, a couple of sysadmins, and some techs, which would probably cost you $200,000 if it took them four months.

I'd say you could do it for less than half a million. Throw in $150,000 a year for facilities and maintenance and you have no worries.

Google does something like this. They have tons of cheap computers with cheap hard drives.

Re:Sounds reasonable (5, Insightful)

duffbeer703 (177751) | more than 11 years ago | (#4228780)

Get a clue man.

Where is your failover?

How are you going to connect these disks together? NFS? Samba? That kind of speed (or lack thereof) is not an enterprise storage solution.

How do you replace disks as they fail without taking stuff offline?

Re:Sounds reasonable (2)

Hard_Code (49548) | more than 11 years ago | (#4229131)

I know, IDE? Who the hell is using IDE/PC hardware for production data warehousing?

Re:Sounds reasonable (1)

questionlp (58365) | more than 11 years ago | (#4229746)

IIRC, the EMC content-addressed storage system, the Centera, uses IDE hard drives instead of more expensive SCSI hard drives. Of course, they include additional levels of redundancy to avoid data loss if a drive fails.

EMC Storage Options (1)

justanyone (308934) | more than 11 years ago | (#4230593)

I participated in a data warehousing project at a Fortune 100 retailer that we anticipated could grow to over 10 exabytes if we threw all our data sources at the same DB. It would have kicked butt, albeit at great (probably non-justifiable) expense.

We figured on prices similar to the ones above, though somewhat inflated as this project was several years ago. The problem was EMC.

I worked with a systems engineer who had butted heads with management for years over EMC. EMC has NEVER allowed a head-to-head comparison between their products and any competitor's, including at the retailer I worked for. In our case at least, apparently any time he got his technical managers close to requesting a comparison between the "in-house" EMC systems and normal DEC Alpha / Compaq drive systems, EMC would get wind of it, call everyone with any power, and invoke the 'strategic relationship' and 'technical partnership' phrases. Management would always falter under such an onslaught, so bamboozled they couldn't tell which end was up. The comparisons never went forward.

We did some preliminary comparisons ourselves, reading and writing several hundred gigabytes of data using a small C program that SEQUENTIALLY read and rewrote data. The EMC was about the same speed as the standard drive system (slightly slower, but not much) for sequential access.

The comparisons were VERY INFORMATIVE when we read and wrote RANDOM data. The EMC was an ABSOLUTE DOG (very, very slow). The problem was that EMC uses a 32 KB buffer because of its mainframe history, so each record we read (a 1 KB record) incurred disk read penalties as if we were reading 32 KB.

Further, we learned by rumor that EMC employs 'read-ahead' software that tries to anticipate the next read location and fills the multi-GB buffer with disk data if it detects a sequential read. Since we occasionally had 2 or 3 sequential reads in the middle of our data (the nature of our data made this happen now and then), the disk array would apparently go hog wild filling buffers for sequential reads it thought we would use but did not.
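
For the curious, this kind of test is easy to reproduce. A minimal sketch in Python (the original was a small C program; `bigfile.dat` is a hypothetical pre-built test file, the code is POSIX-only, and without O_DIRECT-style cache bypass the numbers are only indicative):

```python
import os, random, time

PATH, RECORD, N = "bigfile.dat", 1024, 100_000   # hypothetical test file

def timed_reads(offsets):
    """Time N pread() calls of one 1 KB record each."""
    fd = os.open(PATH, os.O_RDONLY)
    t0 = time.time()
    for off in offsets:
        os.pread(fd, RECORD, off)
    os.close(fd)
    return time.time() - t0

size = os.path.getsize(PATH)
sequential = [i * RECORD for i in range(N)]
scattered = [random.randrange(0, size - RECORD) for _ in range(N)]
print("sequential:", timed_reads(sequential))
print("random:    ", timed_reads(scattered))
```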

The final point was that although EMC had good prices initially, they apparently RENT their equipment (you never own your own hardware), so the prices for upkeep/next year's rental can spiral up at their salesperson's whim.

That's my 5 cents here. Please be aware these were unofficial studies performed during spare cycles by probably incompetent persons including myself; any correspondence between the truth and the above remarks is purely coincidental. This post provided for entertainment purposes only, please don't sue me, I'm a worthless nobody.

Re:Sounds reasonable (3, Informative)

aminorex (141494) | more than 11 years ago | (#4230081)

The trend is to use iSCSI on the network side and IDE on the hardware side. Since a network file server only has FS daemons doing I/O, and the drives are always hot, there is no SCSI advantage as there is in a multitasking workstation environment.

Re:Sounds reasonable (2)

aminorex (141494) | more than 11 years ago | (#4230281)

Oh, and I might also add that Serial ATA, with its tagged command queueing, is very shortly about to render the 300% SCSI price premium obsolete in all but a few narrow verticals.

Google and Commodity Computers (2)

fm6 (162816) | more than 11 years ago | (#4228815)

Google does something like this. They have tons of cheap computers with cheap hard drives.
Your logic is sound, but Google is a bad example. As I understand it, they mainly rely on keeping their indexes in RAM. There was a good Slashdot story on this a while back, but I can't seem to find it: "Google" is not a useful search term!

Re:Sounds reasonable (2)

Ioldanach (88584) | more than 11 years ago | (#4230732)

Ok, let's try running some harder numbers. Let's say we RAID a set of RAID arrays. Not terribly efficient, but we're going on the cheap here. A Promise UltraTrak RM8000 8-channel external RAID with a SCSI interface [promise.com] is priced at [nextag.com] $2,400 each and can handle 8 drives. For this I'll assume the cheap configuration of 6 data drives, a RAID 5 parity drive, and a hot spare. I'll also assume that we'll use the yet-to-be-released 320 GB IDE HDD at $300 each. Given that, we'll need 26 arrays (for a total of 49.9TB). Add in a pair of spare arrays, and we have 28 arrays. (Hot spares in the RAID configuration, though I'm not setting up a parity array in this case; the arrays should be sufficiently stable already.) That said, we have 28 * 8 = 224 drives @ $300 each, for a total of $67,200. The 28 arrays come to, oddly enough, $67,200 as well.

Now, those 28 arrays will need to be attached to something. Maybe an Adaptec SCSI RAID 5400S [adaptec.com], which is a four-channel card that can accept up to 60 drives and is priced [nextag.com] at about $900. Add to that a machine to put the RAID card in with at least gigabit Ethernet, at around $6,000, three 40U racks at $2,000 each, and a UPS for each rack at $2,500 each.

All told, that's $67,200 each for drives and arrays, $900 for the SCSI RAID, $6,000 for a single box, $6,000 for racks, and $7,500 for UPSes, for a sum total of $154,800 for a single 50TB array. The primary point of failure is the single box running it. For a backup system, running a full second array as redundancy would cost a net $309,600. All of this is not inclusive of labor, which for setup might easily run $100K. Thus, a redundant, reliable RAID solution would run you about $400,000. All that's once the 320GB IDE drive is released by Maxtor.

Does that answer your question?

Please note, this won't be the best array money can buy, just a large array on the cheap (which is what RAID was intended for).
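
The whole estimate, reduced to a sketch (all figures are the post's; the 26-array count accepts 49.9 TB rather than rounding up):

```python
ARRAY, DRIVE = 2_400, 300
arrays = 26 + 2                    # 26 arrays for ~49.9 TB, plus 2 spares
drives = arrays * 8                # 6 data + 1 parity + 1 hot spare each

one_copy = arrays * ARRAY + drives * DRIVE + 900 + 6_000 + 6_000 + 7_500
print(one_copy)                    # 154,800: arrays, drives, RAID card,
print(2 * one_copy)                # host, racks, UPSes; 309,600 mirrored
```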

Re:Sounds reasonable (0)

Anonymous Coward | more than 11 years ago | (#4231045)

You can get more than 4 IDE drives in a computer...

If you want to RAID them, then you can only have one drive per IDE chain. My server currently has 7 IDE devices in it and will have 9 by December.

The system disk and the CD-ROM are on the same IDE chain. The other disks are part of a software RAID, and each is on its own chain, which means I need to have 2 IDE controller cards in my box, and I will be adding a 3rd card this winter.

Given a machine with 5 PCI slots, it should be possible to get 20 IDE devices (no RAIDing) from 5 IDE controller cards, plus 4 IDE devices on the motherboard.

That is 24 IDE devices. The next problem is simply finding the space to put all the drives, and a power supply large enough to support them...

Your pricing is a little old (1)

Speedy8 (594486) | more than 11 years ago | (#4228622)

I don't think it still costs $400,000 per terabyte for a 50-terabyte system when my server has a terabyte of storage for about $3,000 total. I think that to get a good pricing structure, you need to give the speed and size requirements.

WAY too much (1)

Taral (16888) | more than 11 years ago | (#4228630)

From experience, I know that around 30TB is about $1M. I can't see how 50TB is more than that...

(The 30TB came from IBM.)

more input needed (3, Insightful)

tchdab1 (164848) | more than 11 years ago | (#4228662)

It's more involved than how many bytes you need to store, of course. How fast do they come in and go out? How often do the bits turn over? How reliable does the data need to be, and how fresh does that reliability need to be (do you need to mirror it in real time at a remote, hardened site, or back it up once a month)? What systems does the data need to feed and be fed from? What are your labor costs (tape changers, administrators, etc.)? How much wood do you need to buy for office furniture?

forget what you know about ide hard drives (5, Insightful)

aderusha (32235) | more than 11 years ago | (#4228741)

Sorry for sounding a bit trollish, but the current replies here seem to follow the formula of checking the biggest IDE drive on Pricewatch and multiplying that out to give you a number.

forget all that.

If all you wanted was a pile of IDE hard drives, maybe this would be OK, but anybody looking for 50TB of storage is not just looking for some disk to hold the pr0n they downloaded last week. Large-scale storage systems need to manage multiple-host access to high-speed (15K RPM U3 SCSI) drives in flexible RAID configurations with maximum redundancy, high-speed caching (with GBs of RAM to do it), Fibre Channel switching, cross-platform capability, high-end management and monitoring, HSM backup and data migration, offsite vaulting of disaster recovery data, power and air conditioning, and a fat service contract from the vendor. None of the above are going to be found at pricewatch.com.

Your best bet is to talk to multiple storage vendors about your needs. Call up EMC, Hitachi, IBM, and Fujitsu to start, then let them see each other's numbers. With the amount of money you are going to spend (and it almost certainly will exceed $10 mil, but maybe not $20), each of these vendors will do backflips to get your business (and EMC is particularly good at junkets; take them for all they're worth :)

Re:forget what you know about ide hard drives (2)

0x0d0a (568518) | more than 11 years ago | (#4228853)

...but anybody looking for 50TB of storage is not just looking for some disk to hold the pr0n they downloaded last week. [clipped list of buzzwords]

Yeah, but there's also a tendency to try to sell ridiculously overpriced products with vague promises of reliability or quality. Name-brand vendors do it all the time. If the vendor is really so sure that this stuff isn't going to fail, will they pay damages if something does fail in the next seven years? Mmm? I'd assume that such a guarantee, since they're so certain, should cost you a *nominal* amount. If they expect one in ten systems to violate their guarantees (which seems pretty egregious to me), they should only be jacking up the price by 10% at most for that guarantee.

Re:forget what you know about ide hard drives (1)

aderusha (32235) | more than 11 years ago | (#4229026)

Vague promises of reliability or quality are what you will get from knock-off vendors selling you IDE RAID packs. IBM, EMC, et al. will monitor your systems 24/7, replace parts before you know they are broken, send techs on site at the drop of a hat, and generally hold your nuts - which makes sense given the amount of money you're spending on their storage.

Re:forget what you know about ide hard drives (1)

m00by (605070) | more than 11 years ago | (#4229070)

And if they really wanna get "user friendly" (a.k.a. LAZY =D) they could go through Dell... they market a nice line of EMC products that both Dell and EMC support. They have a new toy called the CX600, I think... it looks like a Dell server, but acts like a nice little 17.5TB SAN component. Buy a nice rack, 3 of these, and something to manage your SAN, and you'd be set. Probably not cheap, but easy and reliable.

Re:forget what you know about ide hard drives (2)

haplo21112 (184264) | more than 11 years ago | (#4229327)

Yep... that's actually a rebranded EMC Clariion product that was just released. Saw a demo; great machine, but think 3 side-by-side racks, and they stand about 5 feet tall, I seem to remember.

Re:forget what you know about ide hard drives (1)

m00by (605070) | more than 11 years ago | (#4230488)

On the website here [dell.com] it seems fairly compact :) And Dell makes a nice presentation... they presented their server line at work the other day (they'll do that when you want to order lots of hardware from them [20 servers in a few months])... they had a really cute little projector hooked up to their laptop with a (laptop) VGA -> DVI (projector) cable... it was p1mp =D So yeah, they're cool ;) except the install... I hate their automated installs, pain in my A$$! Give me an HP, which just gives me a RAID driver and lets me install =D Anyway...

Re:forget what you know about ide hard drives (2, Interesting)

chris_mahan (256577) | more than 11 years ago | (#4230566)

I work at a bank. I understand about reliability and failover etc.

What we need is some university, or some poor souls with money to invest, to build this as a "test case" for Linux distributed systems.

=============
Requirements:
-- 50 TB Data storage
-- 100% availability (I don't mean 99.99...)
-- Data must be accessible worldwide
-- Data must be safe in these events:
-----War or terrorist act (building blows up)
-----Earthquake (building falls down flat)
-----Fire (building burns to foundation)
-----Flood (building full of muddy fishy water)
--Data must be back online within 48 hours in the event of a disaster.
--Data must survive:
----Server failure
----Storage medium failure
----telecommunication failure (junk through the pipes)
----Unauthorized access (r0x4H 31g00G)
----Vandalism (maintenance guy with baseball bat or axe)
----Theft of equipment
Furthermore:
--Data must always be in a non-corrupt state
--Data must be fully auditable
--Data transaction must always be fully reversible
Also:
--All procedures (ALL) must be written down in electronic documents and on paper, and must be available ONLY to the proper personnel.
--All personnel must be correctly trained (development of training material, testing, evaluations, etc)
--System architecture must allow for connectivity to any known server system, any database system, and any client systems.

===
Oh, and under 20 million dollars.
===

However which way that solution should be implemented is left as an exercise to the reader

Google is your friend (3, Interesting)

Twylite (234238) | more than 11 years ago | (#4228747)

I am not an expert in this field, but Google was willing to tell me lots.

RaidWeb [raidweb.com] sells rack-mountable RAID units that take IDE drives and have SCSI or Fibre connectivity. A 12-bay 4U SCSI system (with 12 x 120GB IDE drives) comes in at just under $8,000, giving over 1TB of fault-tolerant storage. There are several other companies that have units like this.

Rackmount Solutions [rackmountsolutions.net] sells rackmount cabinets. A 44U cabinet with fans, doors, etc. will come in at around $3000.

In theory, a single cabinet could house 11TB of data and cost around $91,000. This still doesn't account for cabling, cooling, power distribution, networking, a proper server room (air con, false floor for cables, access control), and in all likelihood one or more controlling servers.

More practically, depending on how they are going to make this data accessible, you could be looking at 9 RAID units per cabinet, plus 3 2U servers and a switch in the remaining space. Each server can support multiple SCSI cards and gigabit networking. Such rackmount computers will set you back in the region of $6,000 (incl. network and SCSI adapters, excl. software).

So call it $100,000 for 9TB of storage... $600,000 for 54TB. That doesn't answer the management software question, and it may not be a suitable solution. But it sure is a lot cheaper than $20 mil ;)
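
The cabinet arithmetic, as a sketch (unit prices from the post above; the roughly $7,000 of slack in rounding up to $100K covers the switch and cabling):

```python
RAID_UNIT, SERVER, CABINET = 8_000, 6_000, 3_000

per_cabinet = 9 * RAID_UNIT + 3 * SERVER + CABINET  # 93,000, call it $100K
print(per_cabinet)
print(6 * 100_000)   # six cabinets at ~9 TB each: $600,000 for 54 TB
```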

Re:Google is your friend (1)

Hieronymous Cowherd (11195) | more than 11 years ago | (#4230137)

Remember, anyone working with large systems generally won't take a risk on an untried vendor. When you're in the 10+ terabyte range, companies will generally only spring for first-tier vendors.

What's RaidWeb/Raidzone/etc's record on 24x7 service? 3-hour response? Do they replace drives? Once you get to that size, you're having *regular* drive failures, up to one a week or more. Also, look at access speed with those 160GB drives. How do these controllers handle repairing failures under load? What's their cache behavior under varying/extreme conditions? Do they suffer from the "cache hits high water mark under high load, then dumps the whole thing, slowing access to a crawl" problem?

The big guys can afford to do some of this testing, and can answer those questions. The little guys generally give unsatisfactory answers or can't answer the question.

I work with large systems every day (10TB-250TB), and the requirements and methodologies are *completely* different from small or workgroup systems that the average /. reader has experience with. When the system you're working on is a sizable fraction of the essential data of a corporation, and a sizable fraction of their budget, they're a little less likely to spend money on something that's not being used in a similar situation.

Re:Google is your friend (2)

Twylite (234238) | more than 11 years ago | (#4231653)

As I noted in my post, my suggestions may be totally unsuitable for the application; then again, there was little information given about the nature of the expected use. I'm glad to hear an opinion from someone experienced with the other end of the scale.

In my (limited) dealings with data warehousing systems, performance has been a non-issue. These systems have mostly held historical data for occasional retrieval. Often the (in)efficiency of the database system has been the bottleneck. We're talking 10 to 50 TPS on simple queries or blocks of data.

As opposed to the solutions offered by today's high-end vendors, RAID takes on its true meaning in this sort of slower-moving-data system. Commodity hardware means less stability, but the warehouse is not a mission-critical system (it can take occasional, short outages). On the other hand, it is cheap and easy to replace.

Did you try google????? Obviously not. (2)

walt-sjc (145127) | more than 11 years ago | (#4228758)

Search terms: IDE Raid Chassis

Sponsored link: raidzone.com

Their 4U 2TB system goes for $25K, so 50TB would be about $750K and would fit in 2 1/2 racks. They claim that they will be doing iSCSI soon, but right now it's just NAS. Still, this is a far cry from $20M. If budget is a concern, you can figure out how to use an array of NAS boxes in place of a SAN.

If you are hell-bent on SCSI or FC, you are going to be into serious dough, as SCSI drives are almost 10X the price of IDE at this time and don't come in as large a capacity (which means you will need more rack space, chassis, power, etc.). $20M is probably not too far off. Modern IDE drives with dedicated smart controllers are really not too bad. Just keep a pile of them on hand to swap out bad ones, as you are going to be going through drives pretty quickly.

With the size of your drive array, backup is going to be a serious issue. You are going to need a multi-drive robotic array of good size. Those are not cheap either.

real world (2, Insightful)

pizza_milkshake (580452) | more than 11 years ago | (#4228795)

I think that probably works out about right. Of course, 50% of /. geeks will say "well, I could throw a big old RAID together in my basement for $50,000" -- but in the real world you'll need people who know what they're doing to design it, you'll need to purchase all the disparate equipment from different vendors, assemble, test, repeat. You'll need a place to put the thing, whether it be colo or onsite. After all that, you'll need people to maintain it.

I don't know nearly enough to put such a thing together, but I do know enough to know that every real-world project probably costs 50x what a geek-fantasy basement equivalent would cost.

Pricing sounds a little high (5, Informative)

speedy1161 (47255) | more than 11 years ago | (#4228851)

From experience (with EMC and Sun), your price tag sounds a bit on the high side, but not by very much. Consider that EMC storage (after all, mission-critical data should be stored on EMC/Hitachi/StorageTek, NOT on consumer IDE) costs much more than consumer IDE/SCSI (25 - 75x), and that's only the disks.

If you're going with EMC, you'll need to put those disks in something, like a frame (cabinet), and for your size, more like 5 cabinets. With that many cabinets, you'll need some sort of SAN switch and associated fibre cables (not cheap). That gets your disks into cabinets and all hooked together.

You wanted to access the data? Then you'll need EMC Fibre Channel cards ($15K a pop for the Sun 64-bit PCI high-end jobs). But you'll more than likely be serving data from a cluster of machines, so count on buying three ($45K) per machine (so each card is on a different I/O board hitting the SAN switch, for redundancy).

Who's going to set this up? For that kind of coin, EMC (or whomever you go with) will more than likely set the thing up and burn it in for you on site. The price probably also includes some kind of maintenance contract with a turnaround time fitting the criticality of the system.

Yes, my 'big ass storage' experience may be limited, but I think that $20 million for 50TB installed/supported/tested by a big storage vendor is in the ballpark.

Good luck.

Re:Pricing sounds a little high (2)

metacosm (45796) | more than 11 years ago | (#4229567)

The above post is clueful and correct -- read it. Also, the firm that gave you a ballpark quote of $20 million should be able to give you a breakdown and information on vendors and such. It should be very possible to track down the prices they are getting the product for, and how much they are marking it up.

They have to make a living too ya know.

Re:Pricing sounds a little high (0)

Anonymous Coward | more than 11 years ago | (#4229897)

So is most of the cost spent on making sure there are no bottlenecks in your expensive hardware? And is the other half spent on making sure someone else is to blame when the data gets lost or goes to bit heaven?

Re:Pricing sounds a little high (4, Informative)

Wanker (17907) | more than 11 years ago | (#4230133)

My "big-ass storage" experience is not so limited, and speedy1161 has hit the nail right on the head.

For enterprise-class storage (i.e., NOT just a pile of Maxtor IDE drives duct-taped together), paying $20M for 50TB is on the high side, but not by much. (I would have given a range of $10M-$20M for the whole thing, depending on the exact trade-offs made.)

3 HBAs per host is overkill for most applications (but certainly not all); I've found that two is generally sufficient. Never rely on just one, even for a non-critical system. I'm often amazed at just how critical non-critical servers become when they're down for several hours in the middle of a busy day.

Don't discount the significant setup and debugging costs at the beginning. This will cost not only in hardware/software/consulting but in time lost for your own admins to spend working with the vendor, going to classes, learning new methods of adding storage, accidentally messing up the systems, cleaning up those messes, etc.

Get the best monitoring/management software you can. EMC is famous for gouging people on software costs so you'll need to use your best judgement. (HINT: PowerPath == Veritas DMP at up to 20x the cost. SRDF == Veritas Volume Replicator at up to 20x the price. TimeFinder == Mirroring at up to an infinite multiple of the price. You get the idea-- just use your best judgement and be cautious.) Under extreme single-host disk loads the otherwise minor performance hit for host volume management can become a problem, making that 20x price worth it. Maybe.

If possible, press them for management software that makes adding/removing/changing filesystems a one-step operation, complete with error checking. It really sucks to put that new database on the same disks as another host's old database and software can be really good at checking for stupid human mistakes.

That sounds like the pricing for a whole project (2)

duffbeer703 (177751) | more than 11 years ago | (#4228872)

Figure you get two IBM Sharks with two expansion frames, maximum cache, and 36GB disk eight-packs.

That's like $6MM for most customers.

Fibre channel directors and switches ... about $500k

Tape robot... $1 MM

Storage Mgmt software like TSM... $400,000

The extra $10MM is probably for full-time consultants, a more expensive solution like EMC, or a more fault-tolerant design.
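
Summing that quote (a sketch; the figures are the rough ones given above):

```python
quote = {
    "2 Sharks + expansion frames, cache, disk": 6_000_000,
    "FC directors and switches":                  500_000,
    "tape robot":                               1_000_000,
    "storage mgmt software (TSM)":                400_000,
}
hardware = sum(quote.values())
print(hardware)                 # 7,900,000 in hardware and software
print(20_000_000 - hardware)    # ~12M left for consultants and redundancy
```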

Re:That sounds like the pricing for a whole projec (2)

haplo21112 (184264) | more than 11 years ago | (#4229391)

You could buy IBM if you want to lose your data all over the floor. Why do people always reduce these conversations to price? Data is priceless. Would you send fine china across the country in a paper bag with no insurance (IBM Shark, Hitachi), or in a double-walled cardboard box with bubble wrap (EMC Symmetrix/Clariion)? If that data gets lost, it's gone, history, hasta la bye-bye. It's not all about cost, people; don't get burned, buy the right tool for the right job...

Re:That sounds like the pricing for a whole projec (1)

Hieronymous Cowherd (11195) | more than 11 years ago | (#4230485)

Sorry, but I've personally seen the carnage when a supposedly multiply-redundant EMC system lost customer data. Add to that EMC's egregious business methods, and they quickly become the last choice, behind Hitachi and IBM. There's a reason that no one else will work with EMC.
Plus, you probably shouldn't compare the Clariion systems directly with the Hitachi/IBM Shark; they're much more on the level of the LSI (StorageTek/IBM/etc.) systems.

multiple sites - SAN etc (2)

martin (1336) | more than 11 years ago | (#4228902)

What they've probably got is a massive SAN (Storage Area Network) running over 2 or more sites; if one site goes down you can run on the other, even at 30 miles apart.

Also, accessing this amount of data at reasonably high rates is expensive: think StorageTek silos, HDS SANs, etc. All of this is high-end, very fast stuff.

If you've got 50 TB of data running in an OLAP cube, you've got to have massive IO capability to properly load the cube and spin it around. I.e., the cost ain't in the actual storage media but in the IO (especially if you've got a split system spanning multiple sites).

There should be plenty of examples of this sort of data storage now - telcos to web logs. Pricing, well depends on the deal you can get at the time...

You pay for support. (3, Interesting)

molo (94384) | more than 11 years ago | (#4228943)

When you get a Symmetrix frame from EMC, you also get a support contract. EMC will send multiple people to your installation for maintenance. EMC will remotely monitor your Symm via modem. They will help you plan your storage needs (including what kind of backup and reliability you need). EMC will provide 24x7 support for everything you need. Then there's management software, etc.

Don't forget that the hardware isn't cheap: Frame, multiple redundant hot swappable power supplies (requires specialty power connection), dozens of scsi drives, dozens of scsi controllers, 10-20 fibre channel connections, an interconnection network between FC and SCSI controllers that includes fiber and copper ethernet, hubs, etc., and a management x86 laptop integrated into the frame.

$20 mil for this is a fair price in my opinion. Anyone who rolls their own is just insane. There are hundreds of engineers behind each of these boxes, and it shows.

No, I don't work for EMC.

I know how. (4, Funny)

one9nine (526521) | more than 11 years ago | (#4228949)


Floppies. Lots and lots of floppies. They are so cheap right now! And they come in pretty colors too.

How to actually use cheap computers... (2)

funky womble (518255) | more than 11 years ago | (#4229183)

...but not yet

IDE-RAID with 3ware 7500-12 controllers and 3U 14-bay cases (available from rackmountpro, and probably others) could be one possibility, but I don't think you'd get a 'flat' storage space out of it; it would probably have to be segmented. As others have pointed out, NFS/Samba aren't really manageable ways to handle a filesystem spread across multiple machines. People who do this, like archive.org and google, have custom software to access the data stored on their machines. But it doesn't have to be that way forever...

I think iSCSI could open up very interesting possibilities for open-source SANs using this type of hardware... with front-end servers mapping requests onto the back-end servers holding the storage, you could have a rather nice fully-resilient, highly-scalable system that just appears as another drive to the client machine -- no NFS/SMB, etc...
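For illustration, a minimal Python sketch of that mapping idea; the chunk size and server names are made up, and a real iSCSI target would do this down at the block layer:

    CHUNK_BYTES = 64 * 1024 * 1024                     # assumed 64MB chunks
    BACKENDS = ["store-%02d" % i for i in range(40)]   # assumed 40 storage boxes

    def locate(offset):
        """Map a byte offset in the logical volume to a back-end server."""
        chunk = offset // CHUNK_BYTES
        server = BACKENDS[chunk % len(BACKENDS)]   # stripe chunks round-robin
        local_chunk = chunk // len(BACKENDS)       # which chunk on that server
        return server, local_chunk, offset % CHUNK_BYTES

    print(locate(0))                     # ('store-00', 0, 0)
    print(locate(5 * CHUNK_BYTES + 17))  # ('store-05', 0, 17)

For resilience the front-end would also write each chunk to a second back-end (say BACKENDS[(chunk + 1) % len(BACKENDS)]) and fail reads over, but the mapping stays this cheap to compute.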

Cost (1)

andrew_lewis (534971) | more than 11 years ago | (#4229202)

150GB * 334 drives = 50TB = ~$100,000
add $19,900,000 for consulting fees and you've got your 20 million. Speaking as a consultant, that seems reasonable to me.

Re:Cost (0)

Anonymous Coward | more than 11 years ago | (#4231050)

Heh.. Consultants. You would use IDE drives, wouldn't you.

If that was supposed to be funny.. It's well known I have no humor.

CDs - Obvious choice (3, Funny)

getagrip (86081) | more than 11 years ago | (#4229244)

Ok, let's see. 50 terabytes divided by 600 megs per CD means you will need 83,334 CDs (rounded up). At about 20 cents each (retail), that should only set you back about $17K. Add in $100 for some of those heavy-duty shelving units from Home Depot and a Wintel box to read and write them, and you are looking at well under $20K in total hardware cost. At that point, just hire someone away from their McJob for a reasonable amount to swap the CDs and you are in business.

Re:CDs - Obvious choice (1)

Tower (37395) | more than 11 years ago | (#4230606)

hah - you don't need nearly that many... use gzip and get that number down to half or less... that's only 41667 CDs - even less if you use bzip2. That should cut a lot of your costs right there ;-)

Try EMC on eBay (2)

j-turkey (187775) | more than 11 years ago | (#4229379)

Searching eBay for EMC provided some interesting results (these are mostly "buy it now" prices):

EMC Symmetrix 3930 w/ 12 TeraBytes [ebay.com] = $57K
(With the proper drive configuration, this unit should [emc.com] be able to deliver up to 70TB in a single system).
This one comes with 12TB of storage (256 x 50GB HD's). If you throw out all 256 of those 50GB HD's (or just give them to me as a consulting fee for saving your company over $19.5 million) and buy 256 x 181GB HD's, you're just short of your 50TB mark (~46,336 GB).

On Pricewatch [pricewatch.com] those drives come out at $999 ea. x 256 = $255,744.00. Add the initial $57K and you've got a machine that meets your specification for significantly less than $20mil.

Here are some other EMC machines for sale on eBay:
EMC Symmetrix 3830-36 With 3 TB No Reserve! [ebay.com] = $59K

EMC Symmetrix 3700 6TB w/Install & 1YR Mnt! [ebay.com] = $48K

EMC Symmetrix 5700 3TB Storage System [ebay.com] = $9K

This is what I found doing minimal research. I'm not 100% sure the Symmetrix 3930 can handle that configuration (it's not my money), so before you go down this road -- do your research (better than I did).
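For what it's worth, the hardware-only arithmetic (support, certified drives, and staffing are extra, as the replies below make clear):

    frame  = 57_000      # used Symmetrix 3930 off eBay
    drives = 256 * 999   # 181GB drives at Pricewatch prices
    print(f"hardware total: ${frame + drives:,}")   # $312,744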

--Turkey

Re:Try EMC on eBay (3, Insightful)

haplo21112 (184264) | more than 11 years ago | (#4229570)

Yep, you're a "--Turkey" all right; you've got about the same size brain if you think that's a viable solution...

The EMC boxes (or anyone else's, for that matter) have a significant amount of configuration associated with connecting the drives. You can't just open the box up, start sticking in drives, and expect it to work. For that matter, in many cases if the drives aren't the ones rated for use in the box you can destroy the machine's backplane. The power supplies, the drives themselves, etc... Power and heat are huge issues in these boxes: think of the heat the average hard drive throws off, then put 100+ of them in a box the size of the average home refrigerator...

Then there are configuration issues: you need the software, and the technical know-how to write the configuration files these machines use to tell the multiple drives to act as one or many logical drives.
Then there's the question of how you connect up the system(s) that will use the box. These are all delicate issues.

If you buy a box off eBay you will absolutely need someone working for you who knows the product inside and out (or at least on a retainer contract with 24x7 support clauses)... and you should immediately call the proper support number to get the thing on a support contract... Trained EMC professionals don't come cheap, but they are worth every penny. I'd assume it's the same story at other companies, but I only use EMC so I don't know...

Buy EMC; it's really the only long-term option. I have seen one of these boxes get knocked over on its side (no small task) while it was running, and it just kept going without a hitch... that's a well-engineered product...

Re:Try EMC on eBay (2)

j-turkey (187775) | more than 11 years ago | (#4230239)

Yep, you're a "--Turkey" all right; you've got about the same size brain if you think that's a viable solution...

Turkey-brained, maybe, but you're coming off as an arrogant prick! Still, I'll extend you the courtesy you didn't bother to extend to me, and give you the benefit of the doubt.

Did you bother reading my entire post? Did you read the part where I stated that this was indisputable fact and that anyone who has a problem with it is just plain wrong? That's right -- I didn't say that. In fact, I said that I did minimal research.

As far as finding a tech who knows EMC -- it shouldn't cost more than $100-125K/yr full time, and in this economy they're out there for the hiring. Add an extra $200 premium each for EMC-friendly drives and you're up $51K or so. Am I getting any warmer? Still a hair under $20,000,000, right?

BTW -- why is it that because you're on slashdot you think you can get away with talking to me like that? If you walked up to me on the street and pulled that, I'd pop you right in the nose. Thanks for your extensive EMC knowledge, Junior.

--Turkey

Re:Try EMC on eBay (2)

haplo21112 (184264) | more than 11 years ago | (#4230724)

Ok, then do the real research and find out why you are wrong...!

Re:Try EMC on eBay (2)

j-turkey (187775) | more than 11 years ago | (#4230933)

Look haplo,

It's fine to point out where someone is wrong -- and I'm more than willing to hear it and discuss it. However, it's totally inappropriate to say rude shit and get in my face about it -- I guess you still don't understand that. Too bad for you.

--Turkey

Re:Try EMC on eBay (1)

Emnar (116467) | more than 11 years ago | (#4229928)

Why does nobody ever mention Network Appliance? They're very competitive with EMC feature- and reliability-wise, and definitely kill them on price.

Re:Try EMC on eBay (2)

j-turkey (187775) | more than 11 years ago | (#4230280)

Why does nobody ever mention Network Appliance? They're very competitive with EMC feature- and reliability-wise, and definitely kill them on price.

Now that you mention it -- I don't know. I've used EMC, NetApp, and BlueArc -- but EMC was the first thing that came to mind...and I remember seeing EMC machines for sale on eBay before. I didn't have time to do extensive shopping/research, so I went with the lowest common denominator.

--Turkey

Re:Try EMC on eBay (1)

Hieronymous Cowherd (11195) | more than 11 years ago | (#4230533)

Add significant license fees and support to that EMC order. EMC will probably want $20K just to turn on that Symmetrix, if they'll even deal with you. I've dealt with people who bought used EMC before and got severely burned. Plus, as the other guy you threatened with assault said, EMC will require EMC-supplied drives before they turn the system on. Enterprise-class gear generally doesn't run without contracts.

Re:Try EMC on eBay (2)

j-turkey (187775) | more than 11 years ago | (#4231204)

Two things:

1. Check out the auction; it says the system is suitable for re-certification:

The system is being offered with a 30 day purchase money back or replacement as available warranty against failure from proper use, and that it will be suitable for EMC re-certification
I've seen other EMC systems on eBay advertised with a full EMC warranty.

2. If I hire an engineer full-time (hell -- for that money, a group of four engineers at $100K each annually over a 10-year project is still cheaper than the $20M), do I need an EMC support contract? Do they need to come and fire the box up for my engineering/administrative group?

Thanks for the info though...

--Turkey

bullshit. (0)

Anonymous Coward | more than 11 years ago | (#4229433)

i've built a (fully redundant, with backup) 100TB array for $2 mil.
if you're interested in buying one, mail me. i can put one together with no real problems.

-- zurk42 at hushmail d.o.t com

Re:bullshit. (0)

Anonymous Coward | more than 11 years ago | (#4231016)

Oh yeah... I'd give $2 million to an anonymous coward I met ONLINE who uses vulgarity on a public message board.

Do you sell bridges too? I'm looking.

Past Experience... (2, Insightful)

Mark Imbriaco (133740) | more than 11 years ago | (#4229515)

I'm not sure what prices are running these days, but back in 1999 I put together a 6TB RAID 5 system on an all fibre-channel setup (FC hubs at the time; switch fabric was too immature) using StorageTek (aka Clariion) arrays for right around $2.5M.

Keep in mind, that's just for the disks, array controllers/cabinets, hubs, and Sun FC cards. No servers are included in that price.

There are so many variables that you didn't go into that it's hard to give you an educated answer to your question, but it seems feasible to get to around 50TB today for that kind of money taking into account the increased storage density that we've gotten in the last couple of years.

My advice (0)

Anonymous Coward | more than 11 years ago | (#4229600)

Wait until 2023, purchase a 50 TB memory stick for $12 at Walmart.

Alternative - the Sun Solution (1)

drprotagonist (557411) | more than 11 years ago | (#4229675)

$20MM sounds very high. The Sun StorEdge 9980 System costs $2.3MM for 20TB, upgradeable to 70TB. So say between $5MM and $6MM for 50TB. The system is a fully racked SAN - just plug it in and go. http://store.sun.com/catalog/doc/BrowsePage.jhtml?cid=82215&parentId=75082

tapes? (2)

austad (22163) | more than 11 years ago | (#4229846)

Does this stuff have to be online for immediate access, or would it be acceptable for it to sit on a very slow filesystem and be available within a minute?

I built a system using SpectraLogic Bullfrog AIT changers and LSCI's SamFS. It keeps the files' metadata on your actual disk, but when you request one of the files, it goes to tape and fetches it for you. For 50TB (uncompressed) you could get by for under $500,000. However, that's without mirroring tapes. Trust me, you want to mirror your tapes; I've had them fail before. Figure double the price if you mirror. Also, I'm not sure whether the new AIT drives that hold 100G uncompressed are out yet. If so, they will bring the cost down.

I know, the system sounds sketchy, but it works quite well. Seek time is definitely slow, but once it finds the data on tape, the actual transfer is quite fast.
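Rough media math, assuming those 100G-uncompressed AIT cartridges (an assumption; older AIT holds less, which raises the count):

    DATA_GB = 50 * 1000              # 50TB
    TAPE_GB = 100                    # assumed uncompressed AIT capacity
    tapes = -(-DATA_GB // TAPE_GB)   # ceiling division
    print(tapes, "tapes;", tapes * 2, "with mirroring")   # 500 tapes; 1000 mirrored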

Depends (2)

photon317 (208409) | more than 11 years ago | (#4229958)


The big factors in storage cost, briefly:

r) Reliability
s) Speed
c) Cost

In rough terms, c=s*r, meaning the cost will rise dramatically for high speed reliable storage versus low speed crap storage.

In addition, how the storage is designed (and how much more it can cost) depends a lot on data access patterns (read-mostly vs. write-mostly, OLTP vs. DSS vs. data warehouse, etc.).

Maxtor has 0.3TB IDE drives at $1/GB. If you built a huge array of IDE controllers for these, your disk cost for 50TB would be around $50K. If some vendor actually built a beast with the requisite number of IDE buses and whatnot, the chassis might run you another $100K. All in all, real cheap storage. But it would suck on performance and reliability, and probably put out too much heat and noise, etc.

Highly available disk arrays with extreme disk-platter performance and large amounts of cache can easily run $20 million for 50TB, if not more. There are middle-of-the-road solutions, though; it doesn't have to be that expensive unless you're going all-out for huge concurrency and speed in an OLTP environment that requires 99.999999% uptime.
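To put rough numbers on that c = s*r intuition, using ballpark $/GB figures from this thread (the midrange figure is my own guess):

    TB = 50
    tiers = {
        "bare IDE drives":          1.0,   # ~$1/GB, per Maxtor retail
        "midrange SAN/NAS":        50.0,   # assumed middle ground
        "high-end cached array":  400.0,   # $20M / 50TB works out to this
    }
    for name, per_gb in tiers.items():
        print(f"{name:25s} ${per_gb * TB * 1000:>12,.0f}")
        # $50,000 / $2,500,000 / $20,000,000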

Re:Depends (1)

adb (31105) | more than 11 years ago | (#4230463)

It's worth mentioning what "reliability" means: it's more like 1/downtime than uptime, which is why you'll pay maybe ten times more for another minute or two of uptime a year.

It's very likely that the company has a profoundly silly idea of how much uptime they really need.
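To put the trade-off in numbers, here's the downtime budget per year at each availability level (straight arithmetic, no vendor claims):

    MINUTES_PER_YEAR = 365 * 24 * 60
    for nines in ("99%", "99.9%", "99.99%", "99.999%"):
        avail = float(nines.rstrip("%")) / 100
        print(f"{nines:>8}: {MINUTES_PER_YEAR * (1 - avail):8.1f} minutes down/year")

Each extra nine cuts the allowed downtime by ten, and the price tends to climb by something like the same factor; that's the number to sanity-check against what the business actually needs.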

Re:Depends (3, Funny)

wfrp01 (82831) | more than 11 years ago | (#4230865)

If you're using Depends, you should always opt for the high speed reliable storage over the low speed crap storage.

Well.... (0)

Anonymous Coward | more than 11 years ago | (#4230114)

We purchased a 10TB IBM NAS server plus a backup system for it. The total cost was in the region of £1.2M, including some hefty discounts. Add tapes for a 6-month backup cycle at ~£150K and you're looking at the best part of £1.5M. This doesn't include any installation costs, etc.

Management software can be pricey as well; Sun's SRM stuff comes in at around a quarter million per 10TB, plus you'll need a server (NT at the moment... yeugh!) for it, etc.

Add all that together and scale it up and you probably hit a good chunk of $10M. $20M does seem a little much, but then, it may do a lot more than our system above.

HSM (Hierarchical Storage Management) (2)

Leknor (224175) | more than 11 years ago | (#4230466)

Is HSM a solution? We rarely access very old data but we still like it to be easily available. With HSM we can move data to tape or some other cheaper storage while it still appears to be on the local filesystem. Applications don't know the difference, other than waiting about 45 seconds while the data is fetched back to local storage. In the end it depends on how you access your data. http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci214001,00.html [techtarget.com]
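A toy sketch of what the read path looks like from the application's side; recall_from_tape is a hypothetical stand-in for whatever the real HSM does underneath:

    import os, time

    def recall_from_tape(path):
        time.sleep(45)   # the ~45 second fetch mentioned above

    def hsm_open(path, full_size):
        """Open a file, recalling it from tape first if only a stub is on disk."""
        if os.path.getsize(path) < full_size:   # stub: metadata only, no data
            recall_from_tape(path)
        return open(path, "rb")                 # now reads like any local file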

available storage != amount of disk (1)

cballowe (318307) | more than 11 years ago | (#4230565)

If you need 50TB of online storage for a highly available application, count on buying triple that in disk (see the sketch below). You'll likely be using SAN RAID controllers with snapshot capabilities to minimize or eliminate the downtime needed for backups. That's on top of mirroring and striping (RAID 1+0), which doubles your disk needs. Or, with less expensive controllers, you could fake the snapshot (put the disks in 3-disk mirror sets, break one of the three out, reassemble it as a backup stripe set, mount that, and back it up).

Your options in such a large environment are extensive - and managing it can get fun.
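The multiplier, spelled out:

    usable_tb = 50
    mirrors_per_set = 3   # 2 live members + 1 breakable copy for backups
    print(usable_tb * mirrors_per_set, "TB raw for", usable_tb, "TB usable")   # 150 TB raw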

A good vendor (0)

SprayThought (604907) | more than 11 years ago | (#4230569)

While EMC is the established leader in this sector, there are alternatives that could bring that $20M price tag down. We use NetApp filers here, and in the two years I've worked with them I've had no trouble. Just thought I'd put my two cents in...

And that's just the storage! (1)

Harik (4023) | more than 11 years ago | (#4230600)

$20 mil is a starting point, actually. You still need a climate-controlled datacenter, quad-redundant power sources, and an onsite generator (one that can power the RAID plus the cooling!). And of course, ask them whether it's even possible to power it down.

The Hitachi solution is, as far as I know, reliable up to the first power failure, period. Then it's an empty disk again. I believe they do guarantee it in anything other than a power-fail situation, however. Hence the quad-redundant power plus onsite generator requirement. If you really have that kind of budget, call a sales rep and ask about physically moving it two years down the road. Last time I asked, they said "Buy another one, lease an OC3 from Bell and mirror. Don't Turn It Off."

I've only got a budget for 1TB systems. At that scale, it's amazing how cheap it is: one HBA on each set of 15 x 72GB 15K U160s (RAID 5), using network sync between the two separate boxes. Came in at about 25 grand. Nice in that you can 'detach' one entire system, back it up, then resync it. This is for a large-dataset, low-transaction-volume setup, though. Secondly, backup is hideously expensive. Tapes = useless. Get something that lets you snapshot + delta the whole array. Drives are a thousand times cheaper than tapes to manage (TCO, equipment AND maintenance). Plus, without 100 tape drives in parallel, you won't be able to back up that kind of data in a reasonable timeframe.
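The backup-window math behind that last point, assuming ~30MB/s sustained per tape drive (an assumption; real drives and compression vary):

    DATA_MB = 50 * 1000 * 1000   # 50TB in MB
    RATE = 30                    # assumed MB/s per drive
    for drives in (1, 10, 100):
        hours = DATA_MB / (RATE * drives) / 3600
        print(f"{drives:3d} drive(s): {hours:7.1f} hours")   # 463.0 / 46.3 / 4.6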

--Dan

about 7 million - maybe (0)

Anonymous Coward | more than 11 years ago | (#4230779)

OK, I did a quick Google search and came up with this: http://www.serverworldmagazine.com/monthly/2002/08/sgi.shtml
It's the SGI File Server. According to the article, it scales to over 50TB and costs around $67,000 for a 458GB configuration. Based on those numbers, you would need 110 of them to reach 50TB, and the total cost would be $7.37M. Obviously, this is without any consultation with SGI, and they may have a better price. So, short answer: call SGI and have them quote you a price. Oh, and use GOOGLE before you submit an "Ask Slashdot" question. Nobody seems to do that.

Real World storage costs.. (1)

chunkwhite86 (593696) | more than 11 years ago | (#4231345)

FWIW, I used to be a Storage Area Network (SAN) designer for Compaq. The largest cost of building multi-terabyte storage arrays is not the disks; it's the infrastructure needed to support them: the backplanes, the external RAID controllers, and the fact that everything needs to be dually redundant. Further, all modern SANs attach to hosts via Fibre Channel. Fibre Channel switches run anywhere from $15K to over $200K each, depending on the size and feature set of the switch. Also, each host attached to the SAN will need one or more Fibre Channel adapters, which run several thousand dollars apiece.

Based on current internet list prices, a given SAN will cost roughly $250K per terabyte. That's just over $12M for a 50TB SAN. Once you add extended warranty, onsite service, and installation/configuration services (yes, you must pay the vendor to come on site and set these things up; they are neither simple nor intuitive), you're up closer to the $20M figure in your initial question.
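Running that rule of thumb:

    TB = 50
    san = 250_000 * TB
    print(f"SAN alone: ${san:,}")   # $12,500,000
    # warranty, onsite service, and installation push it toward the $20M quoted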