Costs Associated with the Storage of Terabytes?
NetworkAttached asks: "I know of a company that has large online storage requirements - on the order of 50TB - for a new data-warehousing oriented application they are developing. I was astonished to hear that the pricing for this storage (disk, frames, management software, etc.) was nearly $20 million. I've tried to research the actual costs myself, but that information seems all but impossible to find online. For those of you out there with real-world experience in this area, is $20 million really accurate? What are the viable alternatives out there for storage requirements of this size?"
320G Maxtor Drives? (Score:1)
Not sure if this is what you're looking for, but I couldn't help noticing the timing after the last article.
Re:320G Maxtor Drives? (Score:2)
Add to that switches, routers, T1s, building, cooling, etc. Now if you need that to be robust you would build RAID units and that could double the cost.
Re:320G Maxtor Drives? (Score:2)
50 TB / 320 GB = 156 drives; at 4 IDE drives per box, that's 40 boxes.
I recently picked up a Promise SuperTrak 6000. I dropped an email to Promise and found out that one machine should be able to run two SuperTrak (hardware RAID) cards. With 6 IDE buses (12 drives) per card and two cards per machine, you could set up each machine with linear RAID and 24 320 GB hard drives. If you did that, you would only need seven machines!
Realistically, though, you would need redundancy in this kind of system, which means some lost storage. With RAID 5 it would be more like 14 machines for 40+ terabytes.
Now figure about $1,000 for a motherboard, case, processor, a gig of RAM, etc., $300 for each card, and $350 for each drive times 12. That comes to $5,800 apiece; $5,800 x 14 = $81,200. If you want hot swap, it's $130 more for the card's hot-swap package, plus 9 more drives at $80 a pop, or $720 per machine; multiply that by 14, and hot-swap capability will cost you an additional $10,080. That is a hot-swap, RAID 5 total of $91,280.
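For anyone checking the figures above, the arithmetic can be sketched out. All prices here are the poster's circa-2002 assumptions, not real quotes:

```python
# Rough cost sketch for the DIY RAID 5 build described above.
# All prices are the poster's assumptions, not vendor quotes.
base_system = 1000          # motherboard, case, CPU, 1 GB RAM, etc.
raid_cards = 2 * 300        # two hardware RAID cards per machine
drives = 12 * 350           # twelve 320 GB drives per machine
per_machine = base_system + raid_cards + drives   # $5,800

machines = 14               # enough for 40+ TB usable with RAID 5
cluster = per_machine * machines                  # $81,200

hot_swap = 9 * 80 * machines                      # 9 spares/machine: $10,080
total = cluster + hot_swap                        # $91,280
print(per_machine, cluster, total)
```

The numbers check out internally, which is more than can be said for most back-of-the-envelope hardware budgets.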
With a few top-notch megabit switches with channel bonding, you're looking at $100,000. Can you hire me for the remaining $19,900,000? I would like to continue to work in NY.
Re:320G Maxtor Drives? (Score:2)
I'll gladly take the $170,000 pay cut for my foolish exuberance, down to $19,730,000. :)
I love Linux. :)
Re:320G Maxtor Drives? (Score:2)
Twelve drives per card, multiple cards allowed. (I've heard of 5 controllers in a machine being used; however, PCI bus speed is an issue long before even the second controller goes in.)
The Promise 6000 is a 6-drive controller. You can't double up master and slave on each channel the way you were planning.
Re:320G Maxtor Drives? (Score:2)
The documentation clearly states as many as 12 drives. I could have sworn the card came with two-connector cables. I'll check tonight.
Same basic idea. What experience do you have with the Escalade? I heard it was brought back from discontinuation due to demand.
Re:320G Maxtor Drives? (Score:2)
If one of your drives on the chain dies, sometimes it'll take the other drive on the chain offline with it. You also can't hot-swap in a drive if there's another one on the chain. And it makes cabling more difficult when you have to put the two drives close enough to share a cable.
And the Escalade is generally accepted as one of the best cards out there. They were discontinued, yes, but now they're back, with 4-, 8-, and 12-port models. They have Serial ATA cards as well.
Re:320G Maxtor Drives? (Score:2)
Re:320G Maxtor Drives? (Score:3, Interesting)
In other words, someone dealing with 50TB and who wants backups of that data will be spending many, many times the amount it would cost to just purchase enough hard drives to get the bragging rights of 50TB. And a backup located in the same room/floor/rackspace/whatever as the source data will be pointless in the event of fire, floods, nuclear fallout, etc. So, they would also need a way to transfer all that data to offsite backups in a timely manner (waiting five weeks for a full backup to transfer over a 100Mb/s pipe would probably not be acceptable).
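To put a number on that "five weeks" remark, here is a quick sketch, assuming an ideal, fully saturated 100 Mb/s link with no protocol overhead:

```python
# Time to push a full 50 TB backup over a single 100 Mb/s link,
# assuming the link is perfectly saturated (no protocol overhead).
total_bytes = 50 * 10**12          # 50 TB
link_bps = 100 * 10**6             # 100 Mb/s
seconds = total_bytes * 8 / link_bps
weeks = seconds / (60 * 60 * 24 * 7)
print(round(weeks, 1))             # about 6.6 weeks
```

It actually works out closer to six and a half weeks; real-world overhead would make it worse still.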
Aside from backups, how would the drives be accessible? Even as JBOD, you're talking 40 IDE/ATA controllers (assuming 320GB drives and 4 ports per controller), or 20 SCSI channels (assuming 160GB per drive and 15 non-host devices per channel) to support that many disks. You could also use Fibre Channel and get away with only a couple arbitrated loops. Physically, you're talking about hundreds of disks that need to be mounted somewhere, so you would also need dozens of chassis to hold the drives.
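The controller counts above can be sanity-checked using the same assumed drive sizes and per-channel limits:

```python
import math

# Controller/channel counts for a 50 TB JBOD, using the post's assumptions:
# 320 GB IDE drives at 4 per controller, or 160 GB SCSI drives at
# 15 non-host devices per channel.
target_gb = 50_000
ide_drives = math.ceil(target_gb / 320)       # 157 drives
ide_controllers = math.ceil(ide_drives / 4)   # 40 controllers
scsi_drives = math.ceil(target_gb / 160)      # 313 drives
scsi_channels = math.ceil(scsi_drives / 15)   # 21 channels
print(ide_controllers, scsi_channels)
```

That lands on 40 IDE controllers and 21 SCSI channels, close to the figures quoted above.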
But, hundreds of disks in a JBOD configuration means you'll have hundreds of partitions, each separate from the others. Hell, if the clients are Windows machines, they won't even be able to access more than a couple dozen at a time. And even for operating systems with better partition/mount-point addressing, it would be unmanageable.
So now you get into needing a RAID solution that can tie hundreds of disks together. If you're talking about hooking these up to standard servers through PCI RAID cards, you'll need several of those machines to host all the necessary controllers (especially if the disks are not all 160GB or larger).
The only realistic solution for this much storage, at least until we have 5TB hard drives, is a SAN-like setup. Specialized hardware designed to house hundreds of disks in stand-alone cabinets and provide advanced RAID and partitioning features. SANs don't come cheap.
Add to the SAN the various service plans, installation, freight, configuration, management and the occasional drive swapping as individual disks fail and you've already multiplied that $50K several times, as a bare minimum (and you still haven't priced out the backup solution).
There's a lot more to it than just having a pile of hard drives on the floor. I wouldn't even be surprised if the drives are the cheapest component.
Metacomment (Score:4, Insightful)
Meta-Metacomment (Score:2)
Re:Metacomment (Score:2)
Why is it that 90% of "Ask Slashdot" pieces seem to boil down to "I have no real world experience, and I'm just wondering how I can solve problem X for Y dollars when twenty different vendors all sell solutions for 100 * Y dollars?"?
It makes readers feel superior, and keeps them coming back.
Look at the quantities (Score:2)
I think it's pretty reasonable to feel that you could put something like this together for under $100K.
Re:Look at the quantities (Score:4, Informative)
Then you want to back this up? Break out your checkbook again for a Compaq minilibrary, and if you're lucky, that's only 10 tapes x 80 GB a tape... 800 GB... and that's if you're really doing well. So on top of it all, 10 x 10 x 80 GB gives you 8 TB of backup at around $30K each for the minilibraries; the price just keeps on jumping!
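Spelling out the tape math above, using the poster's assumed figures (10 minilibraries at roughly $30K each, 10 tapes apiece, 80 GB native per tape):

```python
# The tape math above, spelled out: 10 minilibraries, 10 tapes each,
# 80 GB (native, uncompressed) per tape, at an assumed ~$30K per library.
libraries = 10
tapes_per_library = 10
gb_per_tape = 80
backup_gb = libraries * tapes_per_library * gb_per_tape   # 8,000 GB = 8 TB
library_cost = libraries * 30_000                         # $300,000
print(backup_gb, library_cost)
```

So $300K buys only 8 TB of backup capacity; covering 50 TB this way multiplies the bill several times over.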
No way, no how, not today or tomorrow. $100K will get you a floor full of 120 GB Maxtor drives, and that is about it.
I'm not saying that this is the standard... (Score:4, Insightful)
The problem is that I find that corporate spending on IT purchases has gotten ridiculous. Let's buy a TEMPEST array! Let's buy something with a Sun nametag because the name sounds good! Let's buy a $2k piece of software for each workstation even though there's a free alternative!
I'm not saying that anyone *provides* something in the price range I was talking about. No one is crazy enough to do so, if companies are willing to pay much, much more. I'm saying that, if you're asking whether it's possible to *build* something like this for the price range I mentioned, off the cuff it doesn't sound so unreasonable.
Yes, a seasoned IT person who works with high-end systems like this will laugh. Why? Because they're used to paying huge amounts of money. Because it's an accepted part of the culture to throw down this much cash. What I want to know is -- how often do people question these basics? How often has someone said "Wait a minute...this is wrong."
Are you telling me that if you were in a third-world country without the exorbitant amount of funding that we USians enjoy, and someone asked you to put together a 50TB storage system for under $1M, you'd simply say "It can't be done"? No consideration, nothing?
I mean, when I look at the fact that the *case* on, say, a Sun high end system costs more than a whole cluster of workstations, I start to wonder just how much excess is going on here.
Say we take the bare-metal, dirt-cheap approach. Grab a bunch of Linux boxes. Throw RAID on them, configured so that 1/3 of your data is overhead for reliability, and put a 100Mbps Ethernet card in each. The figure used earlier was $1 per gig. Put six 200 GB drives in each. Throw down $250 for the non-drive cost of each system. You have 800GB of data on each system and 400GB of overhead. That's 63 systems: $16K for the systems, $75K for the drives, and we come in at $91K. I left out switches -- you'd need a couple, but certainly not $9K worth.
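As a sanity check on the cluster arithmetic, with the assumptions as stated ($1/GB drives, $250 per bare box, one drive in three given up as redundancy overhead):

```python
# Checking the cheap-cluster arithmetic: six 200 GB drives per box,
# one drive in three given up as redundancy overhead, $1/GB drives,
# $250 for everything else in each box.
drives_per_box = 6
drive_gb = 200
usable_gb = drives_per_box * drive_gb * 2 // 3      # 800 GB data per box
boxes = -(-50_000 // usable_gb)                     # ceil(50 TB / 800 GB) = 63
system_cost = boxes * 250                           # $15,750 (~$16K)
drive_cost = boxes * drives_per_box * drive_gb * 1  # $1/GB -> $75,600
print(boxes, system_cost, drive_cost, system_cost + drive_cost)
```

The totals do add up to roughly $91K as claimed; whether such a cluster would perform or be manageable is the part the replies below dispute.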
You'd need some software work done -- an efficient, hierarchical distributed filesystem. I didn't factor this in, which you could consider not fair, but there may be something like this already, and if not, it's a one time cost for the whole world.
Maybe another few systems up near the head of the array to do caching and speed things up, and you still aren't even up to $150K, and you have failover (at least for each one-drive-in-three group).
I haven't looked at this -- it might be smarter, since you'd want to do this hierarchically, to have caches existing within the hierarchy, or maybe Gbit Ethernet at the top level of the hierarchy. And obviously, this may not meet your needs. But as for whether it's possible to build something like this for that much money? Sure, I'd say so.
Finally, existing SANs, or any sort of network-attached storage, are overpriced, no two ways about it. Very, very healthy profit margins there. Sooner or later, someone is going to start underselling the big IT "corporate solution providers" and is going to kill them unless they trim margins by quite a bit.
Re:I'm not saying that this is the standard... (Score:2, Insightful)
Also, go with KNOWN drives. The new 300+ GB drives sound nice on paper, but remember the case of IBM: great, solid drives, and then a whole new line became a nightmare. Like stepping back to MS.
Re:I'm not saying that this is the standard... (Score:2)
You can bring the cost down.
You're joking, right? Moderators: you too, right? (Score:2)
I find spending has gotten too freewheeling, myself. We just got a Blade 2000 (anniversary edition) with only a single system disk. Why the OS of a $30,000 machine is not mirrored is beyond my comprehension.
Let's buy a TEMPEST array! Let's buy something with a Sun nametag because the name sounds good!
No, let's buy those things because, if something in them breaks, the production payroll machine doesn't go offline. Or let's buy those things because, if something does break, I can have a tech on-site in 4 hours with a hot-swappable replacement part. Let's buy them because my customers (my users) won't notice the downtime while I pull a CPU module, PCI card, or disk and replace it without powering the server down.
Let's buy a $2k piece of software for each workstation even though there's a free alternative!
No, let's buy a $90,000 piece of software because it allows us to precision-machine aerospace parts more efficiently than hand-drawing the same models in two dimensions on a drafting board, or because we can run simulation testing on our airframe to see how much stress it can take before it destroys itself. Let's spend our money smartly to produce more revenue and profit.
Say we take the bare-metal, dirt cheap approach. Grab a bunch of Linux boxes.
I've seen horror novels with better beginnings...
Throw RAID on them
Apparently "throwing RAID" on something is good enough for enterprise-level.
and a 100Mbps Ethernet card in each.
This will work great on a network where every client is connected at 100/full, and the normal servers have fiber or gigabit uplinks. You may have gotten away with this in 1995, but it's 2002.
The figure used earlier was $1 per gig. Put 6 200 GB drives in each. Throw down $250 for the non-drive cost of each system.
$250 for the rest of the system? Motherboard, RAM, CPU, power supply (dual? Hah!), and case? Our AIX NFS servers have RAIDed MEMORY, not to mention at least triple the amount they'll ever need of that, CPUs, local disk, power supplies, and PCI expansion chassis.
You have 800GB of data on each system, 400GB of overhead. That's 63 systems. $16K for the systems, $75K for the drives, and we come in to $91K. I left out switches -- you'd need a couple, but certainly not $9K worth.
Yeah, you could just go down to CompUSA and pick up a few Netgear 8-ports. Nobody will ever need a VLAN. (The modules in our 6509s cost more than $9k.)
You'd need some software work done -- an efficient, hierarchical distributed filesystem. I didn't factor this in, which you could consider not fair, but there may be something like this already, and if not, it's a one time cost for the whole world.
Yeah, you could hack something together. Let us know how that goes.
Meanwhile, I'll be enjoying another day of outage-free administration, at least on the machines we built the right way.
- A.P.
Re:You're joking, right? Moderators: you too, righ (Score:2)
Bullshit. Expensive Sun servers crash all the time due to memory and CPU failures. The more CPUs and the more memory, the more chances for failure. These boxes do not have redundant CPUs and memory until you get into absolutely insane price levels. If you care about reliability, it is better to have truly independent machines and let the software handle the redundancy. Sure, mirror the storage, because you know hard drives fail, and have redundant network interfaces to protect against a switch failure. But don't forget that a "high availability" E6500 is 22 times more likely to crash than a "workstation class" Ultra 1.
Re:You're joking, right? Moderators: you too, righ (Score:2)
Really? Hadn't noticed. If by "all the time", you mean our E420 with an ecache parity problem (yes, this is a known issue with that series of CPU) which used to go down once a week until I took the faulty CPU offline from the command line, then, yeah, it crashed all the time.
The more CPUs and the more memory, the more chances for failure.
You obviously haven't heard about things like Chipkill and the eLiza fault-tolerance initiative.
You can run a machine with a bad CPU for months without worrying about it, and bad memory modules can now be cycled out of use without even causing memory access violations.
don't forget that a "high availability" E6500 is 22 times more likely to crash than a "workstation class" ultra 1.
What are you smoking, and where can I get some?
- A.P.
Re:You're joking, right? Moderators: you too, righ (Score:3, Informative)
You know that I am talking about commonly available multi-CPU systems, and not exotic (and insanely expensive) systems with redundant CPUs and memory.
What are you smoking, and where can I get some?
Do you seriously believe that an E6500 or similar system will not crash if there is a faulty CPU? Despite your impressively low slashdot UID, if you believe this, you have virtually no experience with such systems.
Okay, let's see (Score:2)
This is part of what I'm complaining about. Hardware vendors have sold users on expensive, heavily hot-swappable systems where they make huge profit margins. They work very hard to steer clients away from consumer-level stuff, where their profit margins are nearly nonexistent. If you're willing to make a system the fundamental unit of failure here, you can easily buy a $3K system with a second failover $3K system. Why pay five times as much so that you can swap out a CPU instead of just swapping out a whole system?
The whole measure-system-capabilities-by-dollar-value thing is what I'm objecting to -- your first response was "This is a $30K system".
No, let's buy those things because, if something in them breaks, the production payroll machine doesn't go offline.
I severely doubt that more than 10% of the people with TEMPEST systems actually need them. I was looking at a cluster of very overpriced and very underused TEMPEST workstations at a company a while ago. They would have been better off with some stock x86 machines.
hot-swappable replacement part
See above. It's much cheaper at this point to buy two consumer-level systems and let failover take over for one system than to buy a single high-end system.
No, let's buy a $90,000 piece of software because it allows us to precision-machine aerospace parts more efficiently...
The price I quoted was $2k. You're listing $90K, which is well into the vertical application market. There -- yes, you don't have much of an option. You need an airfoil simulator that does foo, baz, and bar, and there's only one vendor with it -- you pay for it.
I'm talking about buying horizontal market things like commercial variants of CVS, compilers, or other systems where there are very good free alternatives, yet companies persist on evaluating things based on price.
Apparently "throwing RAID" on something is good enough for enterprise-level
Who's to say that this approach is fundamentally flawed? Sun? IBM? Of course they're going to scoff -- they've got machines and service contracts to sell. A high-level IT person? They've been steeped in the "spend lots more to get decent quality" propaganda from said companies for so long that it'd be hard to get an objective viewpoint.
and a 100Mbps Ethernet card in each
This will work great on a network where every client is connected at 100/full, and the normal servers have fiber or gigabit uplinks
Notice that I mentioned having the front-end systems, the ones doing caching, have faster interfaces.
$250 for the rest of the system
For a file server, very little is needed in terms of CPU juice or RAM (before you start screaming about caching: as mentioned above, I want a systemwide cache sitting at the front of this). Make the cache able to cache anything on the SAN, so that you're using your resources efficiently. Why would I need PCI expansion chassis or RAIDed memory? I've already listed everything every box needs, and I'm willing to bet that the number of RAM chips you've had suddenly and unexpectedly fail (for God's sake, this is solid-state storage) is right up there with the number of servers hit by lightning.
Nobody will ever need a VLAN. The modules in our 6509s cost more than $9k.
Why would I want a VLAN within my storage system? To the outside world, this is a single entity. For that matter, Cisco systems definitely fall into my "overpriced because IT will buy it because it sounds sexy" category unless you really need the few systems that they do that *no one else* can duplicate in functionality. You can run VLANs off a Linux box.
Meanwhile, I'll be enjoying another day of outage-free administration, at least on the machines we built the right way
As I said earlier, I never claimed that this is available out of box right now -- just that you can build something like this. And neither did I say that your systems are outage-prone. I do think that name brand systems are oversold on vague reliability promises. Is my RAM going to suddenly fail? No.
I've found that the primary reason purchasers will spend their employer's money is the ASHF (Avoid Shit if it Hits Fans) syndrome. IT personnel are willing to make suboptimal purchasing decisions so that they have someone *else* to point to if something goes wrong. "Sun's supposed to fix that, not us." "This is a best-of-class component that failed."
Now to some extent, the corporate culture fosters this, but I just want to point out that every time I hear people bragging about the cost of the systems they administer, I wince and think about this.
My guess is that this is going to die over the next five years or so. At the moment, there's a glut of secondhand networking and server systems available from dying dot-coms. Once that's over, though, you have companies in India and East Asia that can't afford to waste the kind of money that US companies do on systems. So you get manufacturers (probably non-US) springing up to create low-cost systems that fill their needs, without the exorbitant profit margins. Eventually, as reputations become established, they'll start selling to US corporations trying to bring down costs and compete with those foreign competitors, and overpriced IT purchases will be a thing of the past.
Linux is part of the advance front of this -- it's cheap to set up, runs on cheap commodity hardware (whose manufacturers make very little profit per unit), and you can build fancy things on top of it. As a matter of fact, that's most of the reason Linux has been propelled into the business market at all -- not because a bunch of geeks think it's sexy to use (though it sure would be neat if that *were* the reason), but because the profit margins are in a more sane range.
Almost all products follow a process of starting out very expensive, becoming more common and understood, commoditization, and eventual drop of profits to near zero. And once a product has reached the end of this process, bringing the price back up is very, very hard.
Re:Okay, let's see (Score:2)
The system is a workstation anyway. And an extra $1200 would've gotten us the redundancy I like, in the form of a second 72GB Fibre Channel drive.
I severely doubt that more than 10% of the people with TEMPEST systems actually need them. I was looking at one cluster of very overpriced and very underused set of TEMPEST workstations at a company a while ago. They would have been better off with some stock x86 machines.
What were they doing with them? If they would have lost more than the value of the TEMPEST system for every day of downtime (spread out your outages over a few years -- the average life of the system), versus the extended downtime a lesser system would have given them, it was worth the money spent.
It's much cheaper at this point to buy two consumer-level systems and let failover take over for one system than to buy a single high-end system.
What do you do when your cheap consumer systems both lose their single power supplies to a voltage spike that makes it past the UPS (would you even buy one of those)?
I'm talking about buying horizontal market things like commercial variants of CVS, compilers, or other systems where there are very good free alternatives, yet companies persist on evaluating things based on price.
Side note: You would use g++ as a compiler for your product? The code it produces is about as efficient as a fully-loaded Excursion full of fat chicks.
Companies evaluate products based upon the availability of the following:
Support.
Features
Support
Accountability/Liability
Support
Compatibility - not only with existing and future hardware and software, but with existing employee skillsets
Companies also like a well-supported product, not a "well, supported..." product like most Open Source Software is. You can tell me that Open Source is better-supported than its commercial counterpart, but I say bullshit. For a show-stopper security bug, that argument may hold water, but it leaks like a sieve when you get into specialized cases.
Case in point: we broke IBM AIX 5.1 a few months ago. There's a major bug in 64-bit mode involving extended file attributes (ACLs) over NFS using JFS2. That this would affect anyone at all came as a surprise to IBM (as it would have to anyone, probably, who hadn't tested their software absolutely and completely), but we kept them on our problem 24/7 for a week until a patch was issued. I'm willing to bet the problem would've gone unresolved for weeks in Linux as a more pressing issue was ironed out, since 2.4 has suddenly become a development branch. This is what money buys you in terms of support, and it's not something the Open Source community has the capability of providing.
"This will work great on a network where every client is connected at 100/full, and the normal servers have fiber or gigabit uplinks."
Notice that I mentioned having the front-end systems, the ones doing caching, have faster interfaces.
So the front-end systems will have gigabit interfaces, but the smaller machines will still be limited by the 100 megabit bottleneck? Most hard drives, even the cheap piece-of-garbage IDE cans you want us to use, can push 30 megs per second today. They'll meet the ethernet card full-on and be very disappointed at what they see.
Cisco systems definitely fall into my "overpriced because IT will buy it because it sounds sexy" category unless you really need the few systems that they do that *no one else* can duplicate in functionality. You can run VLANs off a Linux box.
Yeah, and I can run them off an iPaq too, probably. But why the hell would I want to? I honestly don't understand why I would sign up for hours of aggravation and frustration over Linux's lack of a feature here and there, simply because it can do one or two of the things a real piece of networking gear can do.
I've found that the primary reason purchasers will spend their employer's money is the ASHF (Avoid Shit if it Hits Fans) syndrome. IT personnel are willing to make suboptimal purchasing decisions so that they have someone *else* to point to if something goes wrong. "Sun's supposed to fix that, not us." "This is a best-of-class component that failed."
I've found they spend more in the short term to save more in the long term. If you think doing something the right way is expensive, try doing it the wrong way.
- A.P.
Hmm.... (Score:2)
Basic numerical analysis.
voltage spike that makes it past the UPS
I've yet to see a spike damage even a system on a cheap surge protector, much less a nice UPS. I *have* seen surges over POTS lines damage equipment, though. Come to think of it, my neighbor's house was hit by lightning at one point, knocking out her modem...yet leaving her computer intact.
Side note: You would use g++ as a compiler for your product? The code it produces is about as efficient as a fully-loaded Excursion full of fat chicks
Not anymore. Take a look at the code that a gcc-3.2 build puts out... it's light-years beyond the 2.7-and-earlier era, the one that gave gcc such a bad rep. It's competitive with the better compilers out there now (at least in generated code... Sun's C++ compiler compiles more quickly). Oh, and the good code generation is on x86 -- I never tried comparing recent builds on SPARC or PPC.
Case in point: we broke IBM AIX 5.1 a few months ago
So you're asking me both to believe that this had to be fixed immediately (as in, whatever you were doing before you broke AIX 5.1 was no longer an option) and that Linux wouldn't have been fixed quickly (and while there probably are issues that have taken a while to fix, I tend to see patch times that beat competing OSes).
They'll meet the ethernet card full-on and be very disappointed at what they see
You're talking about raw streaming of a huge sequential series of reads, which may or may not be an issue here -- but that's beside the point. You're leaving out the possibility that data could be interleaved across different machines to avoid exactly this issue. Do it in software, I say -- it's cheaper.
simply because it can do one or two of the things a real piece of networking gear can do
Okay, I'll bite. Short of sheer mass bandwidth that you absolutely require custom hardware for, like a backbone provider, what specific features are you complaining about the lack of?
I've found they spend more in the short term to save more in the long term. If you think doing something the right way is expensive, try doing it the wrong way.
I agree that doing something the wrong way can be more expensive -- I'm just not sure that saving money necessitates "doing it the wrong way".
Re:Hmm.... (Score:2)
Yes. What is so hard to believe about that?
Do you think people don't upgrade their systems or implement new ones? Where do you work that you never install new equipment?
and that Linux wouldn't have been fixed quickly (and while there probably are issues that have taken a while to fix, I tend to see patch times that beat competing OSes).
Like I said, it would only have been fixed quickly if it were a major problem affecting every (or most) Linux user. This was a very specific problem that we were the first to run into.
Okay, I'll bite. Short of sheer mass bandwidth that you absolutely require custom hardware for, like a backbone provider, what specific features are you complaining about the lack of?
The ability to handle several hundred ports. Please do not tell me I should buy dozens of shitty $250 Linux boxes full of quad-port ethernet cards to do this.
I am beginning to wonder if you have ever worked in a company with more than 100 employees.
- A.P.
Devolution of the argument (Score:2)
We aren't talking about SANs any more? There isn't a lot of reason to be dropping new OSes or new servers on components of the SAN.
Just as I agreed that there *is* a justification for vertical-market applications, I'm not saying that every copy of AIX should be purged. I just think that items like these are frequently sold in situations where they are not needed. That doesn't mean that they're never needed. I don't claim that Linux is the best alternative if you're using, say, oh, a system that needs to dump process info from the kernel very frequently -- Linux's
I am beginning to wonder if you have ever worked in a company with more than 100 employees.
Well, you're definitely wrong in the literal sense.
Which would explain some of the different focus here -- you're complaining that given a list of options from different providers, no one currently gives you what I'm talking about. My interest is in adding another option to that list -- whether it's possible to create a new option for the prices being talked about.
Re:You're joking, right? Moderators: you too, righ (Score:2)
"Is". It's missing an apostrophe and two consonants.
- A.P.
No geek is an island. (Score:2)
I don't think there's any excuse for asking a question without first doing a little basic research, but here we have somebody who has legitimately never had any experience with terabyte storage asking if there's a cheaper way. It's a legitimate question, and one that probably could not be answered by looking in a book. So the person here is right to ask, and has already gotten some very good answers.
I have a somewhat similar problem: how do I make sure that on the order of a terabyte of audio and video data survives the next hundred years? This given that the disk on which the first 80 GB of this data were delivered to me has two errors that have already corrupted two of the files, and the data isn't even a year old.
What I've been doing is asking other people how they've solved the problem, and also thinking about it on my own. It's how problems get solved. I've gotten some very good and thoughtful answers to my questions already.
Re:Metacomment (Score:2)
These sorts of things are not taught in any school. They are learned by asking around. At this level of storage (several TB), companies like EMC, Hitachi, and IBM have high-powered salespeople who will try to bamboozle and schmooze their way into the sale. Often you need impartial advice from different sources to make sense of the marketing-speak. This is where the expertise of some of the people (not like you, of course) here at /. can be helpful.
It is no secret that margins are pretty high with such "enterprise-class" storage solutions. The salespeople from each of the companies have done their homework well, know what the competitors' products cost, and are sure to charge you as much as they can (it is called collusion, and most vendors do it; that's why no one complains). Therefore, if you are quoted $20M for the whole solution, and you read here at /. that Jane Geek somewhere paid $12M for a similar Hitachi Lightning solution with options XYZ, you can take that information to the vendors and knock them down a little.
The alternative is to use the services of a professional "shopping" company, like GAPCON [gapcon.com] (I don't have anything to do with GAPCON, I just heard about them recently, that's all).
Gee (Score:2)
Datawarehouse (Score:1)
Re:Datawarehouse (Score:2)
You get the point I hope. $20 million is probably reasonable actually.
Re:Datawarehouse (Score:2)
I included maintenance contracts in my off the cuff tape library price, assuming the selection of the IBM 3584 library and a 5 year life cycle. The 500 tapes is assuming a 2:1 compression ratio, not necessarily what is actually achievable.
The other thing I didn't point out is that SAN solutions that are over 5 TB are generally custom solutions architected for a very specific environment. That raises the price because you are now talking about bringing in consultants (like me) who cost you $250 to $350 an hour, and needing hundreds of their hours. You could certainly implement a bunch of NAS boxes with big ATA drives in them for a few hundred thousand. But the system would be so I/O bound that it would be of no use to anyone.
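The 500-tape figure can be sanity-checked with a little arithmetic. A minimal sketch, assuming LTO-1-class cartridges in the 3584 (the 100 GB native capacity and the two-generation retention are my assumptions, not the poster's):

```python
import math

data_gb = 50 * 1000          # 50 TB warehouse
native_gb = 100              # assumed LTO-1 cartridge capacity
compression = 2.0            # the optimistic 2:1 ratio mentioned above
effective_gb = native_gb * compression

tapes_per_full = math.ceil(data_gb / effective_gb)
print(tapes_per_full)        # 250 cartridges for one full backup at 2:1
# keeping two full generations on hand lands right at the 500-tape figure
print(tapes_per_full * 2)    # 500
```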
If you take (Score:4, Funny)
Re:If you take (Score:2)
Joe
Re:If you take (Score:2)
OK smarty pants. Why aren't you out there selling these systems?
Because of managers who think like you.
Re:If you take (Score:3, Insightful)
How do you even strap 50 TB together? Is it one huge array, or arrays of arrays?
What do you use at the head end that can handle this sort of throughput? How do you back it up? How do you search it?
What filesystems do you use that support 50TB?
How do you manage the hot swap aspects?
There are so many questions that you leave unanswered, that you might spend $19 mil to answer before you spend $1 mil on hardware.
Joe
Re:If you take (Score:2)
What's this have to do with managers?
Managers are the ones who make the purchase decisions. They tend to buy from large name companies with big marketing budgets regardless of the quality or cost of the solution.
Why don't you sell these systems?
I don't have enough money to market them.
How do you even strap 50 TB together? Is it one huge array, or arrays of arrays?
As with all your questions, depends on the needs of the customer. If you're interested in buying a solution from me, let me know, and we'll talk further.
There are so many questions that you leave unanswered, that you might spend $19 mil to answer before you spend $1 mil on hardware.
No, I won't spend $19 million answering a few simple questions.
Re:If you take (Score:2)
Sorry, I assumed that you meant people managers. If people managers are building systems for you, then your company needs fixing. What do you call the people that manage people? (Unless I am mistaken and we are both talking about people managers, in which case, what do the rest of you do, if you aren't doing the work?)
As with all your questions, depends on the needs of the customer. If you're interested in buying a solution from me, let me know, and we'll talk further.
Actually, you have already stated that you could build a 50TB system for $1M, so what more information would you need?
On a more serious side, I am interested in building a dual processor Linux workstation. I do Java/web programming, run VMWare with an instance of W2K connecting to clients via VPN software, and possibly other VMWare instances with Linux as test clients. I constantly have Mozilla, StarOffice, emacs and a couple xterms running. I want to use video conferencing and instant messaging. Can you help me spec a system? Last time I tried I ended up with junk hardware.
Lack of capital (Score:1)
Re:Lack of capital (Score:2)
Explain your customer needs and how you are going to satisfy them and why you need the money now. If it all adds up, I can find you the money.
Capital is never a problem, it is an excuse.
Joe
Re:If you take (Score:1)
That would leave you with a price tag of... $240 million, yikes. Maybe you could get some sort of savings for buying in bulk
Re:If you take (Score:3, Insightful)
What about power and cooling? Ever cost out one of those huge Liebert internal cooling systems? Don't forget you need two of them. And the power: you'll need huge UPSes for something like this.
How about backups? You'll need to be able to back this all up and transport the data offsite in a timely manner. That's a LOT of DLT tapes, not to mention the costs of the tape libraries, drives, off-site storage facilities (perhaps you'd like to keep all of those tapes in a locker at the same place?), etc.
Now.. how are you going to access this? with 500 partitions? or perhaps you want some more sophisticated storage management software?
What about support? Are you going to accept responsibility for maintaining this thing? Or are you, like most businesses, going to want 24x7x4 support? Since support on products like this often involves flying an engineer in from out of state on almost no notice, it's not cheap.
The reality of this is that for that kind of storage you need a SAN, and that means big dollars. The two most common SANs are EMC (which I'd bet is what this estimate was for) and Compaq StorageWorks. EMC is the more mature solution, but also MUCH more expensive; they often outpace Compaq and the other vendors who make similar products by 300% or more.
Is $20M too much? Probably. Is any solution involving a room full of servers loaded with commodity IDE drives acceptable? Absolutely not.
Better to shop other EMC vendors and other SAN solutions, and make the best deal on the right product.
Sounds reasonable (Score:1)
From an earlier slashdot story, you can get 300GB hard drives for around $1 a GB. So you are looking at spending $50,000 on hard drives. Figure 4 IDE drives per computer and you need about 50 computers. That would run you maybe $15,000 at around $300 per computer.
I'd say it would need 10 employees to set it up, including a couple programmers, a couple sysadmins, and some techs; that would probably cost you $200,000 if it took them four months.
I'd say you could do it for less than half a million. Throw in $150,000 a year for facilities and maintenance and you have no worries.
Google does something like this. They have tons of cheap computers with cheap hard drives.
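The arithmetic behind that estimate can be sketched out (the figures are the poster's; note that this counts raw capacity only and ignores any RAID or redundancy overhead):

```python
import math

data_gb = 50_000                       # 50 TB target
drive_gb = 300                         # 300 GB drives at ~$1/GB
drives = math.ceil(data_gb / drive_gb)
boxes = math.ceil(drives / 4)          # 4 IDE drives per $300 computer

drive_cost = data_gb * 1.0             # $1 per GB
box_cost = boxes * 300
setup_labor = 200_000                  # ~10 people for four months
print(drives, boxes)                   # 167 drives, 42 boxes
print(drive_cost + box_cost + setup_labor)
```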
Re:Sounds reasonable (Score:5, Insightful)
Where is your failover?
How are you going to connect these disks together? NFS? Samba? That kind of speed (or lack thereof) is not an enterprise storage solution.
How do you replace disks as they fail without taking stuff offline?
Re:Sounds reasonable (Score:2)
Re:Sounds reasonable (Score:3, Informative)
server only has FS daemons doing I/O, and the drives are always hot, there is no SCSI advantage as there is in a multitasking workstation environment.
Re:Sounds reasonable (Score:2)
tagged command queueing, is very shortly about to render the 300% SCSI price premium obsolete in all but a few narrow verticals.
Google and Commodity Computers (Score:2)
Re:Sounds reasonable (Score:2)
Now, those 28 drives will need to be attached to something. Maybe an Adaptec SCSI RAID 5400S [adaptec.com], which is a four channel card that can accept up to 60 drives and is priced [nextag.com] at about $900. Add to that a machine to put the RAID card in with at least GB ethernet, at around $6000, 3 40U racks at $2000 each and a UPS for each rack at $2500 each.
All told, that's $67,200 each for drives and arrays, $900 for the SCSI RAID, $6000 for a single box, $6000 for racks, $7500 for UPSes, at a sum total of $154,800 for a single 50TB array. Primary point of failure is the single box running it. For a backup system, running a full second array as redundancy would cost a net $309,600. All of this is not inclusive of labor, which for setup might easily run $100k. Thus, a redundant, reliable RAID solution would run you $400,000. All that's once the 320GB IDE drive is released by Maxtor.
Does that answer your question?
Please note, this won't be the best array money can buy, just a large array on the cheap. (what RAID was intended for)
Your pricing is a little old (Score:1)
WAY too much (Score:1)
(The 30TB came from IBM.)
Re:WAY too much (Score:1)
Hmmmm... (Score:3, Funny)
Re:Hmmmm... (Score:2)
more input needed (Score:3, Insightful)
forget what you know about ide hard drives (Score:5, Insightful)
forget all that.
if all you wanted was a pile of ide hard drives, maybe this would be ok, but anybody looking for 50TB of storage is not just looking for some disk to hold the pr0n they downloaded last week. large scale storage systems need to manage multiple host access to high speed (15krpm U3SCSI) drives in flexible raid configurations with maximum redundancy, high speed caching (with GBs of RAM to do it), fiber channel switching, cross platform capability, high end management and monitoring, HSM backup and data migration, offsite vaulting of disaster recovery data, power and air conditioning, and a fat service contract from the vendor. none of the above are going to be found at pricewatch.com.
your best bet is to talk to multiple storage vendors about your needs. call up EMC, Hitachi, IBM, and Fujitsu to start, then let them see each other's numbers. With the amount of money that you are going to spend (and it almost certainly will exceed $10 mil - but maybe not $20), each of these vendors will do backflips to get your business (and EMC is particularly good at junkets - take them for all they're worth)
Re:forget what you know about ide hard drives (Score:2)
Yeah, but there's also a tendency to try to sell ridiculously overpriced products with vague promises of reliability or quality. Name brand vendors do it all the time. If the vendor is really so sure that this stuff isn't going to fail, will they pay damages if something does fail in the next seven years? Mmm? I'd assume that such a guarantee, since they're so certain, should cost you a *nominal* amount. If they expect one in ten systems to violate their guarantees (which seems pretty egregious to me), they should only be jacking the price by 10% at most for that guarantee.
Re:forget what you know about ide hard drives (Score:1)
Re:forget what you know about ide hard drives (Score:1)
Re:forget what you know about ide hard drives (Score:2)
That's actually a rebranded EMC Clariion product that was just released. Saw a demo; great machine, but think three side-by-side racks, and they stand about 5 feet tall, as I seem to remember.
Re:forget what you know about ide hard drives (Score:2, Interesting)
What we need is some university/some poor souls with money to invest, to build this as a "test case" for linux distributed systems.
=============
Requirements:
-- 50 TB Data storage
-- 100% availability (I don't mean 99.99_)
-- Data must be accessible worldwide
-- Data must be safe in these events:
-----War or terrorist act (building blows up)
-----Earthquake (building falls down flat)
-----Fire (building burns to foundation)
-----Flood (building full of muddy fishy water)
--Data must be online in the event of a disaster in 48 hours.
--Data must survive:
----Server failure
----Storage medium failure
----telecommunication failure (junk through the pipes)
----Unauthorized access (r0x4H 31g00G)
----Vandalism (maintenance guy with baseball bat or axe)
----Theft of equipment
Furthermore:
--Data must always be in a non-corrupt state
--Data must be fully auditable
--Data transaction must always be fully reversible
Also:
--All procedures (ALL) must be written down in an electronic document and on paper and must be available ONLY to the proper personnel.
--All personnel must be correctly trained (development of training material, testing, evaluations, etc)
--System architecture must allow for connectivity to any known server system, any database system, and any client systems.
===
Oh, and under 20 million dollars.
===
However which way that solution should be implemented is left as an exercise to the reader
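One way to get a feel for the hardware those requirements imply: surviving the building-loss events means at least two, and more comfortably three, fully separate copies of the data. A rough sizing sketch (the site count, RAID group width, and drive size are all illustrative assumptions, not part of the spec above):

```python
import math

usable_tb = 50
sites = 3                    # assumed: lose any one building, keep two copies
raid_group = 12              # assumed 12-drive RAID-5 sets (one parity drive)
drive_tb = 0.32              # 320 GB commodity drives

raw_per_site = usable_tb * raid_group / (raid_group - 1)   # parity overhead
drives_per_site = math.ceil(raw_per_site / drive_tb)
print(drives_per_site, drives_per_site * sites)   # 171 drives/site, 513 total
```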
Read / write (Score:2)
Re:Read / write (Score:2)
Re:Read / write (Score:2)
1) Amount of daily new write access
2) Amount of daily change access
3) Amount of daily read access
It's not going to be "as fast as you can make it" because that is a limitless amount. Heck, you could store the whole thing in a RAM-based system and have a direct SCSI attachment to the clients. It's going to cost way more than $20,000,000 to give a few thousand clients that level of access, but it is possible.
Google is your friend (Score:3, Interesting)
I am not an expert in this field, but Google was willing to tell me lots.
RaidWeb [raidweb.com] sells rack mountable RAID units that take IDE drives and have SCSI or fibre connectivity. A 12-bay 4U SCSI system (with 12x 120GB IDE drives) comes in at just under $8000, giving over 1TB of fault tolerant storage. There are several other companies that have units like this.
Rackmount Solutions [rackmountsolutions.net] sells rackmount cabinets. A 44U cabinet with fans, doors, etc. will come in at around $3000.
In theory, a single cabinet could house 11TB of data, and cost around $91,000. This still doesn't consider cabling, cooling, power distribution, networking, a proper server room (air con, false floor for cables, access control), and in all likelihood one or more controlling servers.
More practically, depending on how they are going to make this data accessible, you could be looking at 9 RAID units per cabinet plus 3 2U servers and a switch in the remaining space. Each server can support multiple SCSI cards and gigabit networking. Such rackmount computers will set you back in the region of $6000 (incl. network and SCSI adapters, excl. software).
So you can call it $100,000 for 9TB of storage ... $600,000 for 54TB. That doesn't answer the management software question, and may not be a suitable solution. But it sure is a lot cheaper than $20 mil ;)
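Those per-cabinet numbers roll up like this (all figures are the estimates quoted above, not vendor quotes):

```python
raid_unit = 8_000        # 12-bay 4U IDE RAID unit, ~1 TB fault tolerant
cabinet = 3_000          # 44U rack with fans and doors
server = 6_000           # 2U server with SCSI cards and gigabit NICs

per_cabinet = 9 * raid_unit + cabinet + 3 * server   # 9 TB usable
print(per_cabinet)           # 93,000 -- "call it $100,000"
print(6 * per_cabinet)       # 558,000 for 54 TB, before cabling/cooling/etc.
```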
Re:Google is your friend (Score:2)
As I noted in my post, my suggestions may be totally unsuitable for the application - but then there was little information given about the nature of use that is expected. I'm glad to hear an opinion from someone experienced with the other end of the scale.
In my (limited) dealings with data warehousing systems, performance has been a non-issue. These systems have mostly held historical data for occasional retrieval. Often the (in)efficiency of the database system has been the bottleneck. We're talking 10 to 50 TPS on simple queries or blocks or data.
As opposed to the solutions offered by today's high end vendors, RAID takes on its true meaning in this sort of slower-moving-data system. Commodity hardware means less stability, but the warehouse is not a mission-critical system (it can take occasional, short outages). On the other hand it is cheap and easy to replace.
Did you try google????? Obviously not. (Score:2)
Sponsored link: raidzone.com
Their 4U 2TB system goes for $25K, so 50TB would be about $750K and fit in 2 1/2 racks. They claim that they will be doing iSCSI soon, but right now it's just NAS. Still, this is a far cry from $20M. If budget is a concern, you can figure out how to use an array of NAS in place of a SAN.
If you are hell-bent on SCSI or FC, you are going to be into serious dough, as SCSI drives are almost 10X the price of IDE at this time, and don't come in as large a capacity (which means that you will need more rackspace, chassis, power, etc.). $20M is probably not too far off. Modern IDE drives with dedicated smart controllers are really not too bad. Just keep a pile of them to swap out bad ones, as you are going to be going through drives pretty quickly.
With the size of your drive array, backup is going to be a serious issue. You are going to need a multi-drive robotic array of good size. Those are not cheap either.
real world (Score:2, Insightful)
i don't know nearly enough to put such a thing together, but i do know enough to know that every real-world project probably costs 50x what a geek-fantasy basement equivalent would cost.
Pricing sounds a little high (Score:5, Informative)
If you're going with EMC, you'll need to put those disks in something, like a frame (cabinet), and for your size, more like 5 cabinets. With that many cabinets, you'll need some sort of SAN switch and associated fibre cables (not cheap). That gets your disks into cabinets and all hooked together.
You wanted to access the data? Then you'll need EMC fibre channel cards ($15k a pop for the Sun 64bit PCI high end jobs). But you'll more than likely be serving data from a cluster of machines, so count on buying three ($45k) per machine (so each card is on a different I/O board hitting the SAN switch, redundancy)
Who's going to set this up? For that kind of coin, EMC (or whomever you go with) will more than likely set the thing up and burn it in for you on site. The price probably also includes some kind of maintenance contract with turn around time fitting the criticality of the system.
Yes, my 'big ass storage' experience may be limited , but I think that 20Million for 50TB installed/supported/tested by a big storage vendor is in the ballpark.
Good luck.
Re:Pricing sounds a little high (Score:2)
They have to make a living too ya know.
Re:Pricing sounds a little high (Score:4, Informative)
For enterprise-class storage (i.e. this is NOT just a pile of Maxtor IDE drives duct-taped together) paying 20M for 50TB is on the high side, but not by much. (I would have given a range of 10M-20M for the whole thing depending on the exact trade-offs made.)
3 HBAs per host is overkill for most applications (but certainly not all). I've found that two is generally sufficient. Never rely on just one, even for a non-critical system. I'm often amazed at just how critical non-critical servers become when down for several hours in the middle of a busy day.
Don't discount the significant setup and debugging costs at the beginning. This will cost not only in hardware/software/consulting but in time lost for your own admins to spend working with the vendor, going to classes, learning new methods of adding storage, accidently messing up the systems, cleaning up those messes, etc.
Get the best monitoring/management software you can. EMC is famous for gouging people on software costs so you'll need to use your best judgement. (HINT: PowerPath == Veritas DMP at up to 20x the cost. SRDF == Veritas Volume Replicator at up to 20x the price. TimeFinder == Mirroring at up to an infinite multiple of the price. You get the idea-- just use your best judgement and be cautious.) Under extreme single-host disk loads the otherwise minor performance hit for host volume management can become a problem, making that 20x price worth it. Maybe.
If possible, press them for management software that makes adding/removing/changing filesystems a one-step operation, complete with error checking. It really sucks to put that new database on the same disks as another host's old database and software can be really good at checking for stupid human mistakes.
As long as you're here... (Score:2)
I design and build software for a living, including stuff for banks, and I've been trying to imagine a system where I really need 50 TB in one place. Email for 10 million? Customer records for 50 million? A search engine for the entire web? For all of these, my designs would end up like Google: an array of cheap, commodity boxes that each are responsible for a portion of the data.
So is it that there are applications that really require this? Is it that some architects are used to drawing the one single "storage" icon and a $20 million bill isn't enough to make them say, "Gosh, is there a better way to do it?"
Or is it that the sysadmin costs and pain associated with maintaining 25 racks of gear make it worth coughing up for the centralized system in the long run?
Re:As long as you're here... (Score:2)
The main use for such huge storage implementations are data warehouses. In short, companies dump truckloads of data into one spot, then figure out clever ways to extract the data. These are hugely expensive, and the only reason a company would want to spend $20M on it would be either:
a) They think they'll easily make that $20M back and more based on the information they glean. Or
b) They're having a good time pissing away all that venture capital
I suspect that the majority (though by no means all) of the companies doing b) have been eliminated through the viciously Darwinian process that is the world economy. Which leaves a) as the most common reason these days.
Why create a data warehouse?
Probably the best online example of data warehousing in common use is Amazon. Did you ever wonder how they come up with recommendations that seem so eerily accurate? How can they show so many correlations between items? (i.e. not only also bought, but amazingly "also browsed for.") What kind of data do they collect and store to do that?
I bet it's immense. (I.e. store all clickstream data from everyone who has visited and has a cookie set. How long were they on each page? Which links did they follow? What were the properties of that link (color, screen position, etc.)?) The possibilities are endless.
Now that all this data has been collected, they need to index all this data for easier retrieval, store summaries for quick searches, details for thorough searches, and add in a couple of development/test environments with their own storage.
Now do the same thing, only it's dozens or hundreds of different reports to sell back to their suppliers and advertisers about what was sold quickly and why. Anyone selling a product will pay very handsomely for information about how they can sell more of them.
You get the idea. It takes a huge amount of data, but it can generate huge amounts of revenue.
Since the actual reports run on a data warehouse are so unpredictable (finding what's valuable is largely trial and error), it's easier in the long run to just stick with widely used tools and infrastructure. (I.e. Oracle on UNIX.) Rather than having to support an entire development staff to build the infrastructure, a company can spend a bit more and have it done for them. (This is pretty counter to the whole hacker mentality, but this is how most companies work.)
So while it may seem expensive, sometimes it's worth it.
Re:As long as you're here... (Score:2)
I guess the factor I was forgetting is that most large companies have terrible records developing software. Ergo, what seems obvious to me (that doing it Google-style will save money and get better results) must seem awfully scary to them.
That sounds like the pricing for a whole project (Score:2)
That's like $6MM for most customers.
Fibre channel directors and switches
Tape robot... $1 MM
Storage Mgmt software like TSM... $400,000
The extra $10MM is probably for full-time consultants, a more expensive solution like EMC, or a more fault-tolerant solution.
Re:That sounds like the pricing for a whole projec (Score:2)
multiple sites - SAN etc (Score:2)
Also, accessing this amount of data at reasonably high rates is expensive; think StorageTek silos, HDS SANs, etc. All this is high-end, very very fast stuff.
If you've got 50 TB of data running in an OLAP cube you've got to have massive I/O capability to properly load and spin the cube around. I.e., the cost ain't in the actual storage media, but the I/O (esp. if you've got a split system requiring a multi-site setup).
There should be plenty of examples of this sort of data storage now - telcos to web logs. Pricing, well depends on the deal you can get at the time...
You pay for support. (Score:3, Interesting)
Don't forget that the hardware isn't cheap: Frame, multiple redundant hot swappable power supplies (requires specialty power connection), dozens of scsi drives, dozens of scsi controllers, 10-20 fibre channel connections, an interconnection network between FC and SCSI controllers that includes fiber and copper ethernet, hubs, etc., and a management x86 laptop integrated into the frame.
$20 mil for this is a fair price in my opinion. Anyone who rolls their own is just insane. There are hundreds of engineers behind each of these boxes, and it shows.
No, I don't work for EMC.
I know how. (Score:4, Funny)
Floppies. Lots and lots of floppies. They are so cheap right now! And the come in pretty colors too.
How to actually use cheap computers... (Score:2)
IDE-RAID with 3ware 7500-12 controllers and 3U 14-bay cases (available from rackmountpro, and probably others) could be one possibility, but I don't think you would get a 'flat' storage-space from it, probably have to be segmented instead. As others have pointed out NFS/Samba aren't really manageable ways to handle a filesystem spread amongst multiple machines. People who do this, like archive.org and google, have custom software to access the data stored on their machines. But it doesn't have to be that way forever...
I think iSCSI could give very interesting possibilities for open-source SANs using this type of hardware...maybe front-end servers which map requests as necessary to back-end servers holding the storage, you could have a rather nice fully-resilient highly-scalable system that way, which would just appear as another drive to a client machine, no NFS/SMB etc...
Cost (Score:1)
add $19,900,000 for consulting fees and you've got your 20 million. Speaking as a consultant, that seems reasonable to me.
CDs - Obvous choice (Score:3, Funny)
Try EMC on eBay (Score:2)
Searching eBay for EMC provided some interesting results (these are mostly "buy it now" prices):
EMC Symmetrix 3930 w/ 12 TeraBytes [ebay.com] = $57K
(With the proper drive configuration, this unit should [emc.com] be able to deliver up to 70TB in a single system).
This one comes with 12TB of storage (256x 50GB HDs). If you throw out all 256 of those 50GB HDs (or just give them to me as a consulting fee for saving your company over $19.5 million) and buy 256x 181GB HDs, you're just short of your 50 TB mark (~46,336 GB).
On Pricewatch [pricewatch.com] those drives come out at $999 ea. x 256 = $255,744. Add the initial $57K and you've got a machine that meets your specification for significantly less than $20 mil.
Here are some other EMC machines for sale on eBay:
EMC Symmetrix 3830-36 With 3 TB No Reserve! [ebay.com] = $59K
EMC Symmetrix 3700 6TB w/Install & 1YR Mnt! [ebay.com] = $48K
EMC Symmetrix 5700 3TB Storage System [ebay.com] = $9K
This is what I found by doing minimal research. I'm not 100% sure that the Symmetrix 3930 can handle that configuration (its not my money) so before you go down this road -- do your research (better than I did).
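For what it's worth, the drive-swap arithmetic above does add up (prices as quoted in the post; whether the 3930 actually accepts the 181 GB drives is, as noted, unverified):

```python
base_unit = 57_000           # used Symmetrix 3930 with 12 TB, "buy it now"
drives, drive_price, drive_gb = 256, 999, 181

total = base_unit + drives * drive_price
print(total)                 # 312,744 -- vs. the $20M quote
print(drives * drive_gb)     # 46,336 GB, just short of the 50 TB mark
```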
Re:Try EMC on eBay (Score:3, Insightful)
The EMC boxes (or anyone else's, for that matter) have a significant amount of configuration associated with connecting the drives. You can't just open the box up, start sticking in drives, and expect it to work. For that matter, in many cases if the drives are not the ones rated for use in the box, you can destroy the backplane of the machine. The power supplies, the drives themselves, etc... Power and heat are huge issues in these boxes: think of the heat the average hard drive throws off, now put 100+ in a box the size of the average home refrigerator...
Then there are configuration issues: you need the software and the technical know-how to write the configuration files these machines use to tell the multiple drives to act as one or many logical drives.
Then how do you connect up the system(s) that will use the box? These are all delicate issues.
If you buy a box off eBay you will absolutely need someone working for you who knows the product inside and out (or at least on a retainer contract with 24x7 support clauses)... and you should immediately make a phone call to the proper support number to get the thing on a support contract... Trained EMC professionals don't come cheap, but they are worth every penny. I would assume with other companies it's the same story, but I only use EMC so I don't know...
Buy EMC; it's really the only long-term option. I have seen one of these boxes get knocked over on its side (no small task) while it was running, and just keep going without a hitch... that's a well-engineered product...
Re:Try EMC on eBay (Score:2)
Turkeybrain, maybe, but you're coming off as an arrogant prick! But I'll extend you the courtesy that you didn't bother to extend to me by giving you the benefit of the doubt.
Did you bother reading my entire post? Did you read the part where I stated that this was indisputable fact and that anyone with a problem with it is just plain wrong? That's right -- I didn't say that. In fact, I said that I did minimal research.
As far as finding a tech who knows EMC -- it shouldn't be more than $100-125K/yr full time -- and in this economy, they're out there for the hiring. Add in an extra $200 premium each for those being EMC-friendly drives, and you're up $51K or so. Am I getting any warmer? Still a hair under $20,000,000, right?
BTW -- why is it that because you're on slashdot you think you can get away with talking to me like that? If you walked up to me on the street and pulled that, I'd pop you right in the nose. Thanks for your extensive EMC knowledge, Junior.
Re:Try EMC on eBay (Score:2)
Re:Try EMC on eBay (Score:2)
It's fine to point out where someone is wrong -- and I'm more than willing to hear it and discuss it. However, it's totally inappropriate to say rude shit and get in my face about it -- I guess that you still don't understand that. Too bad for you.
Re:Try EMC on eBay (Score:2)
Re:Try EMC on eBay (Score:2)
Now that you mention it -- I don't know. I've used EMC, NetApp, and BlueArc -- but EMC was the first thing that came to mind...and I remember seeing EMC machines for sale on eBay before. I didn't have time to do extensive shopping/research, so I went with the lowest common denominator.
Re:Try EMC on eBay (Score:2)
1. Check out the auction; it says that the system is suitable for re-certification. I've seen other EMC systems on eBay advertised with a full EMC warranty.
2. If I hire an engineer full-time (hell -- for that money, a group of engineers, 4 at $100K annually over a 10-year project -- still cheaper than the $20M), do I need an EMC support contract? Do they need to come and fire the box up for my engineering/administrative group?
Thanks for the info though...
Past Experience... (Score:2, Insightful)
Keep in mind, that's just for the disks, array controllers/cabinets, hubs, and Sun FC cards. No servers are included in that price.
There are so many variables that you didn't go into that it's hard to give you an educated answer to your question, but it seems feasible to get to around 50TB today for that kind of money taking into account the increased storage density that we've gotten in the last couple of years.
Alternative - the Sun Solution (Score:1)
Re: (Score:2)
Depends (Score:2)
The big factors in storage cost, briefly:
r) Reliability
s) Speed
c) Cost
In rough terms, c=s*r, meaning the cost will rise dramatically for high speed reliable storage versus low speed crap storage.
In addition, how the storage is designed (and how much more it can cost) depends a lot on data access patterns as well (read-mostly vs. write-mostly, OLTP vs. DSS vs. data warehouse vs. ...).
Maxtor has 0.3TB IDE for $1/GB. If you built a huge array of IDE controllers for these, your disk cost for 50TB would be around $50k. If some vendor actually built a beast with the requisite number of IDE busses and whatnot, the chassis might run you another $100k. All in all, real cheap storage. But it would suck on performance and reliability, put out too much heat and noise probably, etc, etc.
Highly available disk arrays with extreme disk platter performance and large amounts of caching can easily run $20 million for 50TB, if not more. There are middle of the road solutions though, it doesn't have to be that expensive unless you're going all out for huge concurrency and speed in an OLTP environment that requires 99.999999% uptime.
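The c = s * r rule of thumb can be made concrete with a toy model (the multipliers below are invented purely for illustration, but they bracket the range of numbers seen in this thread):

```python
def storage_cost(base_per_tb, speed_factor, reliability_factor):
    # cost scales multiplicatively with speed and reliability demands
    return base_per_tb * speed_factor * reliability_factor

tb = 50
pile_of_ide = tb * storage_cost(1_000, 1, 1)     # $1/GB commodity drives
enterprise = tb * storage_cost(1_000, 20, 20)    # cached FC arrays, 99.999%+
print(pile_of_ide, enterprise)                   # $50k vs $20M
```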
Re:Depends (Score:3, Funny)
HSM (Hierarchical Storage Management) (Score:2)
Let's do it in RAM, or not(warning, rambles a bit) (Score:2)
So, would IDE really be that bad? Wouldn't it be better to put together a Beowulf cluster of smaller databases, each tasked with a portion of a search? Intelligent distributed processing is a much faster way to query a database. If you have some large (but not unmanageable) number of nodes, let's say 50 (one per terabyte), with backup nodes extending it to 64, any few failures would be correctable at full load.
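The fan-out idea above can be sketched in a few lines. This is a toy in-process model under assumed names (`Node`, `place`); a real cluster would partition over the network, but the shape is the same: hash records across N nodes, scan each partition independently, merge the results.

```python
# Toy sketch of hash-partitioned query fan-out across cluster nodes.
class Node:
    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def query(self, predicate):
        # Each node scans only its own partition of the data.
        return [r for r in self.rows if predicate(r)]

NUM_NODES = 50  # one node per terabyte, as suggested above
nodes = [Node() for _ in range(NUM_NODES)]

def place(key):
    # Hash partitioning decides which node owns a record.
    return nodes[hash(key) % NUM_NODES]

for i in range(10_000):
    place(i).insert({"id": i, "value": i * 2})

# A full-table query becomes NUM_NODES smaller scans, each of which
# could run in parallel on its own machine.
matches = []
for node in nodes:
    matches.extend(node.query(lambda r: r["value"] % 1000 == 0))
print(len(matches))
```

With 64 nodes and replication of each partition, losing a machine means redirecting its scans to a replica rather than losing data, which is where the "backup nodes" come in.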
I know that I don't have the skill set to put together the 50-terabyte database right now, but I really believe that I could do it in less than a year, with half the budget, assuming free telecom to backup sites.
--Mike--
What Kind of Application? (Score:2, Informative)
Are you mostly reading, or also frequently writing this data? Are you searching or doing indexed lookups? Is this a nasty bandwidth hog or a trickle? Is this a zillion parallel transactions or only a few users? What kind of latencies are expected? What reliability is required? What access is needed to historical data?
Consider some concrete examples that are *very* different from each other yet could each total 50TB and would have very different solutions:
- Video-on-demand system for a Hollywood studio deciding that peer-to-peer pirate systems can only be beaten by a legitimate system that is better.
- Online credit card transaction system for, say, Visa.
- SETI data that needs to be collected and searched for messages from extraterrestrials.
- Particle accelerator data that needs to be collected at truly horrendous rates.
- Lexis/Nexis database.
- Google database.
- Echelon data.
- IRS data.
- "Dictionary attack" database for a lone cryptanalyst.
The possibilities go on and on. At the minimum a 50 TB database might be a small number of equipment racks with a single computer attached to them, all totaling maybe $100,000.
And on the other end, I can easily imagine a system where $200,000 of a much larger total might be spent for, say, a terabyte of DRAM.
I can easily imagine a system with less than $5,000 of battery-backed power supplies, and I can imagine a system with hundreds of thousands in generators.
This question has enormous dynamic range.
-kb, the Kent who would enjoy working out solutions for specific instances of this question.
Yes, it's perfectly reasonable. (Score:2)
We have a somewhat-smaller situation at work, with a single Hitachi Lightning SAN providing our data warehouse nodes (two IBM p-series 680 servers) with a terabyte or so of fully-redundant fiber-connected disk. A single terabyte cost us nearly $750,000, and Hitachi bid competitively.
Enterprise-class solutions call for enterprise-sized wallets. Do not expect to slap together a few IDE drives and call it a day, unless you enjoy being fired.
- A.P.