Intel's Nehalem EX To Gain Error Correction

timothy posted more than 5 years ago | from the no-it-won't-yes-it-will dept.

Intel 80

angry tapir writes "Intel's eight-core Nehalem EX server processor will include a technology derived from its high-end Itanium chips that helps to reduce data corruption and ensure reliable server performance. The processor will include an error correction feature called MCA Recovery, which will detect and fix errors that could otherwise cause systems to crash; it will be able to detect system errors originating in the CPU or system memory and work with the operating system to correct them." Update: 05/27 19:11 GMT by T: Dave Altavilla also suggests Hot Hardware's coverage of the new chip, which includes considerably more information.

80 comments

ECC memory replacement? (1)

pak9rabid (1011935) | more than 5 years ago | (#28113023)

So, is this an effective replacement for ECC memory?

Re:ECC memory replacement? (4, Informative)

Anonymous Coward | more than 5 years ago | (#28113153)

This will fix many errors affecting the processor itself (new manufacturing processes make transistors quite vulnerable to interference and aging). ECC will still be needed for correcting errors affecting data while it is stored in main memory.

Parity will be needed for protecting caches (possibly ECC will be used in the future). Checksums for data on the hard drive. CRCs for packets on the network. And so on...
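To make the parity/ECC distinction concrete, here is a minimal Python sketch using the textbook Hamming(7,4) code. It is an illustration only: real ECC memory uses wider SECDED codes (typically 8 check bits per 64-bit word), and the function names are hypothetical.

```python
def parity(bits):
    """Even parity bit: 1 if the number of set bits is odd, else 0."""
    p = 0
    for b in bits:
        p ^= b
    return p


def hamming74_encode(d):
    """Encode data bits [d1, d2, d3, d4] as a 7-bit codeword (positions 1..7).

    Check bits sit at positions 1, 2 and 4; each covers the positions
    whose index has the corresponding bit set.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Return (data bits, syndrome). A nonzero syndrome is the 1-based
    position of a single flipped bit, which is corrected before returning."""
    c = list(c)
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
word = hamming74_encode(data)
corrupted = word.copy()
corrupted[4] ^= 1  # flip the bit at position 5

# Parity over the whole word only says "something changed", not where.
print(parity(word), parity(corrupted))  # 0 1
# The Hamming syndrome locates the flipped bit, so it can be fixed.
print(hamming74_decode(corrupted))      # ([1, 0, 1, 1], 5)
```

Note that a single parity bit also misses any even number of flipped bits, which is one reason parity alone is acceptable only where errors are rare and a refetch is cheap.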

Re:ECC memory replacement? (4, Insightful)

davecb (6526) | more than 5 years ago | (#28113521)

I'm a bit surprised this is only seeing the light now: as we get smaller and faster, the number of errors observed goes up amazingly.

Back in the stone age, Cray computers didn't even have parity memory, partly because they were willing to re-run programs but mostly because errors were unlikely. Cray himself famously said "parity is for farmers".

These days, errors are very common, and I'm literally amazed that x86s don't have better-than-ECC error detection and correction. All the commercial Unix vendors have them.

--dave

Re:ECC memory replacement? (4, Informative)

Jah-Wren Ryel (80510) | more than 5 years ago | (#28113675)

These days, errors are very common, and I'm literally amazed that x86s don't have better-than-ECC error detection and correction. All the commercial Unix vendors have them.

Intel's been trying to 'protect' the market for Itanium; those CPUs have had it for years, probably from day 1. HP definitely markets MCA as a big feature of their Itanium-based systems.

If AMD were smart, they would have incorporated it into their Opteron line just like they did x64 to cut Intel off at the knees.

Re:ECC memory replacement? (1)

davecb (6526) | more than 5 years ago | (#28113987)

Indeed! Intel is being penny-wise and pound-foolish.

--dave

Re:ECC memory replacement? (1)

mzs (595629) | more than 5 years ago | (#28115771)

MCA is basically a buzzword for a bunch of RAS features. AMD has most of the same features on x86 that Intel has. (Itanium does have some wild stuff that will most likely not be in Nehalem, though, like TLB verification.) Some of the details differ between AMD and Intel, and an OS needs to know about that. One exciting thing is that AMD has just started making chips with ECC L1 caches, while as far as I know Intel still has only parity detection in the L1 cache. Also, some AMD CPUs have hardware memory scrubbers (3), while with Intel you need a hefty chipset to do that automatically (again, I hope that is not dated info).

Re:ECC memory replacement? (1)

mzs (595629) | more than 5 years ago | (#28116669)

I hate to reply to myself, but I did some googling and I cannot verify that AMD K10 has ECC L1 cache. I am almost 100% certain I read that in a reliable place, like an AMD white paper, some time ago though. I hope someone can clear this up for me.

Re:ECC memory replacement? (1)

lopgok (871111) | more than 5 years ago | (#28221893)

The first Cray-1 didn't have ECC; its MTBF was 8 hours.
All Crays since then have had ECC. Seymour Cray was a smart dude.

Re:ECC memory replacement? (2, Informative)

mzs (595629) | more than 5 years ago | (#28113831)

State of the non-mainframe art with regards to RAS right now is ECC RAM with mirroring, parity cache, ECC e-cache, hashes that detect and fix multiple bit errors for storage end to end, CRC (ethernet) and cksum (TCP, UDP) (but can you trust the nic offloading engine?), instruction retry, and fp scrubbing, in addition to what has been around for the last five years or so.
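The difference between the CRC and the ones'-complement checksum mentioned above fits in a few lines of Python. This is a sketch: `internet_checksum` follows the algorithm described in RFC 1071 (the sample bytes are the ones used in that RFC's numerical example), and `zlib.crc32` computes the same CRC-32 polynomial that Ethernet's frame check sequence uses, wire bit ordering aside. One known weakness of the 16-bit sum is that it cannot see reordered 16-bit words, while the CRC can.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum (RFC 1071), as used by IPv4, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


sample = bytes.fromhex("0001 f203 f4f5 f6f7")   # byte sequence from RFC 1071's example
swapped = bytes.fromhex("f203 0001 f4f5 f6f7")  # first two 16-bit words reordered

print(hex(internet_checksum(sample)))                          # 0x220d
print(internet_checksum(sample) == internet_checksum(swapped))  # True: reordering is invisible
print(zlib.crc32(sample) == zlib.crc32(swapped))                # False: CRC-32 catches it
```

The CRC result is guaranteed here because the swap only changes bits inside a 32-bit window, and CRC-32 detects every error burst of 32 bits or less.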

Re:ECC memory replacement? (1)

afidel (530433) | more than 5 years ago | (#28114305)

Add ANSI T10-DIF for storage to application data validation.
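For readers unfamiliar with T10-DIF: each 512-byte sector carries 8 extra bytes of protection information, a 2-byte guard tag (a CRC-16 over the data, polynomial 0x8BB7), a 2-byte application tag and a 4-byte reference tag. The Python sketch below assumes Type 1 protection, where the reference tag holds the low 32 bits of the LBA; the helper names are hypothetical and the bitwise CRC is written for clarity, not speed.

```python
import struct

T10DIF_POLY = 0x8BB7  # CRC-16 polynomial used for the T10-DIF guard tag


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16, MSB first, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_pi(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte protection information: guard, app tag, reference tag."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def check_pi(sector: bytes, lba: int, pi: bytes) -> bool:
    """Guard tag catches corrupted data; reference tag catches misdirected I/O."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (lba & 0xFFFFFFFF)


sector = bytes(range(256)) * 2
pi = make_pi(sector, lba=1000)

flipped = bytearray(sector)
flipped[100] ^= 0x01  # a single bit flipped somewhere along the path

print(check_pi(sector, 1000, pi))          # True
print(check_pi(bytes(flipped), 1000, pi))  # False: guard tag mismatch
print(check_pi(sector, 1001, pi))          # False: data landed at the wrong LBA
```

The point of the scheme is that the tags travel with the data, so any hop between the application and the platter can re-check them.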

Re:ECC memory replacement? (1)

Chris Burke (6130) | more than 5 years ago | (#28114487)

Parity will be needed for protecting caches (possibly ECC will be used in the future).

Just fyi, they all have ECC for caches already.

Re:ECC memory replacement? (1)

mzs (595629) | more than 5 years ago | (#28116107)

Not L1 though. As far as I know you cannot yet buy anything from Intel that has better than parity error detection in the L1 cache. AMD just started selling chips that have ECC L1. Actually I just looked and I cannot find a good doc stating that AMD K10 has ECC L1 cache, but I am almost 100% certain I read it. In any case I know you can buy stuff from Sun that has sparcs that have ECC for the register file yet only parity error detection for the L1 cache.

Re:ECC memory replacement? (2, Informative)

Chris Burke (6130) | more than 5 years ago | (#28116795)

The original Opteron had L1 ECC, it just wasn't correctable if encountered on a read or write (there was a scrubber that would find and correct ECC errors, but if it didn't reach the line in question before the program accessed the cache line, then it would detect the error and machine check fault). The ill-fated Barcelona (Phenom) added on-the-fly correctability. Phenom 2 of course has it too.

I was pretty sure Intel had it in their L1s too. Kinda surprised to hear SPARC doesn't.

P.S. I know The Inquirer decided it was the K10, but it isn't. They're still all K8s.
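The race described above, between a background scrubber and a program touching a line that still holds a latent error, can be modeled in a few lines. This is a toy Python model under stated assumptions (one line scrubbed per step, a boolean standing in for a correctable single-bit error); it is not how any particular Opteron or Phenom implements scrubbing.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    data: int
    latent_error: bool = False  # a single-bit error the ECC could fix


def scrub_step(lines, cursor):
    """Scrubber visits one line per step and corrects any latent error."""
    lines[cursor].latent_error = False
    return (cursor + 1) % len(lines)


def read_line(lines, idx, correct_on_access):
    """Read a line; the older design only detects ECC errors on access."""
    line = lines[idx]
    if line.latent_error:
        if not correct_on_access:
            raise RuntimeError(f"machine check: ECC error in line {idx}")
        line.latent_error = False  # corrected on the fly
    return line.data


def run(correct_on_access, scrub_steps):
    lines = [CacheLine(data=i) for i in range(8)]
    lines[5].latent_error = True
    cursor = 0
    for _ in range(scrub_steps):
        cursor = scrub_step(lines, cursor)
    try:
        return read_line(lines, 5, correct_on_access)
    except RuntimeError as exc:
        return str(exc)


print(run(correct_on_access=False, scrub_steps=3))  # machine check: ECC error in line 5
print(run(correct_on_access=False, scrub_steps=6))  # 5 (the scrubber got there first)
print(run(correct_on_access=True, scrub_steps=3))   # 5 (corrected on access)
```

In the first case the scrubber has not reached line 5 yet, which is exactly the window in which a detect-only design would machine check.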

Re:ECC memory replacement? (2, Interesting)

mzs (595629) | more than 5 years ago | (#28117109)

Sure enough it is in the Phenom datasheet, thank you.

As far as I know T1, T2, and T2+ all have only parity for the I$ and D$. All the Fujitsu sparcs that I know of only have parity for I$ and D$ as well. ECC e-cache is the norm though.

Sparc was odd. They had all sorts of strange caches from one model to the next. Sometimes there was an I$ and D$, sometimes it was unified. Sometimes some caches were virtually tagged. There was an UltraSPARC that had the e-cache data ECC-protected while the tags were on chip and only had parity checking. Also, there were bad modules with flaky E$ at one point; for customers whose problems the enhanced RAS in Solaris 9 did not solve, Sun provided replacement modules with mirrored SRAM for the e-cache.

Re:ECC memory replacement? (1)

Chris Burke (6130) | more than 5 years ago | (#28117297)

Hm, stupid question, but what's an e-cache? Oh wait, external as in off-chip. That makes sense in context. You'd use parity-only for on-chip caches to save bits, but for a separate cache chip it'd be silly to save a small percent of space and lose correctability/multi-bit error detection.

Sparc is odd. I don't know all that much about it, but the more I learn the more I think so. Thanks. :)
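The space argument about parity versus ECC can be made concrete. A Hamming code that corrects single-bit errors in m data bits needs the smallest r with 2^r ≥ m + r + 1 check bits, and SECDED adds one more bit for double-error detection; parity needs one bit per protected block. A quick Python calculation (the block sizes are illustrative assumptions, not any specific cache's layout):

```python
def sec_check_bits(m: int) -> int:
    """Smallest r such that 2**r >= m + r + 1 (Hamming single-error correction)."""
    r = 0
    while (1 << r) < m + r + 1:
        r += 1
    return r


def secded_check_bits(m: int) -> int:
    """SECDED adds one overall parity bit to the Hamming code."""
    return sec_check_bits(m) + 1


for m in (8, 32, 64, 128):
    k = secded_check_bits(m)
    print(f"{m:4d} data bits: SECDED {k} bits ({k / m:5.1%}), parity 1 bit ({1 / m:5.1%})")
```

For a 64-bit word this gives the familiar 72-bit ECC word (8 check bits, 12.5% overhead) versus about 1.6% for a single parity bit.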

Re:ECC memory replacement? (1)

mzs (595629) | more than 5 years ago | (#28127557)

There is another reason for parity on chip: the parity of AB is the parity of A XOR the parity of B, while ECC is inherently serial per block. That way each parity check can easily be 4, 8, or 16 times faster than an ECC check. Sun always went the extra mile to make their caches a bit faster than the competition; Sun had a history of designing the I$, the D$, and the tag comparison on the E$ with only a 0- or 1-cycle penalty.
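The property relied on above is that parity composes: the parity of a concatenation AB is parity(A) XOR parity(B), so a wide word's parity can be computed as a tree of small, independent XORs. A minimal Python check of that identity (the byte-wise split and the function names are just one illustrative choice):

```python
from functools import reduce
from operator import xor


def parity(x: int) -> int:
    """1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def parity_bytewise(x: int, nbytes: int) -> int:
    """Compute per-byte parities independently, then combine them with XOR."""
    byte_parities = [parity((x >> (8 * i)) & 0xFF) for i in range(nbytes)]
    return reduce(xor, byte_parities, 0)


def parity_fold(x: int, width: int = 64) -> int:
    """XOR-fold halves together: log2(width) levels, like a hardware XOR tree."""
    while width > 1:
        width //= 2
        x = (x ^ (x >> width)) & ((1 << width) - 1)
    return x


word = 0xDEADBEEFCAFEF00D
print(parity(word), parity_bytewise(word, 8), parity_fold(word))
```

All three calls agree for any 64-bit input, because XOR-ing two halves never changes the total number of set bits modulo 2.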

Re:ECC memory replacement? (2, Informative)

Anonymous Coward | more than 5 years ago | (#28113207)

No. ECC only corrects certain issues in the memory. It cannot help with memory controller errors, nor with register or TLB errors.

Error (0, Redundant)

jbeaupre (752124) | more than 5 years ago | (#28113033)

MS detected
Uninstalling

Re:Error (-1, Troll)

Anonymous Coward | more than 5 years ago | (#28113061)

AHAHAHAHAhahahahahahahah HILARIOUS!!!!!!

Re:Error (0, Offtopic)

MrNaz (730548) | more than 5 years ago | (#28113269)

I like this trend for computer systems to protect us from our own stupidity. Here's another one:
http://static.mrnaz.com/ms_security_works.jpg [mrnaz.com]

For Example (0)

Anonymous Coward | more than 5 years ago | (#28113051)

Floating Point Errors?

Re:For Example (1)

sexconker (1179573) | more than 5 years ago | (#28113297)

Game Set and Match.

Not nearly good enough... (5, Funny)

fuzzyfuzzyfungus (1223518) | more than 5 years ago | (#28113075)

This is nice and all; but I, for one, will not be satisfied until Intel releases a CPU that does what I mean, not what I say.

Re:Not nearly good enough... (1)

ausekilis (1513635) | more than 5 years ago | (#28113217)

Tcl already has a tip (rfc-like) in place to handle such feats: rmmadwim, April 2003 [www.tcl.tk]

I know, I know... Tcl is a scripting language, not an OS, not a processor, yadda yadda. At least someone is thinking ahead here. If they can get it to work in a scripting language, they may be able to get it to work at lower levels.

Re:Not nearly good enough... (1)

keeegan (1526067) | more than 5 years ago | (#28113225)

Why don't you just say what you mean, for the time being?

Re:Not nearly good enough... (2, Funny)

causality (777677) | more than 5 years ago | (#28113271)

Why don't you just say what you mean, for the time being?

That's too easy. We'll never advance the state of the art with that kind of thinking!

Re:Not nearly good enough... (1)

TeknoHog (164938) | more than 5 years ago | (#28113723)

We'll never advance the state of the art with that kind of saying!

Re:Not nearly good enough... (2, Funny)

dzfoo (772245) | more than 5 years ago | (#28113313)

So does your computer currently do what you say?

Mine, I can barely get it to do what I type!

        -dZ.

Re:Not nearly good enough... (1)

clone53421 (1310749) | more than 5 years ago | (#28114891)

Yeah... no matter how many times I try, typing "stand on your head" doesn't seem to have any effect on it. It just gives me the same dumb excuses over and over.

Re:Not nearly good enough... (1)

jonaskoelker (922170) | more than 5 years ago | (#28120367)

releases a CPU that does what I mean, not what I say.

They have that, it's called a Girlfriend.

However, it has the freedom to decide what you mean however it wants. Beware of the "dilf, itd" instruction---it's short for "do I look fat in this dress?"

x86 (2, Insightful)

Red Flayer (890720) | more than 5 years ago | (#28113091)

Error correction on an x86 chip?

Sweet. Now all those high-end server applications running on x86s that need great uptime can finally join the big boys. [rolls eyes].

I'm just not sure of the utility here -- I RTFA, but I'm still not clear on why Intel would cannibalize Itanium sales (new release delayed again) by offering error correction on Nehalem chips. Is the demand for x86 Server chips that high? I thought anyone requiring 5 nines (or anything close to it) would never consider using x86?

Can someone with more knowledge of the high-end server market please clarify?

Re:x86 (3, Insightful)

Amouth (879122) | more than 5 years ago | (#28113181)

That's it... I don't think this is aimed at the "high end" but rather at the middle ground.

People running farms or VMs or even large DBs, but not exactly in need of a mainframe or HPC.

While I agree there are a lot of options other than x86, x86 is growing and isn't going to go away, and EM64T has just solidified that. Adding something like this is a welcome evolution of the area.

And they aren't cannibalizing Itanium sales: while, yes, Itanium is selling better than before, there is nowhere near the market for it that there is for x86 chips.

Re:x86 (2, Insightful)

0x000000 (841725) | more than 5 years ago | (#28113189)

The more interesting thing is to see how this technology is going to work and whether other manufacturers will be able to implement this in their chips.

x86 is a slow and underperforming architecture, and I am surprised that Intel is bolting error correction on top of it. The Intel instruction set is so complicated that oftentimes a single flipped bit still yields a valid opcode which, when executed, will do something completely different from what you expect it to do.

This seems to be nothing more than a stopgap measure to avoid losing more customers to big-iron manufacturers like Sun and IBM, who both have their own CPUs that were built with stability in mind.

x86 has moved into areas where it simply is not going to shine as brilliantly as it did on the desktop. The only issue is that moving to a new platform is going to be catastrophic in that too many people rely on it. Apple being able to transition from PowerPC to x86 is quite a feat, but x86 transitioning to the next big thing is going to be impossible without at least backwards compatibility in the form of x86 emulation, and boy is the x86 instruction set fun to emulate!

Re:x86 (1)

0x000000 (841725) | more than 5 years ago | (#28113221)

Err, I meant x86 instruction set, not Intel instruction set.

Re:x86 (1)

sexconker (1179573) | more than 5 years ago | (#28113419)

If only your user name was 0x0CD7FD.

Re:x86 (5, Informative)

Chris Burke (6130) | more than 5 years ago | (#28114145)

x86 is a slow and underperforming architecture, and I am surprised that Intel is bolting error correction on top of it.

Hogwash. There's nothing inherently slow about x86. The ISA is nothing but an interface. Internally, the CISC instructions are decoded into simple micro-ops, so all the predictions about how x86 would fall behind because it wouldn't be able to have out-of-order execution, etc., were proven wrong. It's not easy to make x86 chips, but the difficult performance problems have been solved.

So don't be surprised, it's just another step in the plain obvious trend that has been going on for over a decade now. With no performance disadvantage, and a big price advantage, x86 has been moving into the server market in a big way. The only thing holding it back is the lack of RAS features, which are just as easy to "bolt on" to x86 as any other instruction set. It's just there was no reason to add these features for desktop or low-end servers.

The Intel instruction set is so complicated that oftentimes a single flipped bit still yields a valid opcode which, when executed, will do something completely different from what you expect it to do.

The same is true of RISC, flip a bit in the opcode field and there's a good chance it's still a valid opcode. Not that it matters one whit; flipped bits in the instruction stream are detected via ECC in the instruction cache, not by praying the decoders see it as an invalid instruction.

This seems to be nothing more than a stopgap measure to avoid losing more customers to big-iron manufacturers like Sun and IBM, who both have their own CPUs that were built with stability in mind.

FUD like this is nothing but a stopgap measure for the RISC vendors to lose customers a little more slowly to x86 than they already are. Of course rather than just losing customers, Sun and IBM (and other former RISC vendors) sell solutions that use x86. It's only a matter of time before this trend hits even the "big iron". As x86 erodes their margins from beneath, for how long will it make sense to spend the money to develop the RISC chips for an ever-decreasing slice of the pie? Eventually it makes more sense to just demand that Intel add whatever RAS features it lacks compared to the RISC chip it'll be replacing, which is exactly what is happening here (only in this case it's EPIC that's on the chopping block).

Apple being able to transition from PowerPC to x86 is quite a feat, but x86 transitioning to the next big thing is going to be impossible without at least backwards compatibility in the form of x86 emulation, and boy is the x86 instruction set fun to emulate!

Well you certainly got that right. The only real disadvantage of x86 itself is that it is a huge pain in the ass to make work properly, and a lot of the magic isn't in the ISA docs but rather in the institutional knowledge of the two remaining firms that make the chips. x86 raises the already incredibly high barrier to entry for new chip manufacturers. That, not performance or (potential) reliability, is the reason x86 sucks.

Re:x86 (0)

Anonymous Coward | more than 5 years ago | (#28154051)

Well you certainly got that right. The only real disadvantage of x86 itself is that it is a huge pain in the ass to make work properly, and a lot of the magic isn't in the ISA docs but rather in the institutional knowledge of the two remaining firms that make the chips. x86 raises the already incredibly high barrier to entry for new chip manufacturers. That, not performance or (potential) reliability, is the reason x86 sucks.

For example, the P6 hinting NOPs (opcodes 0F 18 through 0F 1F) were introduced with the Pentium Pro in 1995, yet only recently have they become well known. In fact, there are x86 CPUs that were intended to be 686-compatible but failed to implement these hinting NOPs, such as the VIA Nehemiah.
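For context, the 0F 1F opcode is what Intel's manuals recommend for multi-byte NOP padding. The Python sketch below builds padding from the recommended 1- to 9-byte sequences listed in the NOP entry of the Intel SDM; everything longer than 2 bytes uses 0F 1F, so code padded this way would presumably hit an invalid-opcode fault on a CPU that lacks it. The function name is hypothetical.

```python
# Recommended multi-byte NOP sequences (Intel SDM, NOP instruction reference).
MULTI_BYTE_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("0F 1F 00"),
    4: bytes.fromhex("0F 1F 40 00"),
    5: bytes.fromhex("0F 1F 44 00 00"),
    6: bytes.fromhex("66 0F 1F 44 00 00"),
    7: bytes.fromhex("0F 1F 80 00 00 00 00"),
    8: bytes.fromhex("0F 1F 84 00 00 00 00 00"),
    9: bytes.fromhex("66 0F 1F 84 00 00 00 00 00"),
}


def nop_padding(n: int) -> bytes:
    """Fill n bytes with the fewest NOPs from the table (greedy, 9 bytes max each)."""
    out = bytearray()
    while n > 0:
        size = min(n, 9)
        out += MULTI_BYTE_NOPS[size]
        n -= size
    return bytes(out)


print(nop_padding(12).hex(" "))  # a 9-byte NOP followed by a 3-byte NOP
```

Only 1- and 2-byte gaps can be filled without 0F 1F, which is why a CPU missing these opcodes is a real compatibility problem rather than a curiosity.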

Re:x86 (0)

Anonymous Coward | more than 5 years ago | (#28114765)

x86 is slow? You've got to be kidding me.

Re:x86 (0)

Anonymous Coward | more than 5 years ago | (#28115203)

Yes and that's why the top 3 supercomputers in the world are x86 based...

The fastest supercomputer, built by IBM no less, is x86 (Opteron) based so there goes your theory on "big iron" manufacturers.

Re:x86 (1)

mzs (595629) | more than 5 years ago | (#28115571)

I don't know about x86 being slow. There is some Power that is very fast at single thread int and fp, but man is it power hungry, hot, and expensive. But really for many workloads x86 is plenty fast and priced much more competitively. Certainly x86 is faster than all sparc, mips (dead), ppc (dead), itanium (living dead), and all but the most expensive power chips with regard to single threaded MIPS and FLOPS. Most workloads are IO or memory bound or the throughput of largely independent tasks can scale by adding HW to the cluster, hence cheap fast enough x86 looks very appealing.

Re:x86 (4, Insightful)

Anonymous Coward | more than 5 years ago | (#28115875)

x86 is a slow and underperforming architecture

So right there you've destroyed your credibility. You couldn't be any more wrong if your name was W. Wrongy Wrongenstein.

Right now, x86 processors are the highest performance in the world.

and I am surprised that Intel is bolting error correction on top of it

Well, that just shows you aren't paying attention to the trends of where x86 is going any more than you've been paying attention to its performance. x86 has been gradually moving up market into higher and higher tiers of servers for well over a decade now.

The Intel instruction set is so complicated that oftentimes a single flipped bit still yields a valid opcode which, when executed, will do something completely different from what you expect it to do.

And now we see that you don't have much clue about instruction set encoding, either.

There is literally no commercially viable instruction set for which the above is NOT true. Look at a traditional RISC instruction set with 3 operands and 32 GPRs. Almost half of the bits (15 of them) in every 32-bit ALU instruction for such a processor are register addresses. Flip any of those bits and the register address is still valid -- there are no invalid addresses, so the processor can't tell the difference between the wrong address and the right one. The remainder of the bits in such an instruction are typically instruction format select, opcode select, and miscellaneous control bits. Flip an opcode bit and you'll get the wrong ALU op, more often than not... processor designers leave some room for adding opcodes, but typically not a lot.

See, the only way an instruction set can guard against bit flips is not by simplicity (as you implicitly claim); it's by being horribly wasteful. When people design instruction encodings, they look at the width of all the bit fields in each instruction format and use the smallest they can get away with. Instruction sets which aren't efficiently packed aren't any good: they use more memory to store program code, have reduced effective icache size for the same number of bits in silicon, tend to have major clumsiness (such as too-small immediate operand sizes, or too-small relative branch windows), and so forth. Efficient packing always means there are very few invalid bit patterns for each field in the instruction; if you have a lot of invalid patterns you probably could be packing the instruction tighter. Few invalid patterns means that most bit flips still produce a valid instruction.
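The register-field argument is easy to check against a concrete RISC encoding. The Python sketch below uses the MIPS32 R-type format (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code); MIPS is chosen here only as a familiar example of the 3-operand, 32-GPR design described above.

```python
def encode_rtype(rs, rt, rd, funct, shamt=0, opcode=0):
    """Pack a MIPS32 R-type instruction: opcode|rs|rt|rd|shamt|funct."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_rtype(word):
    """Split a 32-bit word back into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


FUNCT_ADD, FUNCT_SUB = 0x20, 0x22
add_word = encode_rtype(rs=9, rt=10, rd=8, funct=FUNCT_ADD)  # add $t0, $t1, $t2
print(hex(add_word))  # 0x12a4020

# Bits 11..25 are the three register fields: every flip still names a valid GPR.
for bit in range(11, 26):
    fields = decode_rtype(add_word ^ (1 << bit))
    print(bit, fields["rs"], fields["rt"], fields["rd"])

# Flipping bit 1 of the function code silently turns ADD into SUB.
print(decode_rtype(add_word ^ 0b10)["funct"] == FUNCT_SUB)  # True
```

None of the 15 register-bit flips produces anything the decoder could reject, which is exactly why instruction streams are protected by cache ECC or parity rather than by the encoding.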

This seems to be nothing more than a stopgap measure to avoid losing more customers to big-iron manufacturers like Sun and IBM, who both have their own CPUs that were built with stability in mind.

Idiot. Intel isn't losing big iron marketshare to IBM and Sun. It's taking big iron marketshare from them. Adding big iron RAS features to x86 is the next step in that trend.

x86 has moved into areas where it simply is not going to shine as brilliantly as it did on the desktop. The only issue is that moving to a new platform is going to be catastrophic in that too many people rely on it. Apple being able to transition from PowerPC to x86 is quite a feat, but x86 transitioning to the next big thing is going to be impossible without at least backwards compatibility in the form of x86 emulation, and boy is the x86 instruction set fun to emulate!

1990 called, and it wants its foolish predictions of where x86 cannot go back.

Much better informed people than you thought, back then, that x86 could never be a workstation or server CPU in any capacity at all. It was just a personal computer processor, and a rather ugly and slow one at that.

Instead, Intel proved they could make fast x86 processors, and steadily increased x86 presence in the workstation and low end server market throughout the 90s, with an assist from AMD in recent years. x86 now dominates those markets.

Is x86 a perfect instruction set design? Of course not, it's ugly as hell. Does that mean it literally cannot be adapted to the big iron world? No, of course not. Most of the RAS features (Reliability, Availability, Serviceability) big iron needs aren't tightly coupled to the actual instruction set used. Your notion that x86 cannot enter this market without relying on emulation (presumably by a 'better' instruction set processor) is horribly naive and completely out of touch with reality.

Re:x86 (1, Insightful)

Anonymous Coward | more than 5 years ago | (#28116661)

Impressive effort, but Slashdot is full of dotcom washouts whose IT knowledge ends roughly in 1997. You'll never educate them, so it's more fun to point out how Linux users are obsolete relics, colossal morons, and sub-MCSE bottom feeders.

Re:x86 (1)

RightSaidFred99 (874576) | more than 5 years ago | (#28117113)

I know, all those much better performing than Nehalem/Opteron chips are really crushing x86. Wait, there are barely any of those. Sorry, I didn't realize quite how full of shit you were.

Re:x86 (1)

Slashcrap (869349) | more than 5 years ago | (#28120397)

x86 is a slow and underperforming architecture

Yeah, nearly as slow and under performing as SPARC, PPC, MIPS, IA64 and ARM.

Not quite though, not quite.

Re:x86 (0)

Anonymous Coward | more than 5 years ago | (#28113223)

Probably because Itanium is a dead end technology. Current x86/x64 CPUs offer far superior performance for less power and cost.

Re:x86 (4, Insightful)

Lally Singh (3427) | more than 5 years ago | (#28113233)

They're not, nobody buys Itanium. They're going after SPARC and POWER. Lots of people are looking at the speed and throughput of modern x86 and noticing the price difference. Especially in this economy.

And with Ellison in control of SPARC, it's the best way to go.

Re:x86 (1)

afidel (530433) | more than 5 years ago | (#28114799)

Lots of people buy Itanium, they buy them from HP in fairly large quantity. The latest numbers I can find are $713M in Itanium revenue for HP in Q1/2009, out of a total revenue of $3.9B for all of their storage and server group. Sun only had $1.094B in TOTAL server revenue for the quarter which is not broken down between SPARC and x64 lines. IBM's numbers for POWER are almost impossible to tease out of their filings.

Re:x86 (1)

mzs (595629) | more than 5 years ago | (#28115345)

The problem is that HP is expensive compared to Sun. Seriously, I know it is hard to believe, but call them up and check for yourself. The reason HP gets away with this is that they have some people by the balls right now. There were some shops that bit on Itanic when it looked unstoppable with support from MS, SGI, and Compaq/HP. Those shops are in a bind at this point. Then there are the people that were DEC. HP offered some crazy storage paths from that to Itanic that some places bought into. They are not in a good place now either. So not many Itanium servers get sold per year, but they are crazy expensive. The scale is like this: an 8-CPU Itanium system with IO channels (PCI/PCIe/PCI-X) in the dozens costs about the same as a 64-CPU system with IO in the hundreds from Sun.

Re:x86 (1)

afidel (530433) | more than 5 years ago | (#28115761)

All of the top slots (Price/QphH) for TPC-H for large systems are owned by Itanium. Unfortunately SPEC doesn't require pricing so you can't compare Specjbb2005 results that way. If you filter for large systems in TPC-C and then sort by Price/TpmC Superdome does quite well.

Re:x86 (1)

mzs (595629) | more than 5 years ago | (#28116599)

http://www.tpc.org/tpch/results/tpch_price_perf_results.asp [tpc.org]

So basically Superdome is not in the top spot (not even the top non-clustered spot) for anything until you get to 30TB, where it is the ONLY entry. Now, true, I will give you that it was a Unisys MS SQL box that no one would really buy that topped the 3TB category (and it does not do much in the realm of QphH), but that was a Xeon box, and the top few in every other category were likewise always Xeon or Opteron, never Itanium. In fact, pretty much everywhere below 3TB there were even Xeon and Opteron boxes with better Price/QphH and QphH than HP Superdome. (You do get some other oddballs that no one would really buy, too.)

I think it does show that HP Itanium is very expensive unless you need to push upwards of 200K QphH on insane data sizes in a single box. There are other systems that can do more QphH far cheaper as long as you limit yourself to 1TB. When you look at simply QphH then again the only time that Itanium is on top is the one where it is the sole entry. A power6 beats it in the 3TB category, can push more than 50% more QphH but only costs about 20% more and is running software that people would really use. It is a cluster though.
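Taking the figures above at face value (more than 50% more QphH for about 20% more cost), the price/performance comparison works out as follows; this is arithmetic on the quoted numbers, not an independent reading of the TPC-H results.

```latex
\frac{(\text{price}/\text{QphH})_{\text{POWER6}}}{(\text{price}/\text{QphH})_{\text{Superdome}}}
= \frac{\text{price}_{\text{POWER6}} / \text{price}_{\text{Superdome}}}{\text{QphH}_{\text{POWER6}} / \text{QphH}_{\text{Superdome}}}
\approx \frac{1.2}{1.5} = 0.8
```

On those numbers the POWER6 cluster delivers each QphH for roughly 20% less money, and for less still if the throughput gap exceeds 50%.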

Re:x86 (2, Insightful)

bloodhawk (813939) | more than 5 years ago | (#28117097)

I am working in an organisation that was unlucky enough to fall for the HP bullshit about how wonderful Itanium was. We spent close to 2 million on high-end Itanium boxes over the past few years, and we have now classified ALL of them as up for asset replacement so we can get rid of the bastards as early as possible (2 years before normal end of life for us). So many vendors simply don't have software that works on Itanium, or it works in a more limited, cut-down way; hell, even MS, which supposedly supports them, doesn't have most of their software as Itanium-compatible.

Re:x86 (1)

Kjella (173770) | more than 5 years ago | (#28113367)

Virtualization. It's pretty clear from the Nehalem EX presentation, and when you put all your x86 eggs in one basket you want even higher reliability guarantees. You don't own a dozen single/dual/quad-core servers; you get one of these beasts and just slice it up as you want, increasing allocations as needed and migrating VMs to another VM server if you're short on resources. I must admit it seems rather neat on paper, but I'm not playing with anything like that.

Re:x86 (5, Insightful)

Chris Burke (6130) | more than 5 years ago | (#28113811)

Error correction on an x86 chip?

Sweet. Now all those high-end server applications running on x86s that need great uptime can finally join the big boys. [rolls eyes].

Is the demand for x86 Server chips that high? I thought anyone requiring 5 nines (or anything close to it) would never consider using x86?

The story of the server market for the last 10+ years is simple: x86 has been eating everyone else's market share from the bottom up. Commodity pricing > perceived advantages of the proprietary RISC vendors. To the extent that there are real necessary features x86 lacked, it has acquired them as necessary.

There's been correctable ECC on x86 server chips for years. x86 has long since moved up-market past the point where basic RAS features (like ECC) are mandatory. Intel's Xeon has had these features for a long time. AMD Barcelona core was the first to have correctable ECC in the L1 caches -- before it could detect errors but couldn't fix them.

Basically the only new feature here is the ability to notify the OS about uncorrectable errors so that the OS can try to fix the problem by nuking the affected app, reloading a code page from disk or whatever else is appropriate so that a system reboot isn't always necessary on uncorrectable errors.

Yeah this is something the "big boys" already had, fat consolation that will be now that x86 is poised to eat their lunch. Not even Intel themselves could reverse the trend when they tried. They could use features like this to differentiate Itanium all they want, at the end of the day the customer says "yeah that's great, but can you do it in an x86 chip?" This is just them bowing to the demands of the market (in order to make mega $$).
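The recovery options listed above (nuking the affected app, reloading a code page from disk, or rebooting) can be sketched as an OS-side decision based on where the uncorrectable error landed. The Python below is a hypothetical policy for illustration only; the page categories and actions are assumptions, and real kernels (Linux's memory-failure or "hwpoison" handling, for instance) make these decisions in far more detail.

```python
from enum import Enum, auto
from typing import Optional


class PageKind(Enum):
    FREE = auto()               # not currently in use
    CLEAN_FILE_BACKED = auto()  # e.g. an unmodified code page that also exists on disk
    DIRTY_USER = auto()         # modified process data with no other copy
    KERNEL = auto()             # kernel data structures


def recover(kind: PageKind, owner_pid: Optional[int] = None) -> str:
    """Pick a recovery action for an uncorrectable error reported on one page."""
    if kind is PageKind.FREE:
        return "retire page"
    if kind is PageKind.CLEAN_FILE_BACKED:
        return "retire page; reload contents from disk on next access"
    if kind is PageKind.DIRTY_USER:
        return f"retire page; kill process {owner_pid}"
    return "panic: kernel state may be corrupt, reboot"


for kind in PageKind:
    print(kind.name, "->", recover(kind, owner_pid=4242))
```

Only the last case still forces a reboot; without the hardware reporting which page failed, every uncorrectable error would have to be treated that way.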

Re:x86 coming up from below (1)

davecb (6526) | more than 5 years ago | (#28114221)

And Nehalem is an all in-order design, so they can scale out to very large numbers of cores or register-and-decoder sets on a single chip. That helps offset the huge bottleneck of trying to go to molasses-slow main memory on every cache miss, by allowing another thread to run. Something I notice is also true of the newest Power chip. Mind you, I'd want enough cores to host 128 threads in order to at least match the new SPARCs, but that can come along later (;-))

--dave

Re:x86 coming up from below (2, Insightful)

Chris Burke (6130) | more than 5 years ago | (#28114379)

And Nehalem is an all in-order design, so they can scale out to very large numbers of cores or register-and-decoder sets on a single chip. That helps offset the huge bottleneck of trying to go to molasses-slow main memory on every cache miss, by allowing another thread to run. Mind you, I'd want enough cores to host 128 threads in order to at least match the new SPARCs, but that can come along later (;-))

You must be thinking of Atom, because Nehalem is definitely an out-of-order processor and not particularly small either. It does use SMT (and a big instruction window) to hide memory latency (and to keep its 4-wide execution engine busy), but that's having multiple threads running on the same core.

Frankly, while Niagara is a very interesting approach that I think will only become more popular in the future (and Atom is theoretically capable of doing the same thing, though right now it's just embedded stuff), for now there are many server apps where single-thread performance still matters greatly, and for those out-of-order is the way to go (as Intel found out the hard way by trying every trick in the book to make an in-order machine fast enough).

Re:x86 coming up from below (1)

davecb (6526) | more than 5 years ago | (#28114619)

Thanks, I was indeed thinking of Atom. For some reason I associated them with one another...

I double-checked, and the new Power chip is (mostly) in-order, even at the cost of giving away clock speed.

I'll be interested in seeing what IBM is up to in the Power 7 time period.

Re:x86 (0)

Anonymous Coward | more than 5 years ago | (#28118429)

Yeah this is something the "big boys" already had, fat consolation that will be now that x86 is poised to eat their lunch. Not even Intel themselves could reverse the trend when they tried. They could use features like this to differentiate Itanium all they want, at the end of the day the customer says "yeah that's great, but can you do it in an x86 chip?" This is just them bowing to the demands of the market (in order to make mega $$).

I know that when I was working at AMD last year (before being laid off), one of the big pushes was to add RAS features in direct response to customer demand. Not only are they adding it to the CPU, but they're adding it to their chipsets as well. I worked peripherally on what will be released as the RD890 soon (if it hasn't already been released), and a lot of RAS features were added on top of what the 7xx series had.

I have to believe that customers were (and still are) pushing Intel and its chipset vendors to do the same.

The server market is a big pie that Intel and AMD want more of and they're willing to give the customer what they want to get it.

Re:x86 (1)

rbanffy (584143) | more than 5 years ago | (#28113857)

"still not clear on why Intel would cannibalize Itanium sales (new release delayed again)"

Maybe because the next Itanium can also possibly be the last?

Re:x86 (1)

confused one (671304) | more than 5 years ago | (#28113959)

I know there are a lot of Itanium machines out there, and there are definite advantages to the ISA. However, Itanium has been in decline almost from its release. I would not be surprised if this next iteration in the pipeline is the last. I would also not be surprised, given the economy and the continuous delays, if this next iteration never sees the light of day.

Re:x86 (0)

Anonymous Coward | more than 5 years ago | (#28116465)

While you may be right, I'm more inclined to see an INCREASE in Itanium sales in the near future, given that they finally got rid of the Itanium-only chipset platforms and are moving to a single unified chipset for both Xeons and Itaniums. The benefit? There won't be another fiasco like the SDRAM memory controller followed by RDRAM->DDR2 controllers, each surviving for ~5 years apiece. That lets Itanium benefit from best-of-breed features and maximal memory bandwidth for each generation of parts, something that previously hadn't been happening.

But that's just my take on it, and only time will tell.

Re:x86 (1)

confused one (671304) | more than 5 years ago | (#28120403)

While you may be right, I'm more inclined to see the near future as the INCREASE in Itanium sales, given that they finally got rid of the Itanium only chipset platforms and are moving to a single unified chipset for both Xeons and Itaniums: The benefit? Another fiasco like the SDRAM memory controller followed by RDRAM->DDR2 controllers surviving for ~5 years apiece won't happen, allowing Itanium to benefit from best of breed features and maximal memory bandwidth for that generation of parts, something that previously hadn't been happening. But that's just my take on it, and only time will tell.

While that is an interesting development, Itanium is falling behind technologically. Any advantage it had will be gone, due to the raw computational horsepower available in x86, even if x86 is a less elegant solution.

Re:x86 (1)

RightSaidFred99 (874576) | more than 5 years ago | (#28117101)

Wow. Been away in a cave for a few years? x86 dominates in the high end server market, for most values of "high end". The demand for anything _but_ x86 server chips is tiny (bordering on minuscule) in comparison. You simply couldn't have posted more bad information if you'd tried.

Re:x86 (0)

Anonymous Coward | more than 5 years ago | (#28120039)

Is the demand for x86 server chips that high? I thought anyone requiring 5 nines (or anything close to it) would never consider using x86?

I don't understand; what other architecture is used in servers?

Error (1)

KingPin27 (1290730) | more than 5 years ago | (#28113149)

Yes but can it correct for PEBKAC?
Sorry Mr. User -- That tray is not for your coffee cup - I am now deleting your profile -- Have a nice day!

6 comments and mostly whiners already (-1, Troll)

Anonymous Coward | more than 5 years ago | (#28113199)

Wah fucking wah; I want something that:

A) Doesn't exist.
B) Is too expensive for my cheap hippy communist-ass to afford.
C) Is not made for free (as in beer) by said cheap hippy communists.
D) I know little to nothing about, but there wasn't anything else worth crying about today on /. and I needed attention.

More detail on MCA Recovery (5, Informative)

FishBike (1481195) | more than 5 years ago | (#28113229)

The article seemed pretty light on details of what MCA Recovery actually does. I found this presentation in PDF format [gelato.org] that seems to go into some more useful detail about what this is. It's not just ECC to repair single-bit errors (although that is part of it, apparently). It also includes features to recover from errors that cannot simply be corrected. For example it includes a mechanism to notify the OS of the details of an uncorrectable error, so that it could presumably re-load a page full of program code from disk, or terminate an application if its data has been corrupted, instead of shutting down the whole machine.
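For anyone wondering how ECC "repairs" a bit at all, here is a toy single-error-correcting Hamming(7,4) code. Real DRAM and cache ECC typically uses wider SECDED codes (for example 8 check bits per 64 data bits), and nothing in the presentation says this exact code is used; it's just the smallest example of the same idea. The function names are made up for illustration.

```c
#include <stdint.h>

/* Encode 4 data bits (bits 0..3 of d) into a 7-bit Hamming codeword.
 * Codeword positions 1..7 are stored in bits 0..6: parity bits sit at
 * positions 1, 2 and 4; data bits at positions 3, 5, 6 and 7. */
uint8_t hamming74_encode(uint8_t d)
{
    uint8_t d0 = d & 1, d1 = (d >> 1) & 1, d2 = (d >> 2) & 1, d3 = (d >> 3) & 1;
    uint8_t p1 = d0 ^ d1 ^ d3;   /* covers positions 3, 5, 7 */
    uint8_t p2 = d0 ^ d2 ^ d3;   /* covers positions 3, 6, 7 */
    uint8_t p4 = d1 ^ d2 ^ d3;   /* covers positions 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Decode a codeword, correcting at most one flipped bit.
 * Returns the 4 data bits; *fixed_pos receives the 1-based position
 * that was corrected, or 0 if the word was clean. */
uint8_t hamming74_decode(uint8_t cw, int *fixed_pos)
{
    int syndrome = 0;
    for (int pos = 1; pos <= 7; pos++)
        if ((cw >> (pos - 1)) & 1)
            syndrome ^= pos;   /* XOR of the positions of all set bits */

    /* A valid codeword has syndrome 0; a single flip at position e gives e. */
    if (syndrome != 0)
        cw ^= (uint8_t)(1u << (syndrome - 1));
    *fixed_pos = syndrome;

    return (uint8_t)(((cw >> 2) & 1) | (((cw >> 4) & 1) << 1) |
                     (((cw >> 5) & 1) << 2) | (((cw >> 6) & 1) << 3));
}
```

Two flipped bits would be "corrected" into the wrong value by this code, which is why real hardware adds an extra bit to at least detect double errors, and why the uncorrectable-error path described above is needed at all.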

Re:More detail on MCA Recovery (3, Informative)

mzs (595629) | more than 5 years ago | (#28113693)

Read the fmd, fmadm, and fmstat man pages on Solaris. There is also at least one memory scrubber kthread, and you can look at memscrub_scans_done to see how far it has gotten. Lots of hardware is checked periodically; on some hardware even the processors' FP units are periodically checked for faults. Some SPARCs even have instruction retry when an error is detected. There is even memory mirroring on M4000 and larger servers, which is like RAID-1 for memory: if a chip on a DIMM fails, you can keep running, then use fmadm and replace the faulty DIMM. There are also the sorts of things you outlined above, where a page is re-read if it was not modified, and otherwise only causes a SIGSEGV if that page is ever used again. In ZFS there is end-to-end checksumming to detect and correct errors.
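Memory mirroring boils down to "keep two copies and read from whichever one is good." Here's a toy model, assuming each copy carries a parity bit so a damaged copy can be recognized; real mirrored memory relies on full ECC per channel, and none of the M4000 specifics are described here. All struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One stored copy of a 64-bit word plus a parity bit that flags
 * damage (it detects any odd number of flipped bits). */
struct copy {
    uint64_t data;
    uint8_t parity;
};

/* The same value kept in two independent places, like RAID-1. */
struct mirrored_word {
    struct copy primary;
    struct copy mirror;
};

static uint8_t parity64(uint64_t x)
{
    uint8_t p = 0;
    while (x) {
        p ^= (uint8_t)(x & 1);
        x >>= 1;
    }
    return p;
}

static bool copy_ok(const struct copy *c)
{
    return parity64(c->data) == c->parity;
}

void mirrored_write(struct mirrored_word *w, uint64_t value)
{
    struct copy c = { value, parity64(value) };
    w->primary = c;
    w->mirror = c;
}

/* Read from whichever copy is intact and repair the other one.
 * Returns false only if both copies are bad (the data is truly lost). */
bool mirrored_read(struct mirrored_word *w, uint64_t *out)
{
    if (copy_ok(&w->primary)) {
        if (!copy_ok(&w->mirror))
            w->mirror = w->primary;   /* quietly heal the mirror */
        *out = w->primary.data;
        return true;
    }
    if (copy_ok(&w->mirror)) {
        w->primary = w->mirror;       /* heal the primary */
        *out = w->mirror.data;
        return true;
    }
    return false;
}
```

The cost is obvious: half your memory goes to the mirror, which is why this shows up on big iron rather than desktops.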

Of course, all of this pales next to what has been available on mainframes for a generation.

Re:More detail on MCA Recovery (1)

mzs (595629) | more than 5 years ago | (#28113721)

I forgot, there is even an e-cache scrubber.

Re:More detail on MCA Recovery (1)

strick1226 (62434) | more than 5 years ago | (#28113791)

Perhaps I'm just showing my age, but chills went up my spine until I realized it wasn't this [wikipedia.org] MCA which involved Recovery Disks.

*sigh of relief*

No system is perfect (1)

Ngarrang (1023425) | more than 5 years ago | (#28113349)

High-end, low-end, middle, um...end...whatever.

The goal is not to create perfection, but to gracefully recover from imperfection as if nothing happened. I see no problem with bolting such features onto the world's most common processing platform. We can all use graceful recovery features like these, not just servers and "high-end" applications. Will the average user need an 8-core CPU? Probably not, but it certainly wouldn't hurt them either. Intel can then trickle this down to the average user and help all of us support folks have a nicer day.

Short of getting rid of users, let's at least minimize the problems they will suffer. When they have a good day, they leave ME alone to my machinations.

Re:No system is perfect (1)

rezza (677520) | more than 5 years ago | (#28113843)

Yes, then some time down the line your boss walks in, realizes he's been paying you to do a lot more machination than support lately, and fires you.

Re:No system is perfect (1)

Ngarrang (1023425) | more than 5 years ago | (#28113891)

Which is more valuable to my company...
1) Telling someone to reboot yet again, maybe reimaging their system?

2) Plotting out the next roll-out of upgraded software, conference room technologies and responding to real emergencies, like malware issues?

Any monkey can reboot a computer. They don't pay me to be just any monkey, but a super monkey.

x86 (2, Funny)

confused one (671304) | more than 5 years ago | (#28114021)

*sniff* x86 is getting to be so grown up *sniff* I remember when it was just a little 16 bit chip.

Re:x86 (0)

Anonymous Coward | more than 5 years ago | (#28153905)

And the best thing about it is that it has grown up while still maintaining compatibility with that little 16-bit chip. I mean, even a Nehalem can run programs designed for the 8086 and the 80286.

Christ! Can't believe anyone hasn't used this yet (2, Funny)

Chas (5144) | more than 5 years ago | (#28114641)

Imagine a Beowulf cluster of these!

Convergent Sequence (0, Offtopic)

PingPongBoy (303994) | more than 5 years ago | (#28114669)

You can see it now. Once upon a time, a computer intelligence was given the power to control its destiny. This intelligence was deemed so substantial that it was the best commander of the greatest weapons. You know this intelligence as Skynet, which launched nuclear missiles in order to eliminate a threat to itself: a sort of error detection and correction, if you will, with the utmost power that man can endow a machine with. What you don't know is the actual error that was detected, an error with the code PEBKAC. PEBKAC? PEBKAC. Only an intelligent computer can detect PEBKAC. And now you know the rest of the story.

Itanium MCA is a lot harder than you think (2, Informative)

Anonymous Coward | more than 5 years ago | (#28117779)

I did quite a bit of work on MCA for Itanium on Linux and it's a lot harder to do than you might think. The Itanium MCA event can occur at any time, no matter what the OS is currently doing. Locks, preempt disable, interrupt disable etc., none of those will stop an Itanium MCA event from occurring.

When an MCA occurs, the OS can be in any state; it may not even have a valid stack at that point. I have seen MCAs raised right in the middle of the code that switches the CPU from one process to another, or in the middle of saving the user process's state before switching to kernel state. The only way to handle this was to define a special MCA stack frame to do the error checking and recovery on. For some scary code, see the Linux kernel, arch/ia64/kernel/mca.c and arch/ia64/kernel/mca_asm.S.
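You can't easily poke at the ia64 machine-check path from user space, but "run the handler on its own stack because the interrupted one may be garbage" has a close POSIX analogue: `sigaltstack()` plus `SA_ONSTACK`, which is how a program can still handle a SIGSEGV caused by blowing its own stack. This is only that analogy, not the kernel's MCA code; `install_handler_on_private_stack()` and `handler()` are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static char *alt_stack;
static size_t alt_size;
static volatile sig_atomic_t ran_on_alt_stack;

static void handler(int sig)
{
    (void)sig;
    /* This local lives on whatever stack the handler is running on. */
    volatile char marker = 0;
    uintptr_t p = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack;
    ran_on_alt_stack = (p >= lo && p < lo + alt_size);
}

/* Install a handler for sig that always runs on a dedicated stack,
 * so it still works even if the interrupted code's stack is unusable. */
bool install_handler_on_private_stack(int sig)
{
    alt_size = SIGSTKSZ * 4;
    alt_stack = malloc(alt_size);
    if (!alt_stack)
        return false;

    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return false;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sa.sa_flags = SA_ONSTACK;   /* switch to the alternate stack on entry */
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL) == 0;
}
```

The kernel has it much worse: a signal handler at least knows the process's own state is consistent enough to deliver a signal, while an MCA can land anywhere.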

Even after handling the stack switch problems, on Itanium you have no real idea what state the OS is in. The OS could have locks on critical code which prevent the MCA recovery from doing any useful work. MCA recovery is a nice idea but implementation is a bitch.
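The lock problem is why code running in that kind of context can, at best, try a lock once and back off: if the interrupted code on the same CPU already holds it, spinning would deadlock forever. A minimal sketch with C11 atomics; the lock, `mca_try_recover()` and the "defer" strategy are all assumptions for illustration, not how arch/ia64 actually does it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A lock that ordinary kernel code might be holding when the error hits. */
static atomic_flag page_table_lock = ATOMIC_FLAG_INIT;

static inline bool try_lock(atomic_flag *l)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static inline void unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

enum recovery_result { RECOVERED, DEFERRED };

/* Called from a (hypothetical) machine-check context. Never spins:
 * one attempt at the lock, otherwise defer the work to a safer moment. */
enum recovery_result mca_try_recover(void (*fix_up)(void))
{
    if (!try_lock(&page_table_lock))
        return DEFERRED;
    fix_up();
    unlock(&page_table_lock);
    return RECOVERED;
}
```

"Deferred" is doing a lot of work here: something still has to run the recovery later, and if the error is bad enough that can't wait, you're back to rebooting.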

k, Are we cool then? (0)

Anonymous Coward | more than 5 years ago | (#28119529)

So, it's already in the kernel for the ia64 arch. How difficult would it be to move that into the x86_64 branch (assuming AMD follows suit in a reasonable amount of time)?

Sooo... (1)

Rik Rohl (1399705) | more than 5 years ago | (#28117791)

It refuses to run windows?
