
Researchers Unveil Experimental 36-Core Chip

samzenpus posted about 4 months ago | from the we-need-another-core dept.

Hardware

rtoz writes: The more cores — or processing units — a computer chip has, the bigger the problem of communication between cores becomes. For years, Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, has argued that the massively multicore chips of the future will need to resemble little Internets, where each core has an associated router and data travels between cores in packets of fixed size. This week, at the International Symposium on Computer Architecture, Peh's group unveiled a 36-core chip that features just such a "network-on-chip." In addition to implementing many of the group's earlier ideas, it also solves one of the problems that has bedeviled previous attempts to design networks-on-chip: maintaining cache coherence, or ensuring that cores' locally stored copies of globally accessible data remain up to date.
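
To make the "little Internets" idea concrete, here is a minimal sketch of the concept (my own illustration in C, not MIT's design): a 6x6 mesh of 36 cores where each hop is decided by simple dimension-ordered (X-then-Y) routing and data moves in fixed-size packets. The packet size and routing policy are assumptions for illustration.

/* Toy network-on-chip: 36 cores on a 6x6 mesh, fixed-size packets,
 * dimension-ordered routing. Illustrative only. */
#include <stdio.h>

#define MESH_DIM 6          /* 6x6 = 36 cores */
#define PAYLOAD_WORDS 4     /* fixed packet size (illustrative) */

struct packet {
    int src, dst;                     /* core IDs, 0..35 */
    unsigned payload[PAYLOAD_WORDS];  /* fixed-size data */
};

/* Move one hop toward dst: correct X first, then Y. */
static int next_hop(int here, int dst)
{
    int hx = here % MESH_DIM, hy = here / MESH_DIM;
    int dx = dst % MESH_DIM,  dy = dst / MESH_DIM;
    if (hx != dx)      hx += (dx > hx) ? 1 : -1;
    else if (hy != dy) hy += (dy > hy) ? 1 : -1;
    return hy * MESH_DIM + hx;
}

int main(void)
{
    struct packet p = { .src = 0, .dst = 35, .payload = { 0 } };
    int at = p.src, hops = 0;
    while (at != p.dst) {
        at = next_hop(at, p.dst);
        hops++;
    }
    printf("corner-to-corner delivery in %d hops\n", hops);  /* 10 */
    return 0;
}

The payoff of the mesh is the hop count: the worst case grows with the mesh's diameter (10 hops here) instead of every core contending for one shared bus.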


143 comments


Different Power Supply Voltage (-1)

rtoz (2530056) | about 4 months ago | (#47297211)

According to the comparison table (see 4:21 in this video [youtube.com]), this chip uses 1.1V while other standard chips use 1.0V. This difference may make it hard for chip makers to adopt this technology.

Re:Different Power Supply Voltage (1, Insightful)

drinkypoo (153816) | about 4 months ago | (#47297243)

According to the comparison table (see 4:21 in this video), this chip uses 1.1V while other standard chips use 1.0V. This difference may make it hard for chip makers to adopt this technology.

Really? They won't be able to specify a 1.1V VRM instead of a 1.0V VRM? Those poor, poor chip makers. They sound like a bunch of incompetent fucks.

Re:Different Power Supply Voltage (4, Interesting)

fuzzyfuzzyfungus (1223518) | about 4 months ago | (#47297351)

A higher high/low voltage swing (with a reasonable amount of other stuff being equal) will be more of a thermal nuisance; but if the perks make up for it, that's hardly a dealbreaker. The toasty end of boring desktop CPUs is somewhere north of 200 watts already, with a little shoving that they typically survive, so if somebody really wants 36 cache-coherent cores on-die, they'll suck it up and make it work.

For applications that don't specifically demand that, I'd be interested to know how the costs and benefits of 'dealing with the cooling demands of a smaller number of denser parts' compare with 'dealing with the cooling demands of more, cooler parts, closer to whatever the performance-per-watt sweet spot is, but with more cabling, PSUs, switches, and similar interconnect and support stuff to buy and power'...

Re:Different Power Supply Voltage (1)

wagnerrp (1305589) | about 4 months ago | (#47299037)

The toasty end of boring desktop CPUs is somewhere north of 200 watts already

Well... somewhere south of 100W, anyway, and even high-end workstation/server chips are under 150W.

Re:Different Power Supply Voltage (0)

Anonymous Coward | about 4 months ago | (#47297775)

1V over 1 µm is a million volts per meter, assnozzle. Your incompetent and imbecilic commentary really drags down this place.

Re:Different Power Supply Voltage (0)

Anonymous Coward | about 4 months ago | (#47298297)

I figure you must be some type of alcoholic or have another substance problem. Can you confirm?

Re:Different Power Supply Voltage (1)

drinkypoo (153816) | about 4 months ago | (#47299201)

I figure you must be some type of alcoholic or have another substance problem. Can you confirm?

Yeah, I'm allergic to stupidity. I have to take a pill before I can come anywhere near slashdot, and keep an inhaler and epi pen on hand.

Re:Different Power Supply Voltage (2)

edmudama (155475) | about 4 months ago | (#47297251)

That doesn't matter. The power supply surrounding the socket/pads will account for whatever Vcc needs to be.

Re:Different Power Supply Voltage (-1)

Anonymous Coward | about 4 months ago | (#47297825)

But not inside the chip. You people seem to forget we're dealing with chips that have features counted in individual atoms. 1V across three atoms may work, 1.1V across three atoms arcs over.

Really, you people need to stop playing with software. Using software considered harmful to understanding reality.

Re:Different Power Supply Voltage (4, Interesting)

Moof123 (1292134) | about 4 months ago | (#47298639)

Banging my head on the table right now.

Why do people with zero actual semiconductor knowledge try to speak as an authority*?!

It's a research chip, meaning they don't need to be on the latest process node to show their proof of concept. Larger nodes (much cheaper to design a chip on) have thicker gate passivation layers and run at higher voltages. From an architecture standpoint the process node/voltage are irrelevant. So if their architecture proves out, some bigger outfit can run with it while targeting the latest-greatest itty-bitty process node to increase the clock rate, drop the power, and reduce the area/price.

*I am not a processor designer, just a mixed-signal (mostly analog) guy, but I've been working in the semiconductor industry, including doing process bake-offs, for over a dozen years.

Re:Different Power Supply Voltage (1)

Ralph Wiggam (22354) | about 4 months ago | (#47298891)

Why do people with zero actual semiconductor knowledge try to speak as an authority*?

Is this your first day on Slashdot?

Re:Different Power Supply Voltage (1)

wagnerrp (1305589) | about 4 months ago | (#47299093)

You people seem to forget we're dealing with chips that have features counted in individual atoms. 1V across three atoms may work, 1.1V across three atoms arcs over.

Luckily we're still dealing with features hundreds of atoms across, and not just three...

Re:Different Power Supply Voltage (0)

Anonymous Coward | about 4 months ago | (#47299167)

Oh look, it's Mr Heat Controls Transistors, who still hasn't provided a single source for his heat theory.

But anyway:

https://www.youtube.com/watch?... [youtube.com]

Three atoms, dickweed.

Re:Different Power Supply Voltage (1)

LynnwoodRooster (966895) | about 4 months ago | (#47298431)

According to the comparison table (see 4:21 in this video [youtube.com]), this chip uses 1.1V while other standard chips use 1.0V. This difference may make it hard for chip makers to adopt this technology.

No, it's the only way to make it faster because it goes to eleven...

im still a bit skeptical. (3, Funny)

nimbius (983462) | about 4 months ago | (#47297223)

All this performance in just one chip. I mean, sure, it has 36 cores, but let's be rational here... does it seriously expect to run Crysis?

Re:im still a bit skeptical. (1, Funny)

Anonymous Coward | about 4 months ago | (#47297281)

You'd need to imagine a beowulf cluster of 'em to accomplish that.

Re:im still a bit skeptical. (0)

Anonymous Coward | about 4 months ago | (#47297497)

Imagine a beowulf cluster of them on a chip and we're getting somewhere.

Re:im still a bit skeptical. (0)

Anonymous Coward | about 4 months ago | (#47297789)

...does it seriously expect to run Crysis?

I didn't seriously expect that joke to still be funny. But there you go.

Moore's Law (0)

Grindalf (1089511) | about 4 months ago | (#47297267)

That's a fun post! 36-core is immense! As an aside: it's been a while since we've seen any decent rise in processor GHz. I remember IBM talking about functioning, reasonably cool 10 GHz processors (ref needed) in the early 2000s, but no one has them in the shops yet! I'm sure this was discussed in Moore's Law lectures prior to Y2K, but mention it these days and everyone scowls! So apparently some people can build them (and they run cool) and some people can't; what normally happens in computing when the faster parts are released?

Re:Moore's Law (4, Interesting)

Opportunist (166417) | about 4 months ago | (#47297285)

As an aside: It's been a while since we've seen any decent rise in processor Ghz.

Just to abuse a car analogy: Maybe it's time we stop revving up and instead shift gears.

Re:Moore's Law (2)

Grindalf (1089511) | about 4 months ago | (#47297317)

Whilst I have my foot to the floor ... I still think it's a failure of science - there's nothing wrong with doing both simultaneously - to believe otherwise would be to buy into a rhetorical device based on "false opposites."

Re:Moore's Law (1)

dreamchaser (49529) | about 4 months ago | (#47297417)

There are still technical challenges to increasing clock speed. Just because "IBM said it would" doesn't make it so. Instead you are seeing higher IPC due to architectural refinements as well as more and more cores. Clock speeds are still inching up but do not expect any huge radical jumps anytime soon.

Re:Moore's Law (1)

Opportunist (166417) | about 4 months ago | (#47297799)

Of course, but it's like shifting gears: first shift, then rev the engine back up. Otherwise your clutch will probably wear out quickly.

And no, I have absolutely no idea how that analogy still applies.

Re:Moore's Law (0)

Anonymous Coward | about 4 months ago | (#47297915)

Yes of course, everything is just a continuous cycle of ever-improving performance, which is why we still fly the Concorde and work only 20 hours a week in a leisure society. Oops.

Re:Moore's Law (1)

Virtucon (127420) | about 4 months ago | (#47297431)

Nope, liquid nitrogen cooling gets you past the speed limits. How about over 8 GHz [youtube.com] on a chip that costs less than $200? Go to helium and you can get over 8.5 GHz [youtube.com], although both become a bit unwieldy when it comes to gameplay, because I don't want my hard drives to freeze. I love that last video; there's some real country-boy engineering in there, including using a propane torch and a hair dryer to keep certain components from freezing.

Re:Moore's Law (3, Insightful)

Shoten (260439) | about 4 months ago | (#47297717)

Nope, liquid nitrogen cooling gets you past the speed limits. How about over 8 GHz [youtube.com] on a chip that costs less than $200? Go to helium and you can get over 8.5 GHz [youtube.com], although both become a bit unwieldy when it comes to gameplay, because I don't want my hard drives to freeze. I love that last video; there's some real country-boy engineering in there, including using a propane torch and a hair dryer to keep certain components from freezing.

I'm a little confused as to why you're citing the chip's low low price of "less than $200" if you need liquid nitrogen to get it to perform the way you want it to. You do realize that cooling systems cost money, too... right? There's no point in using a cheap processor to hit performance benchmark X if the required support systems cost thousands of dollars more than a more powerful, more expensive processor that can do it out of the box. Not to mention that liquid nitrogen cooling isn't exactly hassle-free, especially in a household environment. And it's worth noting that even if you boost GHz, you eventually run into choke points related to pushing data to and from the chip anyway. You can give the most important worker on an assembly line all the crystal meth they can eat, but they can't work any faster than the conveyor belt in front of them.

Re:Moore's Law (4, Funny)

ColdWetDog (752185) | about 4 months ago | (#47298945)

You can give the most important worker on an assembly line all the crystal meth they can eat, but they can't work any faster than the conveyor belt in front of them.

Ah! The 21st Century version of the 'mythical man month' - so much more apropos for this audience than the pregnancy analogy.

Re:Moore's Law (1)

skovnymfe (1671822) | about 4 months ago | (#47297731)

Liquid nitrogen/helium cooling is great... while it lasts. When it's used up, however, you've got to pay for another bottle of cooling. I have no idea how long a $200 CPU can run at 8 GHz on a bottle of nitrogen, or how much a bottle of nitrogen costs, but I can't imagine it's a good long-term solution.

Re:Moore's Law (2)

SuricouRaven (1897204) | about 4 months ago | (#47298419)

Nitrogen overclocking is done for contests. You can get phase-change cooling, which is the next best thing and will still get your processor far below zero. The big downside to that is just power consumption. It's also bulky and noisy.

Re:Moore's Law (3, Interesting)

Anonymous Coward | about 4 months ago | (#47297465)

A better analogy is that they keep adding seats and making the whole vehicle slower.

Kawasaki Ninja == 10 GHz single core (fastest way to get anywhere alone)
Ford Mustang == 4 GHz quad-core (most people only use the front two seats, but if desperate you can squeeze more people in)
Chevy Suburban == 3.3 GHz 8-core (it seems like everyone wants one, but most people who have a full load just have a bunch of little kiddies)
Mercedes Sprinter == 2.7 GHz 12-core (just meant to be a grinding people-hauler)
School Bus == 1.2 GHz Xeon Phi (slow as hell and very specialized, no normal person would ever want one)
Double Decker Bus == Peh's stuff (probably a use for mass transit (i.e. virtualization) and as a cool novelty)

Re:Moore's Law (1)

Z00L00K (682162) | about 4 months ago | (#47297703)

Boeing 747 == take on a crapload of people for a long haul excursion.

Re:Moore's Law (1)

jones_supa (887896) | about 4 months ago | (#47298537)

Let's still not forget that a single core of a modern Core i7 chip is about 6x as fast as a single-core Pentium 4, at the same clock speed.

Re:Moore's Law (1)

default luser (529332) | about 4 months ago | (#47298799)

Absolutely not true.

The Core 2 Duo is approximately 2x faster clock-for-clock than the Pentium 4 [techreport.com], and the current Haswell core is barely 40% faster than that (assume a 7% per-clock speedup for every core revision since). That gets you somewhere in the 2x-3x performance improvement range for Haswell, barring corner cases that can leverage AVX/FMA embarrassingly well (most real-world use cases show small improvements).

Intel proved that they could do a whole lot better than the Pentium 4, but your performance improvement factor is off by half!
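
Checking that arithmetic at face value (the 7% per-revision figure is the parent's assumption, not a measurement): five compounding 7% steps give 1.07^5 ≈ 1.40, i.e. the "barely 40%" quoted, and 2.0 × 1.40 ≈ 2.8, which is how you land in the 2x-3x range rather than at 6x.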

Re:Moore's Law (1)

Kaenneth (82978) | about 4 months ago | (#47299115)

The Titanic == Itanium

Re:Moore's Law (1)

Salgat (1098063) | about 4 months ago | (#47298427)

What an empty statement. It's easy to say we should try something else when things get difficult, without having any practical solution in place.

Re:Moore's Law (1)

ColdWetDog (752185) | about 4 months ago | (#47298915)

What's a 'gear'?

Re:Moore's Law (1)

K. S. Kyosuke (729550) | about 4 months ago | (#47297287)

36-core is immense!

Yawn... [greenarraychips.com]

Re:Moore's Law (1)

Grindalf (1089511) | about 4 months ago | (#47297331)

That's a beauty, and the Ezekiel ref in your sig (23:20) made me laugh out loud too ...

Re:Moore's Law (1)

Opportunist (166417) | about 4 months ago | (#47297817)

Odd. I went to a Catholic church, but strangely we never got to that part.

Talk about selective teaching and leaving out the interesting parts!

Re:Moore's Law (1)

itzly (3699663) | about 4 months ago | (#47297383)

Try doing a 100x100 double precision matrix inversion on one of those chips, and you'll stop yawning pretty quickly.

Re:Moore's Law (1)

K. S. Kyosuke (729550) | about 4 months ago | (#47297437)

Now why would I want to do that? Obviously, for FP tasks, a modified design would be necessary - but given that the GA144 is already unbeatable in integer energy efficiency, even at the 180 nm node where it's manufactured, if you extend the ISA to be more FP-friendly and switch to a recent process, I don't see a problem. Well, it would need different memory interfaces to make it a shared-memory multiprocessor. That's a bummer. But I guess it can't be helped; programmers are lazy.

Re:Moore's Law (1)

itzly (3699663) | about 4 months ago | (#47297473)

Maybe you don't want to do that, but good floating point performance is a requirement for a lot of useful tasks. Also, many real world tasks need access to large amounts of memory, and often that memory needs to be available to multiple nodes. The GA144 fails there too, since it has a pitiful amount of memory. Except for a small handful of niche applications that happen to match the GA144's capabilities, it's a useless device.

Re:Moore's Law (1)

K. S. Kyosuke (729550) | about 4 months ago | (#47297513)

It's the notion (asynchronous, self-clocked, energy efficient chip, maximizing performance per watt and performance per mm^2) that matters to me, not this specific design (which is intended for specific purposes). Witness how the HPC people embraced GPUs, which are sort of heading in a similar direction already.

Re:Moore's Law (1)

gnasher719 (869701) | about 4 months ago | (#47298083)

Try doing a 100x100 double precision matrix inversion on one of those chips, and you'll stop yawning pretty quickly.

That should be easily done in a millisecond or so on a single core of any modern Intel processor. You could probably get it down to 100 microseconds on the latest ones.
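
A back-of-envelope count supports this, assuming LU-based inversion at roughly 2n^3 floating-point operations: for n = 100 that is 2 × 10^6 flops, which at a sustained 10 GFLOPS is about 200 µs, and at 50 GFLOPS (AVX2 + FMA territory) about 40 µs.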

Re:Moore's Law (2)

itzly (3699663) | about 4 months ago | (#47298597)

My point exactly. What is a simple task on a modern Intel becomes nearly impossible on the GA144. We've already tried the idea of combining large numbers of simple processors, and it has failed every single time. If NxM simple cores together can't beat a modern Intel processor on a range of useful tasks, there's not much point in developing it.

Re:Moore's Law (0)

Anonymous Coward | about 4 months ago | (#47298091)

A little more yawn:
http://en.wikipedia.org/wiki/PicoChip
and of course Adapteva's Epiphany IV mentioned in another thread.

Apart from that there are others that have tried to achieve better performance with lots of simple CPU cores/threads:
http://en.wikipedia.org/wiki/Parallax_Propeller
http://en.wikipedia.org/wiki/XMOS
http://en.wikipedia.org/wiki/Ubicom (skip to Ubicom32)

Re:Moore's Law (1)

Anonymous Coward | about 4 months ago | (#47297309)

Think of the possibilities:

make -j36

Re:Moore's Law (1)

wjcofkc (964165) | about 4 months ago | (#47297337)

The reason Apple stuck with the Power architecture for so long was that IBM promised them quad-core and greater chips running at 8 GHz, air-cooled, by 2005. Needless to say, they didn't even come close to delivering. It was that failure that led Apple to switch to x86.

Re:Moore's Law (1)

Grindalf (1089511) | about 4 months ago | (#47297427)

Regarding your PowerPC comment specifically: this was from an IBM research department that makes a processor other than the PowerPC – I believe they were used for z/OS machines or the like. I use it as an operational example ONLY because, as Mr. Spock says, "that which HAS happened CAN happen," and it is therefore a possibility.

Re:Moore's Law (1)

caseih (160668) | about 4 months ago | (#47297389)

And hopefully in any lectures on Moore's Law, the students learn that Moore's Law refers to transistors on a die, not the speed of the chips. This 36-core chip probably jumps ahead of Moore's Law a bit, as it's got to be a fairly large die. In any event Moore's Law continues to hold, more or less. Other things like CPU speed have followed a similar trend in times past, but no longer do now.

Re:Moore's Law (2)

willy_me (212994) | about 4 months ago | (#47297499)

And hopefully in any lectures on Moore's Law, the students learn that Moore's Law refers to transistors on a die, not the speed of the chips. This 36-core chip probably jumps ahead of Moore's Law a bit, as it's got to be a fairly large die.

Moore's Law refers to the number of components per integrated circuit for minimum cost. Note that this is basically transistor density and is not impacted by core size. Silicon defects and transistor size determine the optimal number of components per IC.

A quote from Wikipedia,

Moore himself wrote only about the density of components, "a component being a transistor, resistor, diode or capacitor,"[26] at minimum cost.

Re:Moore's Law (2)

Virtucon (127420) | about 4 months ago | (#47297405)

Immense? Immense you say? Try IBM's mega-footprint z196 [wikipedia.org]: at over 512 mm^2, it's one big-ass chip.

Re:Moore's Law (1)

Grindalf (1089511) | about 4 months ago | (#47297597)

Do you know what I'm going to do? I'm going to go out and get a shirt printed up with the expression "I Heart Processor GHz" and wear it at parties! Why? Every time this crops up at meetings I've attended, there is always someone who loses their temper at the mere mention of anyone developing a faster processor, irrespective of how many cores or how much cache, and I don't like it! The physics has been done! I'd swear, if anyone ever does a THz processor and one of these kids finds out, they'll egotistically self-explode on the spot and try to burn down the factory in question, because it's such an affront. The cultural phenomenon of "Gigaphobia" needs investigating by qualified professionals :0)

Re:Moore's Law (1)

Z00L00K (682162) | about 4 months ago | (#47297693)

It's of course good if the distance between cores is kept to a minimum, but if the software designers and compilers consider the limitations when generating the binaries, it may not be a huge performance bottleneck in real-world applications.

It's better to switch to a new core than to switch tasks on a core, for example. Looking at what happens in a modern PC, most processing is unrelated to the rest. Even inside a web browser you may have several plugins running in different parts of the screen, but they don't really interact with each other, so they can run on standalone processor cores.

When doing SIMD calculations you run the same instruction in parallel on many cores with different data as input, and that is not a big deal either.

The bottleneck you may experience is on the buses to RAM, disk, and I/O devices. Just realize that not every core has the same distance to the resource - so by having affinity hints on the executables indicating preferences for types of I/O, it might be possible to assign them to the right area of cores in the processor.

So far much of computer design has gone into trying to make the computer as general as possible. Think of it as a Swiss Army knife (or maybe a Leatherman multi-tool) - it can do everything, but not excel at anything. A real mechanic has a good toolbox instead, with different groups of tools - screwdrivers, hammers, etc. - and each of those tools is highly specialized even within its group. Screwdrivers come in many forms: Flat, Phillips, Pozidriv, Allen, Torx, XZN, Orange Juice based, etc. By using the right tool for the job you get the work done faster, and often more accurately, than with the generic tool.
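
For what it's worth, mainstream OSes already expose the affinity knob described above. A minimal Linux-specific sketch in C (pthread_setaffinity_np is a GNU extension; pinning to core 3 is an arbitrary choice for illustration):

/* Pin the calling thread to one core so the scheduler won't migrate it. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);                 /* allow core 3 only */

    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
        return 1;
    }
    printf("now running on core %d\n", sched_getcpu());
    return 0;
}

Grouping I/O-heavy threads near the cores closest to the relevant controller, as suggested above, would be this same call driven by topology information (e.g. from hwloc).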

Re:Moore's Law (1)

Rockoon (1252108) | about 4 months ago | (#47297907)

When doing SIMD calculations you run the same instruction in parallel on many cores with different data as input

Your definition of core seems to be completely different from anyone else's. You seem to have relabeled execution units as 'cores', and for seemingly completely ignorant reasons.

36 cores? Network on a chip? Meh! (1)

Kyle (4392) | about 4 months ago | (#47297277)

http://www.adapteva.com/epipha... [adapteva.com]
64 cores, mesh network that extends off the chip, in production.

Try harder MIT :-p

Re:36 cores? Network on a chip? Meh! (1)

itzly (3699663) | about 4 months ago | (#47297301)

Adding cores is easy. Keeping all the cores busy with useful work in a typical range of high performance applications is the difficult part.

Re:36 cores? Network on a chip? Meh! (1)

Melkhior (169823) | about 4 months ago | (#47297343)

http://www.adapteva.com/epiphanyiv/ [adapteva.com]
64 cores, mesh network that extends off the chip, in production.

Try harder MIT :-p

They already tried harder: http://www.tilera.com/. And as another post mentioned, Intel's Knights Corner is cache-coherent across 61 cores (62 architected).

The summary doesn't get the point of the article: what's novel is not the presence of cache coherency, it's the new way of implementing snoop-based cache coherency over their network. Cache coherency for a large number of cores can be very expensive time-wise, so any idea that improves it is more than welcome.

Re:36 cores? Network on a chip? Meh! (0)

Anonymous Coward | about 4 months ago | (#47297349)

Is that one actually made? The only one in the store is the 16 core version.

Re:36 cores? Network on a chip? Meh! (0)

Anonymous Coward | about 4 months ago | (#47297367)

And no cache is discussed anywhere (even in the reference manual).

Re: 36 cores? Network on a chip? Meh! (0)

Anonymous Coward | about 4 months ago | (#47297527)

Made, and about to be unleashed:
http://www.hpcwire.com/off-the-wire/adapteva-unveils-worlds-smallest-supercomputing-platform-isc14/

Re:36 cores? Network on a chip? Meh! (5, Informative)

TheRaven64 (641858) | about 4 months ago | (#47298847)

The core count isn't the interesting thing about this chip. The cores themselves are pretty boring off-the-shelf parts too. I was at the ISCA presentation about this last week and it's actually pretty interesting. I'd recommend reading the paper (linked to from the press release) rather than the press release, because the press release is up to MIT's press department's usual standards (i.e. completely content-free and focussing on totally the wrong thing). The cool stuff is in the interconnect, which uses the bounded latency of the longest path multiplied by single-cycle one-hop delivery times to define an ordering, allowing you to implement a sequentially consistent view of memory relatively cheaply.

Since I'm here, I'll also throw out a plug for the work we presented at ISCA, The CHERI capability model: Revisiting RISC in an age of risk [cam.ac.uk] . We've now open sourced (as a code dump, public VCS coming soon) our (64-bit) MIPS softcore, which is the basis for the experimentation in CHERI. It boots FreeBSD and there are a few sitting around the place that we can ssh into and run. This is pretty nice for experimentation, because it takes about 2 hours to produce and boot a new revision of the CPU.

Intel Knights Landing (2, Informative)

SirDrinksAlot (226001) | about 4 months ago | (#47297289)

So what's special about this chip that Intel's Xeon Phi isn't already doing? The tile-plus-router approach was demonstrated by Intel back in 2007 with its 80-core Teraflops Research Chip, an ancestor of the Phi. Or is this just a rehash of 7-year-old technology that's already in production? It sounds like a copy/paste of Intel's research.

"Intel's research chip has 80 cores, or "tiles," Rattner said. Each tile has a computing element and a router, allowing it to crunch data individually and transport that data to neighboring tiles." - Feb 11, 2007

Re:Intel Knights Landing (2)

dreamchaser (49529) | about 4 months ago | (#47297325)

Presumably the novel way they address (pun intended) cache coherency is what is new. More efficiency = greater performance. Time will tell.

Re:Intel Knights Landing (5, Informative)

Trepidity (597) | about 4 months ago | (#47297359)

Yes, as usual, the MIT press release oversells the research, while the original paper [pdf] [mit.edu] is a bit more careful in its claims. The paper makes clear that the novel contribution isn't the idea of putting "little internets" (as the press release calls them) on a chip, but acknowledges that there is already a lot of research in the area of on-chip routing between cores. The paper's contribution is to propose a new cache coherence scheme which they claim has scalability advantages over existing schemes.

Re:Intel Knights Landing (1)

epine (68316) | about 4 months ago | (#47297891)

The paper's contribution is to propose a new cache coherence scheme which they claim has scalability advantages over existing schemes.

Somehow this was obvious to me even from the press release. I've never yet seen details of an ordering model laid bare where it wasn't the core novelty. Ordering models are inherently substantive. Ordering models beget theorems. Cute little Internets drool and coo.

Re:Intel Knights Landing (1)

gman003 (1693318) | about 4 months ago | (#47297335)

It does seem rather similar - a large cluster of cores, laid out in a grid topology. Perhaps they're doing something different with the cache coherency? I couldn't find too much on how Intel's handling that, and it seems to be a focus of the articles on this chip.

Intel Knights Landing (0)

Anonymous Coward | about 4 months ago | (#47297471)

So the point here, which is not insignificant if they really have solved it, is cache coherency. Snooping other caches can become a massively expensive task in terms of round-trip latency: potentially thousands of cycles. In low-power architectures (which are certainly a consideration), moving data around is really expensive from a power perspective, and that is where a large portion of the power is actually spent in real-world use. If they have solved it in a more efficient manner than the current brute-force approaches, then that's good research.
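
One cheap way to feel that cost on today's hardware is false sharing: two threads bump two independent counters, but when the counters share a cache line, the coherence protocol bounces that line between cores on every write. A C sketch (illustrative only; assumes a 64-byte line, Linux/POSIX timing, compile with -pthread); the padded run is typically several times faster:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

/* two counters packed into (almost certainly) one cache line */
static struct { volatile unsigned long a, b; } together;

/* the same two counters forced onto separate 64-byte lines */
static struct {
    volatile unsigned long a;
    char pad[64];
    volatile unsigned long b;
} apart;

static void *bump(void *arg)
{
    volatile unsigned long *c = arg;
    for (unsigned long i = 0; i < ITERS; i++)
        (*c)++;
    return NULL;
}

static double timed_run(volatile unsigned long *x, volatile unsigned long *y)
{
    struct timespec t0, t1;
    pthread_t ta, tb;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&ta, NULL, bump, (void *)x);
    pthread_create(&tb, NULL, bump, (void *)y);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void)
{
    printf("same cache line: %.2fs\n", timed_run(&together.a, &together.b));
    printf("padded apart:    %.2fs\n", timed_run(&apart.a, &apart.b));
    return 0;
}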

Architecture (1)

wjcofkc (964165) | about 4 months ago | (#47297313)

I would be curious to know more about the architecture and all-around chip specs they are using in their prototype: clock speed, memory interface, etc. The article states they are developing a version of Linux to test it on, so it's safe to say it's an established architecture. Anyway, I am excited to see the results once they have tested it on Linux. While this does not help with the density-per-core problem, perhaps it will help extend Moore's Law on the speed side of microcircuitry.

D-WAV-E !! (-1)

Anonymous Coward | about 4 months ago | (#47297315)

I want my. I want my. I want my D-WAV-E !! And money for nothin and chicks for free !!

Re:D-WAV-E !! (0)

Anonymous Coward | about 4 months ago | (#47298099)

If it was up to me, you could have ours. The refrigerators in the parking garage are really annoying.

Is there anything new here? (1)

Junta (36770) | about 4 months ago | (#47297345)

So, in one die, it's a little interesting, though GPU stream processors and Intel's Phi would seem to suggest this is not that novel. The latter even lets you ssh in and see the core count for yourself in a very familiar way (though it's not exactly the easiest of devices to manage, it's still a very real-world example of how this isn't new to the world).

The 'not all cores are connected' part is even older. In the commodity space, HyperTransport and QPI can be used to construct topologies that are not a full mesh. So not only is it not all cores on a bus, it is also not all cores mesh-connected, the two attributes claimed as novel here.

Basically, as of AMD64 people had relatively affordable access to an implementation of the concept, and as of Nehalem both major x86 vendors had it in place. Each die included all the logic needed to implement a fabric, with the board providing essentially passive traces.

Re:Is there anything new here? (0)

Anonymous Coward | about 4 months ago | (#47297391)

Also check out "transputers" for wafer-scale networks of processor cores.

http://en.m.wikipedia.org/wiki/Transputer

Re:Is there anything new here? (4, Informative)

Trepidity (597) | about 4 months ago | (#47297429)

The basic idea isn't new. What the paper is really claiming is new is their particular cache coherence scheme, which (to quote from the Conclusion) "supports global ordering of requests on a mesh network by decoupling the message delivery from the ordering", making it "able to address key coherence scalability concerns".

How novel and useful that is I don't know, because it's really a more specialist contribution than the headline claims, to be evaluated by people who are experts in multicore cache coherence schemes.

Passing messages rather than sharing! (0)

Anonymous Coward | about 4 months ago | (#47297347)

Erlang on a chip :-)

Where's my massively parrallel programming languag (1)

KingOfBLASH (620432) | about 4 months ago | (#47297363)

While adding an extra core or two made for big jumps in performance (because you are almost always running at least two applications), there comes a point where most users won't see a performance boost. While I may now be able to throw 36 processors at a problem, you have to program all those cores to work together. Right now that's a lot of effort, and until programming languages catch up and can optimize code by making it massively parallel, this is going to be a non-starter.

Re:Where's my massively parrallel programming lang (1)

itzly (3699663) | about 4 months ago | (#47297397)

A "new programming language" isn't a magical solution to make a non-parallel algorithm work well on a multi processor architecture.

Re:Where's my massively parrallel programming lang (0)

Anonymous Coward | about 4 months ago | (#47297487)

No kidding... If that was all there was to it, the guys at the CPU level would just do it for us.

The problem is dependency on previous results. If you do not care about previous results, then multithreaded programming is dead easy, a scheduling problem that is fairly well understood. It is when you need the previous results, or external I/O, that parallelism fails.

I predict that at some point some bright spark of a CPU guy will come up with the idea of discarded results, since 'if' conditions tend to create pipeline stalls. You could go ahead and run both paths of code, then decide which one is correct and discard the unused results. Maybe they already have... I have not followed CPU arch for a while now...
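
A small-scale version of "run both paths and discard one" already exists as predication / branchless code: compute both arms, then select with a mask, so there is no branch to mispredict. A toy C sketch of the idea (not how any particular CPU implements it):

/* Branchy vs. branchless max: the second computes "both paths"
 * and keeps one via a mask instead of branching. */
#include <stdio.h>

static int max_branchy(int a, int b)    { return a > b ? a : b; }

static int max_branchless(int a, int b)
{
    int mask = -(a > b);               /* all ones if a > b, else zero */
    return (a & mask) | (b & ~mask);   /* keep a or b, discard the other */
}

int main(void)
{
    printf("%d %d\n", max_branchy(3, 7), max_branchless(3, 7));  /* 7 7 */
    return 0;
}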

Re:Where's my massively parrallel programming lang (2)

Rockoon (1252108) | about 4 months ago | (#47298043)

You could go ahead and run both paths of code, then decide which one is correct and discard the unused results.

Intel is already doing partial speculative execution in the case of conditional branches: the pipeline is filled with the predicted path, which is then frequently executed out of order (before the condition is known).

Intel is not, however, doing the full concept you have described (eager speculative execution), and I don't think it's likely that they ever will. The best case for eager speculative execution is when branches are completely unpredictable, which is only very rarely true. Further, it requires significant over-provisioning of execution units to have enough to execute both paths of a conditional branch at "best possible speed", resources that would be completely wasted whenever there isn't a conditional branch in the pipeline...

Re:Where's my massively parrallel programming lang (1)

Z00L00K (682162) | about 4 months ago | (#47297759)

The question is: do you always need parallel software? Most tasks are bread-and-butter tasks; no need to chew them up. Put your energy into the few things that do need to be broken up.

But mostly it's a chicken-and-egg problem: you can't do multi-core software since there aren't enough serious multi-core machines, or the owners of software companies don't see a benefit in it.

The two hardest problems in CS: (4, Funny)

magsol (1406749) | about 4 months ago | (#47297373)

pointer arithmetic, cache invalidation, and off-by-one errors

Re:The two hardest problems in CS: (0)

Anonymous Coward | about 4 months ago | (#47298375)

and naming

Re:The two hardest problems in CS: (0)

frank_adrian314159 (469671) | about 4 months ago | (#47298435)

No one expects the Spanish Inquisition.

Interesting (3, Informative)

Virtucon (127420) | about 4 months ago | (#47297395)

Cache coherency has been one of the banes of multicore architecture for years. It's nice to see a different approach, but chip manufacturers are already getting high-performance results without introducing additional complexity. The Oracle (Sun) SPARC T5 [oracle.com] architecture has 16 cores with 128 threads running at 3.6 GHz. It buys a few more years for Solaris at least, but it's still a hell of a processor. For you Intel fans, the E7-2790 v2 [intel.com] sports 15 cores and 30 threads with a 37.5 MB cache, so they're doing something right, because it screams and is capable of 85 GB/s of memory throughput.

I'm sure the chip architects are looking at this research, but somehow I think they're already ahead of the curve, because these kinds of core/thread counts are jumps ahead of where we were just a few years ago. Anybody remember the first Pentium Dual-Core [wikipedia.org] and the UltraSPARC T1 [wikipedia.org]?

Re:Interesting (0)

Anonymous Coward | about 4 months ago | (#47297635)

The link for the Intel CPU heads to the ARK page for the Intel® Xeon® Processor E7-8893 v2, a 6-core chip with 12 threads. Where do you see 15 cores with 30 threads?

Re:Interesting (0)

Anonymous Coward | about 4 months ago | (#47297751)

GP linked to the wrong part:

http://ark.intel.com/products/75258/Intel-Xeon-Processor-E7-8890-v2-37_5M-Cache-2_80-GHz

Re:Interesting (1)

Anonymous Coward | about 4 months ago | (#47297821)

Oh? No, these parts cause quite a lot of trouble, and you won't know it until you're into kernel programming or HPC programming, fighting sub-microsecond latency and lock-contention issues.

And processors with weak cache coherency and weak memory ordering are **MURDER** on normal programmers. The fewer of those exist, the better. Most people cannot even GRASP the weak memory/cache ordering model, let alone deal with issues caused by it.

Re:Interesting (0)

Anonymous Coward | about 4 months ago | (#47298491)

Before the Pentium dual cores, there were multiprocessor Pentium Pro boards where cache coherency would kill you, because the cache you had to invalidate was not on the chip but out on the system bus (which at that time was a slow 66 MHz bus).

There are several topologies in which to arrange the chips so that they can form assembly lines or triple sets of hypercubes.

Re:Interesting (1)

Bengie (1121981) | about 4 months ago | (#47298589)

High "thread" count cores are good for work loads where there is little inter-thread communication and has lots of memory stalls. By having a lot of threads running at once, whenever there is a memory stall, you can just switch to another thread, and the chance of that thread being stalled is very low. This also means lots more cache thrashing, so you need larger caches, but they can be tuned for high-throughput high-latency. The entire design for these cpus is geared for high-throughput high-latency, which also tends to be great for energy efficiency.

Parallel processing still remains elusive. (2)

140Mandak262Jamuna (970587) | about 4 months ago | (#47297423)

Parallel processing has made big strides, but only in some limited areas: graphics rendering, where each pixel can be updated independently of other pixels; fluid mechanics (CFD) using time-marching techniques, where updating the solution at one point needs data from only a limited set of neighbors; or iterative matrix solvers. Even for something very structured and free of if statements, like inverting a matrix, parallel methods have suffered.

The basic problem is this: even if just 5% of the work has to be serial, the maximum speedup is 20x; that is the theoretical maximum. YMMV, and it does. The Internet and search have opened up another vast area where a thread can do lots of work and send just a very small set of results back to the caller. Hits are so small compared to misses that you can make some headway. Even then, we have found very few applications suitable for massively parallel solutions.
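
That ceiling is Amdahl's law: with serial fraction s and N processors, the speedup is S(N) = 1 / (s + (1 - s)/N). For s = 0.05, S tends to 1/0.05 = 20 as N grows without bound, and even the 36 cores here only reach S ≈ 1 / (0.05 + 0.95/36) ≈ 13.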

We need a big breakthrough. If you divide a 3D domain into a number of subdomains, the interfaces between the subdomains are 2D. The volume of the 3D domain represents computational load, and the interface areas represent communication load. If we could come up with domain-division algorithms that guarantee the interfaces would be an order of magnitude smaller, even as we go from 3D to higher numbers of dimensions, and if we could organize these subdomains into hierarchies, we would be able to deploy more and more computational work and be confident the communication load would not overwhelm the algorithm. This breakthrough is yet to come. Delaunay tessellations (and their dual, Voronoi polygons) have been defined in higher dimensions, but the ratio of the number of "cells" to the number of "vertices" explodes in higher dimensions; last time we tried, we could not even fit a 10-dimensional mesh of 10 points into all the available memory of the machine. It did not look promising.
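
The scaling being asked for is a surface-to-volume ratio: a cubic subdomain of side n in 3D carries O(n^3) compute against roughly 6n^2 of interface communication, so the ratio 6/n improves as subdomains grow. In d dimensions the interface work is O(n^(d-1)) against O(n^d) compute, but the cell-to-vertex ratio of Delaunay meshes grows very rapidly with d, which is exactly the explosion described above.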

Not entirely new (0)

Anonymous Coward | about 4 months ago | (#47297461)

CellBE was designed like this. It had a shared ring where data was kicked around in both directions, and each core picked out only what was addressed to it.

And I don't think this idea was new with Cell, either.

Great, so they reinvented (1)

LeadSongDog (1120683) | about 4 months ago | (#47297669)

...the Transputer. Great idea, but a giant market fail.

Re:Great, so they reinvented (1)

itzly (3699663) | about 4 months ago | (#47298617)

Giant market fail, because it was not a great idea after all.

Re:Great, so they reinvented (1)

angel'o'sphere (80593) | about 4 months ago | (#47299159)

Lol,
So says the guy with no clue.
The Transputers were way ahead of what we do these days.
And the first thing I thought when I saw the MIT concept was: "oh, they have put 36 transputers on a single die".

Transputers were built by a company called INMOS.

About 90% of the military hardware in Europe (around 1990/1995) ran on transputers.

That means radar systems, flight control, avionics hardware, etc.

INMOS went down because the Japanese wanted to buy it, but the French government intervened and prevented that. After some years of debating, a French (government-owned) company (was it Thales?) bought INMOS.

But as that company had no interest in processor manufacturing, they simply shut the branch down and got rid of it.

Down went a multi-billion research program of the EU. I would not wonder if there was a US-sponsored conspiracy behind it... I cannot stop wondering why the processors we have in real life are still 30 years behind what we had in our universities as research prototypes.

DOCTOR Singapore Research Professor, for you. (0)

Thanshin (1188877) | about 4 months ago | (#47297715)

Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT

If he is the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, who is the Research Professor of Electrical Engineering and Computer Science?

Re:DOCTOR Singapore Research Professor, for you. (2)

vovin (12759) | about 4 months ago | (#47298515)

Uh, she.

Such Short memory. . . (0)

Anonymous Coward | about 4 months ago | (#47298489)

Been there, done that:
http://en.wikipedia.org/wiki/Transputer
http://en.wikipedia.org/wiki/Network_on_a_chip
And still, the modern interconnects from the likes of ARM (CCN-508) are, in effect, the same thing.
And then there's this:
http://www.xmos.com/
IBM even does this with the MCMs in their high-end servers and mainframes.
Serializing things to send over to another core also costs time/transistors.

What's really needed is a novel approach to exploiting all of this processing power and (oh, by the way, as the man in the corner says) a better SW architecture that can take advantage of all of it. Things today are just soooo inefficient.

Best of luck!!!

lots of other many-core processors (1)

loufoque (1400831) | about 4 months ago | (#47298967)

There are hundreds of processors with 64 cores or more, each of them claiming to have solved the scalability problem.

Why I like multi-core: (1)

packrat0x (798359) | about 4 months ago | (#47299231)

Because Windows programs have a habit of taking over a processor, acting like I am still using DOS.
