
Researchers Unveil Experimental 36-Core Chip

samzenpus posted about 4 months ago | from the we-need-another-core dept.

Hardware

rtoz writes: The more cores — or processing units — a computer chip has, the bigger the problem of communication between cores becomes. For years, Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, has argued that the massively multicore chips of the future will need to resemble little Internets, where each core has an associated router and data travels between cores in packets of fixed size. This week, at the International Symposium on Computer Architecture, Peh's group unveiled a 36-core chip that features just such a "network-on-chip." In addition to implementing many of the group's earlier ideas, it also solves one of the problems that has bedeviled previous attempts to design networks-on-chip: maintaining cache coherence, or ensuring that cores' locally stored copies of globally accessible data remain up to date.
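
To make the "little Internets" idea concrete, here is a minimal sketch of the concept (my own illustration in C, not MIT's design): a 6x6 mesh of 36 cores where each hop is decided by simple dimension-ordered (X-then-Y) routing and data moves in fixed-size packets. The packet size and routing policy are assumptions for illustration.

/* Toy network-on-chip: 36 cores on a 6x6 mesh, fixed-size packets,
 * dimension-ordered routing. Illustrative only. */
#include <stdio.h>

#define MESH_DIM 6          /* 6x6 = 36 cores */
#define PAYLOAD_WORDS 4     /* fixed packet size (illustrative) */

struct packet {
    int src, dst;                     /* core IDs, 0..35 */
    unsigned payload[PAYLOAD_WORDS];  /* fixed-size data */
};

/* Move one hop toward dst: correct X first, then Y. */
static int next_hop(int here, int dst)
{
    int hx = here % MESH_DIM, hy = here / MESH_DIM;
    int dx = dst % MESH_DIM,  dy = dst / MESH_DIM;
    if (hx != dx)      hx += (dx > hx) ? 1 : -1;
    else if (hy != dy) hy += (dy > hy) ? 1 : -1;
    return hy * MESH_DIM + hx;
}

int main(void)
{
    struct packet p = { .src = 0, .dst = 35, .payload = { 0 } };
    int at = p.src, hops = 0;
    while (at != p.dst) {
        at = next_hop(at, p.dst);
        hops++;
    }
    printf("corner-to-corner delivery in %d hops\n", hops);  /* 10 */
    return 0;
}

The payoff of the mesh is the hop count: the worst case grows with the mesh's diameter (10 hops here) instead of every core contending for one shared bus.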


143 comments


Different Power Supply Voltage (-1)

rtoz (2530056) | about 4 months ago | (#47297211)

According to the comparison table (see 4:21 in this video [youtube.com]), this chip uses 1.1V while other standard chips use 1.0V. This difference may make it hard for chip makers to adopt this technology.

Re:Different Power Supply Voltage (1, Insightful)

drinkypoo (153816) | about 4 months ago | (#47297243)

According to the comparison table (see 4:21 in this video), this chip uses 1.1V while other standard chips use 1.0V. This difference may make it hard for chip makers to adopt this technology.

Really? They won't be able to specify a 1.1V VRM instead of a 1.0V VRM? Those poor, poor chip makers. They sound like a bunch of incompetent fucks.

Re:Different Power Supply Voltage (4, Interesting)

fuzzyfuzzyfungus (1223518) | about 4 months ago | (#47297351)

A higher high/low voltage swing (with a reasonable amount of other stuff being equal) will be more of a thermal nuisance; but if the perks make up for it, that's hardly a dealbreaker. The toasty end of boring desktop CPUs is somewhere north of 200 watts already, with a little shoving that they typically survive, so if somebody really wants 36 cache-coherent cores on-die, they'll suck it up and make it work.

For applications that don't specifically demand that, I'd be interested to know how the costs and benefits of 'dealing with the cooling demands of a smaller number of denser parts' compare with 'dealing with the cooling demands of more, cooler parts, closer to whatever the performance-per-watt sweet spot is, but with more cabling, PSUs, switches, and similar interconnect and support stuff to buy and power'...

Re:Different Power Supply Voltage (1)

wagnerrp (1305589) | about 4 months ago | (#47299037)

The toasty end of boring desktop CPUs is somewhere north of 200 watts already

Well... somewhere south of 100W, anyway, and even high-end workstation/server chips are under 150W.

Re:Different Power Supply Voltage (0)

Anonymous Coward | about 4 months ago | (#47297775)

1V over 1 µm is a million volts per meter, assnozzle. Your incompetent and imbecilic commentary really drags down this place.

Re:Different Power Supply Voltage (0)

Anonymous Coward | about 4 months ago | (#47298297)

I figure you must be some type of alcoholic or have another substance problem. Can you confirm?

Re:Different Power Supply Voltage (1)

drinkypoo (153816) | about 4 months ago | (#47299201)

I figure you must be some type of alcoholic or have another substance problem. Can you confirm?

Yeah, I'm allergic to stupidity. I have to take a pill before I can come anywhere near slashdot, and keep an inhaler and epi pen on hand.

Re:Different Power Supply Voltage (2)

edmudama (155475) | about 4 months ago | (#47297251)

That doesn't matter. The power supply surrounding the socket/pads will account for whatever Vcc needs to be.

Re:Different Power Supply Voltage (-1)

Anonymous Coward | about 4 months ago | (#47297825)

But not inside the chip. You people seem to forget we're dealing with chips that have features counted in individual atoms. 1V across three atoms may work, 1.1V across three atoms arcs over.

Really, you people need to stop playing with software. Using software considered harmful to understanding reality.

Re:Different Power Supply Voltage (4, Interesting)

Moof123 (1292134) | about 4 months ago | (#47298639)

Banging my head on the table right now.

Why do people with zero actual semiconductor knowledge try to speak as an authority*?!

It's a research chip, meaning they don't need to be on the latest process node to show their proof of concept. Larger nodes (much cheaper to design a chip on) have thicker gate passivation layers and run at higher voltages. From an architecture standpoint the process node/voltage are irrelevant. So if their architecture proves out, some bigger outfit can run with it while targeting the latest-greatest itty-bitty process node to increase the clock rate, drop the power, and reduce the area/price.

*I am not a processor designer, just a mixed-signal (mostly analog) guy, but I've been working in the semiconductor industry, including doing process bake-offs, for over a dozen years.

Re:Different Power Supply Voltage (1)

Ralph Wiggam (22354) | about 4 months ago | (#47298891)

Why do people with zero actual semiconductor knowledge try to speak as an authority*?

Is this your first day on Slashdot?

Re:Different Power Supply Voltage (1)

wagnerrp (1305589) | about 4 months ago | (#47299093)

You people seem to forget we're dealing with chips that have features counted in individual atoms. 1V across three atoms may work, 1.1V across three atoms arcs over.

Luckily we're still dealing with features hundreds of atoms across, and not just three...

Re:Different Power Supply Voltage (0)

Anonymous Coward | about 4 months ago | (#47299167)

Oh look, it's Mr Heat Controls Transistors, who still hasn't provided a single source for his heat theory.

But anyway:

https://www.youtube.com/watch?... [youtube.com]

Three atoms, dickweed.

Re:Different Power Supply Voltage (1)

LynnwoodRooster (966895) | about 4 months ago | (#47298431)

According to the comparison table (see 4:21 in this video [youtube.com]), this chip uses 1.1V while other standard chips use 1.0V. This difference may make it hard for chip makers to adopt this technology.

No, it's the only way to make it faster because it goes to eleven...

im still a bit skeptical. (3, Funny)

nimbius (983462) | about 4 months ago | (#47297223)

All this performance in just one chip. I mean, sure, it has 36 cores, but let's be rational here... does it seriously expect to run Crysis?

Re:im still a bit skeptical. (1, Funny)

Anonymous Coward | about 4 months ago | (#47297281)

You'd need to imagine a beowulf cluster of 'em to accomplish that.

Re:im still a bit skeptical. (0)

Anonymous Coward | about 4 months ago | (#47297497)

Imagine a beowulf cluster of them on a chip and we're getting somewhere.

Re:im still a bit skeptical. (0)

Anonymous Coward | about 4 months ago | (#47297789)

...does it seriously expect to run Crysis?

I didn't seriously expect that joke to still be funny. But there you go.

Moore's Law (0)

Grindalf (1089511) | about 4 months ago | (#47297267)

That's a fun post! 36-core is immense! As an aside: it's been a while since we've seen any decent rise in processor GHz. I remember IBM talking about functioning, reasonably cool 10 GHz processors (ref needed) in the early 2000s, but no one has them in the shops yet! I'm sure this was discussed in Moore's Law lectures prior to Y2K, but mention it these days and everyone scowls! So apparently some people can build them (and they run cool) and some people can't; what normally happens in computing when the faster parts are released?

Re:Moore's Law (4, Interesting)

Opportunist (166417) | about 4 months ago | (#47297285)

As an aside: It's been a while since we've seen any decent rise in processor Ghz.

Just to abuse a car analogy: Maybe it's time we stop revving up and instead shift gears.

Re:Moore's Law (2)

Grindalf (1089511) | about 4 months ago | (#47297317)

Whilst I have my foot to the floor ... I still think it's a failure of science - there's nothing wrong with doing both simultaneously - to believe otherwise would be to buy into a rhetorical device based on "false opposites."

Re:Moore's Law (1)

dreamchaser (49529) | about 4 months ago | (#47297417)

There are still technical challenges to increasing clock speed. Just because "IBM said it would" doesn't make it so. Instead you are seeing higher IPC due to architectural refinements as well as more and more cores. Clock speeds are still inching up but do not expect any huge radical jumps anytime soon.

Re:Moore's Law (1)

Opportunist (166417) | about 4 months ago | (#47297799)

Of course, but it's like shifting gears: first shift, then rev the engine back up. Otherwise your clutch will probably wear out quickly.

And no, I have absolutely no idea how that analogy still applies.

Re:Moore's Law (0)

Anonymous Coward | about 4 months ago | (#47297915)

Yes of course, everything is just a continuous cycle of ever-improving performance, which is why we still fly the Concorde and work only 20 hours a week in a leisure society. Oops.

Re:Moore's Law (1)

Virtucon (127420) | about 4 months ago | (#47297431)

Nope, liquid nitrogen cooling gets you past the speed limits. How about over 8 GHz [youtube.com] on a chip that costs less than $200? Go to helium and you can get over 8.5 GHz [youtube.com], although both become a bit unwieldy when it comes to gameplay, because I don't want my hard drives to freeze. I love that last video; there's some real country-boy engineering in there, including using a propane torch and a hair dryer to keep certain components from freezing.

Re:Moore's Law (3, Insightful)

Shoten (260439) | about 4 months ago | (#47297717)

Nope, liquid nitrogen cooling gets you past the speed limits. How about over 8 GHz [youtube.com] on a chip that costs less than $200? Go to helium and you can get over 8.5 GHz [youtube.com], although both become a bit unwieldy when it comes to gameplay, because I don't want my hard drives to freeze. I love that last video; there's some real country-boy engineering in there, including using a propane torch and a hair dryer to keep certain components from freezing.

I'm a little confused as to why you're citing the chip's low low price of "less than $200" if you need liquid nitrogen to get it to perform the way you want it to. You do realize that cooling systems cost money, too... right? There's no point in using a cheap processor to hit performance benchmark X if the required support systems cost thousands of dollars more than a more powerful, more expensive processor that can do it out of the box. Not to mention that liquid nitrogen cooling isn't exactly hassle-free, especially in a household environment. And it's worth noting that even if you boost GHz, you eventually run into choke points related to pushing data to and from the chip anyway. You can give the most important worker on an assembly line all the crystal meth they can eat, but they can't work any faster than the conveyor belt in front of them.

Re:Moore's Law (4, Funny)

ColdWetDog (752185) | about 4 months ago | (#47298945)

You can give the most important worker on an assembly line all the crystal meth they can eat, but they can't work any faster than the conveyor belt in front of them.

Ah! The 21st Century version of the 'mythical man month' - so much more apropos for this audience than the pregnancy analogy.

Re:Moore's Law (1)

skovnymfe (1671822) | about 4 months ago | (#47297731)

Liquid nitrogen/helium cooling is great... while it lasts. When it's used up, however, you've got to pay for another bottle of cooling. I have no idea how long a $200 CPU can run at 8 GHz on a bottle of nitrogen, or how much a bottle of nitrogen costs, but I can't imagine it's a good long-term solution.

Re:Moore's Law (2)

SuricouRaven (1897204) | about 4 months ago | (#47298419)

Nitrogen overclocking is done for contests. You can get phase-change cooling, which is the next best thing and will still get your processor far below zero. The big downside to that is just power consumption. It's also bulky and noisy.

Re:Moore's Law (3, Interesting)

Anonymous Coward | about 4 months ago | (#47297465)

A better analogy is that they keep adding seats and making the whole vehicle slower.

Kawasaki Ninja == 10 GHz single core (fastest way to get anywhere alone)
Ford Mustang == 4 GHz quad-core (most people only use the front two seats, but if desperate you can squeeze more people in)
Chevy Suburban == 3.3 GHz 8-core (it seems like everyone wants one, but most people who have a full load just have a bunch of little kiddies)
Mercedes Sprinter == 2.7 GHz 12-core (just meant to be a grinding people-hauler)
School Bus == 1.2 GHz Xeon Phi (slow as hell and very specialized, no normal person would ever want one)
Double Decker Bus == Peh's stuff (probably a use for mass transit (i.e. virtualization) and as a cool novelty)

Re:Moore's Law (1)

Z00L00K (682162) | about 4 months ago | (#47297703)

Boeing 747 == take on a crapload of people for a long haul excursion.

Re:Moore's Law (1)

jones_supa (887896) | about 4 months ago | (#47298537)

Let's still not forget that a single core of a modern Core i7 chip is about 6x as fast as a single-core Pentium 4, at the same clock speed.

Re:Moore's Law (1)

default luser (529332) | about 4 months ago | (#47298799)

Absolutely not true.

The Core 2 Duo is approximately 2x faster clock-for-clock than the Pentium 4 [techreport.com], and the current Haswell core is barely 40% faster than that (assume a 7% per-clock speedup for every core revision since). That gets you somewhere in the 2x-3x performance improvement range for Haswell, barring corner cases that can leverage AVX/FMA embarrassingly well (most real-world use cases show small improvements).

Intel proved that they could do a whole lot better than the Pentium 4, but your performance improvement factor is off by half!
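
Checking that arithmetic at face value (the 7% per-revision figure is the parent's assumption, not a measurement): five compounding 7% steps give 1.07^5 ≈ 1.40, i.e. the "barely 40%" quoted, and 2.0 × 1.40 ≈ 2.8, which is how you land in the 2x-3x range rather than at 6x.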

Re:Moore's Law (1)

Kaenneth (82978) | about 4 months ago | (#47299115)

The Titanic == Itanium

Re:Moore's Law (1)

Salgat (1098063) | about 4 months ago | (#47298427)

What an empty statement. It's easy to say we should try something else when things get difficult, without having any practical solution in place.

Re:Moore's Law (1)

ColdWetDog (752185) | about 4 months ago | (#47298915)

What's a 'gear'?

Re:Moore's Law (1)

K. S. Kyosuke (729550) | about 4 months ago | (#47297287)

36-core is immense!

Yawn... [greenarraychips.com]

Re:Moore's Law (1)

Grindalf (1089511) | about 4 months ago | (#47297331)

That's a beauty, and the Ezekiel ref in your sig (23:20) made me laugh out loud too ...

Re:Moore's Law (1)

Opportunist (166417) | about 4 months ago | (#47297817)

Odd. I went to a Catholic church, but strangely we never got to that part.

Talk about selective teaching and leaving out the interesting parts!

Re:Moore's Law (1)

itzly (3699663) | about 4 months ago | (#47297383)

Try doing a 100x100 double precision matrix inversion on one of those chips, and you'll stop yawning pretty quickly.

Re:Moore's Law (1)

K. S. Kyosuke (729550) | about 4 months ago | (#47297437)

Now why would I want to do that? Obviously, for FP tasks, a modified design would be necessary - but given that the GA144 is already unbeatable in integer energy efficiency, even at the 180 nm node where it's manufactured, if you extend the ISA to be more FP-friendly and switch to a recent process, I don't see a problem. Well, it would need different memory interfaces to make it a shared-memory multiprocessor. That's a bummer. But I guess it can't be helped; programmers are lazy.

Re:Moore's Law (1)

itzly (3699663) | about 4 months ago | (#47297473)

Maybe you don't want to do that, but good floating point performance is a requirement for a lot of useful tasks. Also, many real world tasks need access to large amounts of memory, and often that memory needs to be available to multiple nodes. The GA144 fails there too, since it has a pitiful amount of memory. Except for a small handful of niche applications that happen to match the GA144's capabilities, it's a useless device.

Re:Moore's Law (1)

K. S. Kyosuke (729550) | about 4 months ago | (#47297513)

It's the notion (asynchronous, self-clocked, energy efficient chip, maximizing performance per watt and performance per mm^2) that matters to me, not this specific design (which is intended for specific purposes). Witness how the HPC people embraced GPUs, which are sort of heading in a similar direction already.

Re:Moore's Law (1)

gnasher719 (869701) | about 4 months ago | (#47298083)

Try doing a 100x100 double precision matrix inversion on one of those chips, and you'll stop yawning pretty quickly.

That should be easily done in a millisecond or so on a single core of any modern Intel processor. You could probably get it down to 100 microseconds on the latest ones.
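
A back-of-envelope count supports this, assuming LU-based inversion at roughly 2n^3 floating-point operations: for n = 100 that is 2 × 10^6 flops, which at a sustained 10 GFLOPS is about 200 µs, and at 50 GFLOPS (AVX2 + FMA territory) about 40 µs.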

Re:Moore's Law (2)

itzly (3699663) | about 4 months ago | (#47298597)

My point exactly. What is a simple task on a modern Intel becomes nearly impossible on the GA144. We've already tried the idea of combining large numbers of simple processors, and it has failed every single time. If NxM simple cores together can't beat a modern Intel processor on a range of useful tasks, there's not much point in developing it.

Re:Moore's Law (0)

Anonymous Coward | about 4 months ago | (#47298091)

A little more yawn:
http://en.wikipedia.org/wiki/PicoChip
and of course Adapteva's Epiphany IV mentioned in another thread.

Apart from that there are others that have tried to achieve better performance with lots of simple CPU cores/threads:
http://en.wikipedia.org/wiki/Parallax_Propeller
http://en.wikipedia.org/wiki/XMOS
http://en.wikipedia.org/wiki/Ubicom (skip to Ubicom32)

Re:Moore's Law (1)

Anonymous Coward | about 4 months ago | (#47297309)

Think of the possibilities:

make -j36

Re:Moore's Law (1)

wjcofkc (964165) | about 4 months ago | (#47297337)

The reason Apple stuck with the Power architecture for so long was that IBM promised them quad-core and greater chips running at 8 GHz, air-cooled, by 2005. Needless to say, they didn't even come close to delivering. It was that failure that led Apple to switch to x86.

Re:Moore's Law (1)

Grindalf (1089511) | about 4 months ago | (#47297427)

Regarding your PowerPC comment specifically: this was from an IBM research department that makes a processor other than the PowerPC – I believe they were used for z/OS machines or the like. I use it as an operational example ONLY because, as Mr. Spock says, "that which HAS happened CAN happen," and it is therefore a possibility.

Re:Moore's Law (1)

caseih (160668) | about 4 months ago | (#47297389)

And hopefully in any lectures on Moore's Law, the students learn that Moore's Law refers to transistors on a die, not the speed of the chips. This 36-core chip probably jumps ahead of Moore's Law a bit, as it's got to be a fairly large die. In any event Moore's Law continues to hold, more or less. Other things like CPU speed have followed a similar trend in times past, but no longer do now.

Re:Moore's Law (2)

willy_me (212994) | about 4 months ago | (#47297499)

And hopefully in any lectures on Moore's Law, the students learn that Moore's Law refers to transistors on a die, not the speed of the chips. This 36-core chip probably jumps ahead of Moore's Law a bit, as it's got to be a fairly large die.

Moore's Law refers to the number of components per integrated circuit for minimum cost. Note that this is basically transistor density and is not impacted by core size. Silicon defects and transistor size determine the optimal number of components per IC.

A quote from Wikipedia,

Moore himself wrote only about the density of components, "a component being a transistor, resistor, diode or capacitor,"[26] at minimum cost.

Re:Moore's Law (2)

Virtucon (127420) | about 4 months ago | (#47297405)

Immense? Immense you say? Try IBM's mega-footprint z196 [wikipedia.org]: at over 512 mm^2, it's one big-ass chip.

Re:Moore's Law (1)

Grindalf (1089511) | about 4 months ago | (#47297597)

Do you know what I'm going to do? I'm going to go out and get a shirt printed up with the expression "I Heart Processor GHz" and wear it at parties! Why? Every time this crops up at meetings I've attended, there is always someone who loses their temper at the mere mention of anyone developing a faster processor, irrespective of how many cores or how much cache, and I don't like it! The physics has been done! I'd swear, if anyone ever does a THz processor and one of these kids finds out, they'll egotistically self-explode on the spot and try to burn down the factory in question, because it's such an affront. The cultural phenomenon of "Gigaphobia" needs investigating by qualified professionals :0)

Re:Moore's Law (1)

Z00L00K (682162) | about 4 months ago | (#47297693)

It's of course good if the distance between cores is kept to a minimum, but if the software designers and compilers consider the limitations when generating the binaries, it may not be a huge performance bottleneck in real-world applications.

It's better to switch to a new core than to switch tasks on a core, for example. Looking at what happens in a modern PC, most processing is unrelated to the rest. Even inside a web browser you may have several plugins running in different parts of the screen, but they don't really interact with each other, so they can run on standalone processor cores.

When doing SIMD calculations you run the same instruction in parallel on many cores with different data as input, and that is not a big deal either.

The bottleneck you may experience is on the buses to RAM, disk, and I/O devices. Just realize that not every core has the same distance to the resource - so by having affinity hints on the executables indicating preferences for types of I/O, it might be possible to assign them to the right area of cores in the processor.

So far much of computer design has gone into trying to make the computer as general as possible. Think of it as a Swiss Army knife (or maybe a Leatherman multi-tool) - it can do everything, but not excel at anything. A real mechanic has a good toolbox instead, with different groups of tools - screwdrivers, hammers, etc. - and each of those tools is highly specialized even within its group. Screwdrivers come in many forms: Flat, Phillips, Pozidriv, Allen, Torx, XZN, Orange Juice based, etc. By using the right tool for the job you get the work done faster, and often more accurately, than with the generic tool.
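
For what it's worth, mainstream OSes already expose the affinity knob described above. A minimal Linux-specific sketch in C (pthread_setaffinity_np is a GNU extension; pinning to core 3 is an arbitrary choice for illustration):

/* Pin the calling thread to one core so the scheduler won't migrate it. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);                 /* allow core 3 only */

    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
        return 1;
    }
    printf("now running on core %d\n", sched_getcpu());
    return 0;
}

Grouping I/O-heavy threads near the cores closest to the relevant controller, as suggested above, would be this same call driven by topology information (e.g. from hwloc).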

Re:Moore's Law (1)

Rockoon (1252108) | about 4 months ago | (#47297907)

When doing SIMD calculations you run the same instruction in parallel on many cores with different data as input

Your definition of core seems to be completely different from anyone else's. You seem to have relabeled execution units as 'cores', and for seemingly completely ignorant reasons.

36 cores? Network on a chip? Meh! (1)

Kyle (4392) | about 4 months ago | (#47297277)

http://www.adapteva.com/epipha... [adapteva.com]
64 cores, mesh network that extends off the chip, in production.

Try harder MIT :-p

Re:36 cores? Network on a chip? Meh! (1)

itzly (3699663) | about 4 months ago | (#47297301)

Adding cores is easy. Keeping all the cores busy with useful work in a typical range of high performance applications is the difficult part.

Re:36 cores? Network on a chip? Meh! (1)

Melkhior (169823) | about 4 months ago | (#47297343)

http://www.adapteva.com/epiphanyiv/ [adapteva.com]
64 cores, mesh network that extends off the chip, in production.

Try harder MIT :-p

They already tried harder: http://www.tilera.com/. And as another post mentioned, Intel's Knights Corner is cache-coherent across 61 cores (62 architected).

The summary doesn't get the point of the article: what's novel is not the presence of cache coherency, it's the new way of implementing snoop-based cache coherency over their network. Cache coherency for a large number of cores can be very expensive time-wise, so any idea that improves it is more than welcome.

Re:36 cores? Network on a chip? Meh! (0)

Anonymous Coward | about 4 months ago | (#47297349)

Is that one actually made? The only one in the store is the 16 core version.

Re:36 cores? Network on a chip? Meh! (0)

Anonymous Coward | about 4 months ago | (#47297367)

And no cache is discussed anywhere (even in the reference manual).

Re: 36 cores? Network on a chip? Meh! (0)

Anonymous Coward | about 4 months ago | (#47297527)

Made, and about to be unleashed:
http://www.hpcwire.com/off-the-wire/adapteva-unveils-worlds-smallest-supercomputing-platform-isc14/

Re:36 cores? Network on a chip? Meh! (5, Informative)

TheRaven64 (641858) | about 4 months ago | (#47298847)

The core count isn't the interesting thing about this chip. The cores themselves are pretty boring off-the-shelf parts too. I was at the ISCA presentation about this last week and it's actually pretty interesting. I'd recommend reading the paper (linked to from the press release) rather than the press release, because the press release is up to MIT's press department's usual standards (i.e. completely content-free and focussing on totally the wrong thing). The cool stuff is in the interconnect, which uses the bounded latency of the longest path multiplied by single-cycle one-hop delivery times to define an ordering, allowing you to implement a sequentially consistent view of memory relatively cheaply.

Since I'm here, I'll also throw out a plug for the work we presented at ISCA, The CHERI capability model: Revisiting RISC in an age of risk [cam.ac.uk] . We've now open sourced (as a code dump, public VCS coming soon) our (64-bit) MIPS softcore, which is the basis for the experimentation in CHERI. It boots FreeBSD and there are a few sitting around the place that we can ssh into and run. This is pretty nice for experimentation, because it takes about 2 hours to produce and boot a new revision of the CPU.

Intel Knights Landing (2, Informative)

SirDrinksAlot (226001) | about 4 months ago | (#47297289)

So what's special about this chip that Intel's Xeon Phi isn't already doing? The tile-plus-router approach was demonstrated by Intel back in 2007 with its 80-core Teraflops Research Chip, an ancestor of the Phi. Or is this just a rehash of 7-year-old technology that's already in production? It sounds like a copy/paste of Intel's research.

"Intel's research chip has 80 cores, or "tiles," Rattner said. Each tile has a computing element and a router, allowing it to crunch data individually and transport that data to neighboring tiles." - Feb 11, 2007

Re:Intel Knights Landing (2)

dreamchaser (49529) | about 4 months ago | (#47297325)

Presumably the novel way they address (pun intended) cache coherency is what is new. More efficiency = greater performance. Time will tell.

Re:Intel Knights Landing (5, Informative)

Trepidity (597) | about 4 months ago | (#47297359)

Yes, as usual, the MIT press release oversells the research, while the original paper [pdf] [mit.edu] is a bit more careful in its claims. The paper makes clear that the novel contribution isn't the idea of putting "little internets" (as the press release calls them) on a chip, but acknowledges that there is already a lot of research in the area of on-chip routing between cores. The paper's contribution is to propose a new cache coherence scheme which they claim has scalability advantages over existing schemes.

Re:Intel Knights Landing (1)

epine (68316) | about 4 months ago | (#47297891)

The paper's contribution is to propose a new cache coherence scheme which they claim has scalability advantages over existing schemes.

Somehow this was obvious to me even from the press release. I've never yet seen details of an ordering model laid bare where it wasn't the core novelty. Ordering models are inherently substantive. Ordering models beget theorems. Cute little Internets drool and coo.

Re:Intel Knights Landing (1)

gman003 (1693318) | about 4 months ago | (#47297335)

It does seem rather similar - a large cluster of cores, laid out in a grid topology. Perhaps they're doing something different with the cache coherency? I couldn't find too much on how Intel's handling that, and it seems to be a focus of the articles on this chip.

Intel Knights Landing (0)

Anonymous Coward | about 4 months ago | (#47297471)

So the point here, which is not insignificant if they really have solved it, is cache coherency. Snooping other caches can become a massively expensive task in terms of round-trip latency: potentially thousands of cycles. In low-power architectures (which are certainly a consideration), moving data around is really expensive from a power perspective, and that is where a large portion of the power is actually spent in real-world use. If they have solved it in a more efficient manner than the current brute-force approaches, then that's good research.
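
One cheap way to feel that cost on today's hardware is false sharing: two threads bump two independent counters, but when the counters share a cache line, the coherence protocol bounces that line between cores on every write. A C sketch (illustrative only; assumes a 64-byte line, Linux/POSIX timing, compile with -pthread); the padded run is typically several times faster:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

/* two counters packed into (almost certainly) one cache line */
static struct { volatile unsigned long a, b; } together;

/* the same two counters forced onto separate 64-byte lines */
static struct {
    volatile unsigned long a;
    char pad[64];
    volatile unsigned long b;
} apart;

static void *bump(void *arg)
{
    volatile unsigned long *c = arg;
    for (unsigned long i = 0; i < ITERS; i++)
        (*c)++;
    return NULL;
}

static double timed_run(volatile unsigned long *x, volatile unsigned long *y)
{
    struct timespec t0, t1;
    pthread_t ta, tb;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&ta, NULL, bump, (void *)x);
    pthread_create(&tb, NULL, bump, (void *)y);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void)
{
    printf("same cache line: %.2fs\n", timed_run(&together.a, &together.b));
    printf("padded apart:    %.2fs\n", timed_run(&apart.a, &apart.b));
    return 0;
}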

Architecture (1)

wjcofkc (964165) | about 4 months ago | (#47297313)

I would be curious to know more about the architecture and all-around chip specs they are using in their prototype: clock speed, memory interface, etc. The article states they are developing a version of Linux to test it on, so it's safe to say it's an established architecture. Anyway, I am excited to see the results once they have tested it on Linux. While this does not help with the density-per-core problem, perhaps it will help extend Moore's Law on the speed side of microcircuitry.

D-WAV-E !! (-1)

Anonymous Coward | about 4 months ago | (#47297315)

I want my. I want my. I want my D-WAV-E !! And money for nothin and chicks for free !!

Re:D-WAV-E !! (0)

Anonymous Coward | about 4 months ago | (#47298099)

If it was up to me, you could have ours. The refrigerators in the parking garage are really annoying.

Is there anything new here? (1)

Junta (36770) | about 4 months ago | (#47297345)

So, in one die, it's a little interesting, though GPU stream processors and Intel's Phi would seem to suggest this is not that novel. The latter even lets you ssh in and see the core count for yourself in a very familiar way (though it's not exactly the easiest of devices to manage, it's still a very real-world example of how this isn't new to the world).

The 'not all cores are connected' part is even older. In the commodity space, HyperTransport and QPI can be used to construct topologies that are not a full mesh. So not only is it not all cores on a bus, it is also not all cores mesh-connected, the two attributes claimed as novel here.

Basically, as of AMD64 people had relatively affordable access to an implementation of the concept, and as of Nehalem both major x86 vendors had it in place. Each die included all the logic needed to implement a fabric, with the board providing essentially passive traces.

Re:Is there anything new here? (0)

Anonymous Coward | about 4 months ago | (#47297391)

Also check out "transputers" for wafer-scale networks of processor cores.

http://en.m.wikipedia.org/wiki/Transputer

Re:Is there anything new here? (4, Informative)

Trepidity (597) | about 4 months ago | (#47297429)

The basic idea isn't new. What the paper is really claiming is new is their particular cache coherence scheme, which (to quote from the Conclusion) "supports global ordering of requests on a mesh network by decoupling the message delivery from the ordering", making it "able to address key coherence scalability concerns".

How novel and useful that is I don't know, because it's really a more specialist contribution than the headline claims, to be evaluated by people who are experts in multicore cache coherence schemes.

Passing messages rather than sharing! (0)

Anonymous Coward | about 4 months ago | (#47297347)

Erlang on a chip :-)

Where's my massively parrallel programming languag (1)

KingOfBLASH (620432) | about 4 months ago | (#47297363)

While adding an extra core or two made for big jumps in performance (because you are almost always running at least two applications), there comes a point where most users won't see a performance boost. While I may now be able to throw 36 processors at a problem, you have to program all those cores to work together. Right now that's a lot of effort, and until programming languages catch up and can optimize code by making it massively parallel, this is going to be a non-starter.

Re:Where's my massively parrallel programming lang (1)

itzly (3699663) | about 4 months ago | (#47297397)

A "new programming language" isn't a magical solution to make a non-parallel algorithm work well on a multi processor architecture.

Re:Where's my massively parrallel programming lang (0)

Anonymous Coward | about 4 months ago | (#47297487)

No kidding... If that was all there was to it, the guys at the CPU level would just do it for us.

The problem is dependency on previous results. If you do not care about previous results, then multithreaded programming is dead easy, a scheduling problem that is fairly well understood. It is when you need the previous results, or external I/O, that parallelism fails.

I predict that at some point some bright spark of a CPU guy will come up with the idea of discarded results, since 'if' conditions tend to create pipeline stalls. You could go ahead and run both paths of code, then decide which one is correct and discard the unused results. Maybe they already have... I have not followed CPU arch for a while now...
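
A small-scale version of "run both paths and discard one" already exists as predication / branchless code: compute both arms, then select with a mask, so there is no branch to mispredict. A toy C sketch of the idea (not how any particular CPU implements it):

/* Branchy vs. branchless max: the second computes "both paths"
 * and keeps one via a mask instead of branching. */
#include <stdio.h>

static int max_branchy(int a, int b)    { return a > b ? a : b; }

static int max_branchless(int a, int b)
{
    int mask = -(a > b);               /* all ones if a > b, else zero */
    return (a & mask) | (b & ~mask);   /* keep a or b, discard the other */
}

int main(void)
{
    printf("%d %d\n", max_branchy(3, 7), max_branchless(3, 7));  /* 7 7 */
    return 0;
}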

Re:Where's my massively parrallel programming lang (2)

Rockoon (1252108) | about 4 months ago | (#47298043)

You could go ahead and run both paths of code, then decide which one is correct and discard the unused results.

Intel is already doing partial speculative execution in the case of conditional branches: the pipeline is filled with the predicted path, which is then frequently executed out of order (before the condition is known).

Intel is not, however, doing the full concept you have described (eager speculative execution), and I don't think it's likely that they ever will. The best case for eager speculative execution is when branches are completely unpredictable, which is only very rarely true. Further, it requires significant over-provisioning of execution units to have enough to execute both paths of a conditional branch at "best possible speed", resources that would be completely wasted whenever there isn't a conditional branch in the pipeline...

Re:Where's my massively parrallel programming lang (1)

Z00L00K (682162) | about 4 months ago | (#47297759)

The question is: do you always need parallel software? Most tasks are bread-and-butter tasks; no need to chew them up. Put your energy into the few things that do need to be broken up.

But mostly it's a chicken-and-egg problem: you can't do multi-core software since there aren't enough serious multi-core machines, or the owners of software companies don't see a benefit in it.

The two hardest problems in CS: (4, Funny)

magsol (1406749) | about 4 months ago | (#47297373)

pointer arithmetic, cache invalidation, and off-by-one errors

Re:The two hardest problems in CS: (0)

Anonymous Coward | about 4 months ago | (#47298375)

and naming

Re:The two hardest problems in CS: (0)

frank_adrian314159 (469671) | about 4 months ago | (#47298435)

No one expects the Spanish Inquisition.

Interesting (3, Informative)

Virtucon (127420) | about 4 months ago | (#47297395)

Cache coherency has been one of the banes of multicore architecture for years. It's nice to see a different approach, but chip manufacturers are already getting high-performance results without introducing additional complexity. The Oracle (Sun) SPARC T5 [oracle.com] architecture has 16 cores with 128 threads running at 3.6 GHz. It buys a few more years for Solaris at least, but it's still a hell of a processor. For you Intel fans, the E7-2790 v2 [intel.com] sports 15 cores and 30 threads with a 37.5 MB cache, so they're doing something right, because it screams and is capable of 85 GB/s of memory throughput.

I'm sure the chip architects are looking at this research, but somehow I think they're already ahead of the curve, because these kinds of core/thread counts are jumps ahead of where we were just a few years ago. Anybody remember the first Pentium Dual-Core [wikipedia.org] and the UltraSPARC T1 [wikipedia.org]?

Re:Interesting (0)

Anonymous Coward | about 4 months ago | (#47297635)

The link for the Intel CPU heads to the ARK page for the Intel® Xeon® Processor E7-8893 v2, a 6-core chip with 12 threads. Where do you see 15 cores with 30 threads?

Re:Interesting (0)

Anonymous Coward | about 4 months ago | (#47297751)

GP linked to the wrong part:

http://ark.intel.com/products/75258/Intel-Xeon-Processor-E7-8890-v2-37_5M-Cache-2_80-GHz

Re:Interesting (1)

Anonymous Coward | about 4 months ago | (#47297821)

Oh? No, these parts cause quite a lot of trouble, and you won't know it until you're into kernel programming or HPC programming, fighting sub-microsecond latency and lock-contention issues.

And processors with weak cache coherency and weak memory ordering are **MURDER** on normal programmers. The fewer of those exist, the better. Most people cannot even GRASP the weak memory/cache ordering model, let alone deal with issues caused by it.

Re:Interesting (0)

Anonymous Coward | about 4 months ago | (#47298491)

Before the Pentium dual cores, there were multiprocessor Pentium Pro boards where cache coherency would kill you, because the cache you had to invalidate was not on the chip but out on the system bus (which at that time was a slow 66 MHz bus).

There are several topologies in which to arrange the chips so that they can form assembly lines or triple sets of hypercubes.

Re:Interesting (1)

Bengie (1121981) | about 4 months ago | (#47298589)

High "thread" count cores are good for work loads where there is little inter-thread communication and has lots of memory stalls. By having a lot of threads running at once, whenever there is a memory stall, you can just switch to another thread, and the chance of that thread being stalled is very low. This also means lots more cache thrashing, so you need larger caches, but they can be tuned for high-throughput high-latency. The entire design for these cpus is geared for high-throughput high-latency, which also tends to be great for energy efficiency.

Parallel processing still remains elusive. (2)

140Mandak262Jamuna (970587) | about 4 months ago | (#47297423)

Parallel processing has made big strides, but only in some limited areas: graphics rendering, where each pixel can be updated independently of other pixels; fluid mechanics (CFD) using time-marching techniques, where updating the solution at one point needs data from only a limited set of neighbors; or iterative matrix solvers. Even for something very structured and free of if statements, like inverting a matrix, parallel methods have suffered.

The basic problem is this: even if just 5% of the work has to be serial, the maximum speedup is 20x; that is the theoretical maximum. YMMV, and it does. The Internet and search have opened up another vast area where a thread can do lots of work and send just a very small set of results back to the caller. Hits are so small compared to misses that you can make some headway. Even then, we have found very few applications suitable for massively parallel solutions.
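
That ceiling is Amdahl's law: with serial fraction s and N processors, the speedup is S(N) = 1 / (s + (1 - s)/N). For s = 0.05, S tends to 1/0.05 = 20 as N grows without bound, and even the 36 cores here only reach S ≈ 1 / (0.05 + 0.95/36) ≈ 13.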

We need a big breakthrough. If you divide a 3D domain into a number of subdomains, the interfaces between the subdomains are 2D. The volume of the 3D domain represents computational load, and the interface areas represent communication load. If we could come up with domain-division algorithms that guarantee the interfaces would be an order of magnitude smaller, even as we go from 3D to higher numbers of dimensions, and if we could organize these subdomains into hierarchies, we would be able to deploy more and more computational work and be confident the communication load would not overwhelm the algorithm. This breakthrough is yet to come. Delaunay tessellations (and their dual, Voronoi polygons) have been defined in higher dimensions, but the ratio of the number of "cells" to the number of "vertices" explodes in higher dimensions; last time we tried, we could not even fit a 10-dimensional mesh of 10 points into all the available memory of the machine. It did not look promising.
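
The scaling being asked for is a surface-to-volume ratio: a cubic subdomain of side n in 3D carries O(n^3) compute against roughly 6n^2 of interface communication, so the ratio 6/n improves as subdomains grow. In d dimensions the interface work is O(n^(d-1)) against O(n^d) compute, but the cell-to-vertex ratio of Delaunay meshes grows very rapidly with d, which is exactly the explosion described above.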

Not entirely new (0)

Anonymous Coward | about 4 months ago | (#47297461)

CellBE was designed like this. It had a shared ring where data was kicked around in both directions, and each core picked out only what was addressed to it.

And I don't think this idea was new with Cell, either.

Great, so they reinvented (1)

LeadSongDog (1120683) | about 4 months ago | (#47297669)

...the Transputer. Great idea, but a giant market fail.

Re:Great, so they reinvented (1)

itzly (3699663) | about 4 months ago | (#47298617)

Giant market fail, because it was not a great idea after all.

Re:Great, so they reinvented (1)

angel'o'sphere (80593) | about 4 months ago | (#47299159)

Lol,
So says the guy with no clue.
The Transputers were way ahead of what we do these days.
And the first thing I thought when I saw the MIT concept was: "oh, they have put 36 transputers on a single die".

Transputers were built by a company called INMOS.

About 90% of the military hardware in Europe (around 1990/1995) ran on transputers.

That means radar systems, flight control, avionics hardware, etc.

INMOS went down because the Japanese wanted to buy it, but the French government intervened and prevented that. After some years of debating, a French (government-owned) company (was it Thales?) bought INMOS.

But as that company had no interest in processor manufacturing, they simply shut the branch down and got rid of it.

Down went a multi-billion research program of the EU. I would not wonder if there was a US-sponsored conspiracy behind it... I cannot stop wondering why the processors we have in real life are still 30 years behind what we had in our universities as research prototypes.

DOCTOR Singapore Research Professor, for you. (0)

Thanshin (1188877) | about 4 months ago | (#47297715)

Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT

If he is the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, who is the Research Professor of Electrical Engineering and Computer Science?

Re:DOCTOR Singapore Research Professor, for you. (2)

vovin (12759) | about 4 months ago | (#47298515)

Uh, she.

Such Short memory. . . (0)

Anonymous Coward | about 4 months ago | (#47298489)

Been there, done that:
http://en.wikipedia.org/wiki/Transputer
http://en.wikipedia.org/wiki/Network_on_a_chip
And still, the modern interconnects from the likes of ARM (CCN-508) are, in effect, the same thing.
And then there's this:
http://www.xmos.com/
IBM even does this with the MCMs in their high-end servers and mainframes.
Serializing things to send over to another core also costs time/transistors.

What's really needed is a novel approach to exploiting all of this processing power and (oh, by the way, as the man in the corner says) a better SW architecture that can take advantage of all of it. Things today are just soooo inefficient.

Best of luck!!!

lots of other many-core processors (1)

loufoque (1400831) | about 4 months ago | (#47298967)

There are hundreds of processors with 64 cores or more, each of them claiming to have solved the scalability problem.

Why I like multi-core: (1)

packrat0x (798359) | about 4 months ago | (#47299231)

Because Windows programs have a habit of taking over a processor, acting like I am still using DOS.
