AMD Going Dual-Core In 2005

gr8_phk writes "We recently learned of Intel's plans to go dual-core in late 2005. Well it seems AMD has decided to follow suit. It should be noted that the K8 architecture has had this designed in from the start. Will this be socket 939 or should I try to hold out another year to buy?"
This discussion has been archived. No new comments can be posted.

  • by leonbrooks ( 8043 ) <SentByMSBlast-No ... .brooks.fdns.net> on Monday June 14, 2004 @07:21PM (#9424832) Homepage
    If more is better, why not proliferate cores like crazy?
    • by Haydn Fenton ( 752330 ) <no.spam.for.haydn@gmail.com> on Monday June 14, 2004 @07:23PM (#9424851)
      2's company, 3's a crowd, and 4 is for the fat cats who wipe their ass with 50 dollar bills.
    • by wmeyer ( 17620 ) on Monday June 14, 2004 @07:25PM (#9424870)
      Interestingly, in a review of P4 vs. K8, the K8 had a clear advantage at the 4 processor level and above, apparently because of reduced bus conflicts with their individual memory spaces. If AMD were to proliferate cores on chip, they'd wind up contesting for the memory bandwidth, just like the P4.
      • by ruiner5000 ( 241452 ) on Monday June 14, 2004 @07:57PM (#9425142) Homepage
        Actually there is plenty of bandwidth left in HyperTransport to pull it off. Also, each CPU gets its own bank of memory. The design is superior to all others for SMP. Even AMD's main CPU man says so at infoworld [infoworld.com]:

        AMD's dual-core server processors will share a single memory controller, Weber said. This won't create a bottleneck because a server with two Opteron chips, and therefore two memory controllers, already has more than enough memory bandwidth required to run that system, he said.

        "It's always a juggling act to add a little more processing and a little more memory. Right now, we have plenty of memory and I/O bandwidth, so we're adding processing," Weber said.

        The dual-core chips will work with current socket technology in motherboards that are rated for the specifications of the dual-core chips, Weber said. A BIOS change will be required, but otherwise the chips will work in the same sockets as single-core Opterons, he said.
        • Right now, we have plenty of memory and I/O bandwidth, so we're adding processing,

          Wow, that's a helluva significant statement to make. More bandwidth and cache never hurt... you just get diminishing returns. Either it's marketspeak -- "You need two cores, pay up!" -- or it flies in the face of some pretty basic assumptions I have about processor and cache architectures. Perhaps he meant, "Our cache and bandwidth are currently large enough to support two processes without being detrimental, given two pro

          • Are you not aware of how well AMD's K8 line of CPUs scales? There are hundreds of articles on the web mentioning it.
            • Regardless of whether they're accurate or not, the statement is still dumbfounding. How many times is "memory is the bottleneck" pounded into the heads of computer architecture students? AMD just said memory is not the bottleneck: you don't need a faster bus or a faster cache, you need more parallelism. Usually a statement that sweeping is qualified with a few caveats.
              • by charnov ( 183495 ) on Monday June 14, 2004 @08:48PM (#9425595) Homepage Journal
                Because the K8 has the memory controller on die, as you add processors you actually add memory bandwidth. It kinda stands the old logic on its head. Really, the only thing that can be an issue with this core is latency, which can make a difference at 16 CPUs or more ;-)
                • HyperTransport only supports up to 16 nodes, and one of them has to be the southbridge, so you can't get to 16 processors anyway. :) Seriously though, I've only seen topologies of up to 8 processors at once, so quad boards with two cores per package is probably about as high as they will go with this architecture.
              • Of course it requires a few caveats, but it's mostly correct. The Opteron has a rather impressive level of bandwidth per processor and extremely low latency to that memory. The integrated memory controller REALLY helps here, and I suspect that in over 95% of cases AMD is correct that they could drop a second core into the picture and still get very good scaling without having to worry about memory bandwidth.

                Oh, and as for cache, that fortunately isn't a problem at all, as each core comes with its own 1M
          • by PaulBu ( 473180 ) on Monday June 14, 2004 @08:44PM (#9425576) Homepage
            But then, the trick is that he did not mention memory latency, only bandwidth! Getting the latter is relatively easy -- just make the memory bus wider (at a given bus speed); trying to decrease latency will pretty soon run into the speed-of-light limitation.

            Maybe those processors do have enough memory bandwidth to load two of them completely doing SAXPY? Assuming 12 GFLOPS sustained (3 GHz, 2 cores, separate ADD and MUL on each) you need to feed input vectors at 12 * 8 bytes per double = 96 GB/sec; for, say, a 1 GHz memory bus that translates into 96 * 8 = 768 memory pins just for input -- well, wider than I've seen on desktop PCs... ;-)

            When you start doing anything else, the round-trip time between processor and memory (latency) becomes more important than raw bandwidth.

            Paul B.
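
            (To replay that back-of-envelope arithmetic, here is a small Python sketch. The 3 GHz clock, two FPUs per core, and 1 GHz memory bus are the post's assumptions, not measured Opteron figures.)

                # Back-of-envelope: memory pins needed to stream SAXPY inputs at peak FLOP rate.
                clock_hz = 3e9        # assumed core clock
                cores = 2             # dual core
                flops_per_cycle = 2   # separate ADD and MUL unit per core

                peak_flops = clock_hz * cores * flops_per_cycle   # 12 GFLOP/s

                # SAXPY does one FLOP per input double, so feeding the FPUs
                # takes 8 bytes of input bandwidth per FLOP.
                input_bw = peak_flops * 8                         # 96 GB/s

                bus_hz = 1e9          # assumed memory bus clock
                pins = input_bw / bus_hz * 8                      # bytes/cycle -> bits, ie pins
                print(f"{peak_flops / 1e9:.0f} GFLOP/s -> {input_bw / 1e9:.0f} GB/s -> {pins:.0f} input pins")
                # prints: 12 GFLOP/s -> 96 GB/s -> 768 input pins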
      • by hxnwix ( 652290 ) on Monday June 14, 2004 @08:35PM (#9425503) Journal
        The Opteron (K8) has an integrated memory controller and up to three HyperTransport links. In a dual K8 system, the CPUs communicate over a single HyperTransport link and are usually paired with their own memory bank. If one CPU needs data from the other's bank, it comes over the HyperTransport link. Some cheap dual Opteron boards save traces by pairing one CPU with all the memory banks - so every memory operation on the non-directly-linked CPU passes over the h-link.

        The dual core CPU might have the pins for two separate memory bank arrays, or just the pins for one. Either way, the situation is not really different from what we already have with dual K8s. Either way, it's a few steps above the P4 design: shared CPU bus to northbridge to memory. (Yech! With a single proc, this introduces latency; with multiproc, you get contention and latency at every level.)

        AMD's cpu interconnect is so well thought out... it gives me the warm fuzzies pondering it:

        A uniproc Hammer needs one h-link for I/O.
        A dually needs two per CPU: one for CPU-to-CPU, one for I/O (though all the I/O on all the boards I have seen feeds into only one proc's h-link... so that you don't lose PCI buses and such if you have only one proc installed, I suppose).
        Quad and above requires three: each CPU links to two other CPUs, leaving one h-link per CPU for I/O. One could have a PCI-E bus per proc, if one desired. But again, I haven't seen a design that doesn't feed all I/O into a single h-link.

        Since no one uses the extra h-link anyway, a dual core package for a dual core system would need only one external h-link (saving some cash).

        A quad core, dual package system would require three h-links feeding out of each package, though. But even then, the number of h-links laid out on the mobo is reduced and the whole shebang should be cheaper.

        Intel's "one huge shared bus" + northbridge design is definitely being trampled...
      • by Paul Jakma ( 2677 ) on Monday June 14, 2004 @09:37PM (#9425953) Homepage Journal
        apparently because of reduced bus conflicts with their individual memory spaces.

        Ah, but with multi-core chips they can transduce their flux capacitors with the onboard trans-mogrification controllers. Seriously, "reduced bus conflicts with their memory space" - what does that mean?? That's gibberish.

        The P4 host bus, presumably like the P6 GTL+ bus, is a shared bus (like most buses are). Only one CPU can use the bus at any one time. If the bus does x GB/s, that's only to one CPU at any given time - effectively it is shared. Further, the P6 and P4 do not have integrated memory controllers, and must access RAM via the (shared) GTL+ bus if it is not in cache. Eg, a 4 CPU machine looks like:

        P = CPU
        MC = Memory Controller (part of the "northbridge" chip, also provides PCI host bus controller, etc.)

        P    P    P    P
        |    |    |    |
        ------------------  GTL+ bus
                 |
                 MC--RAM

        Also, GTL+ is limited to 4 CPUs and one controller. To get 8 CPUs, some chipset vendors have invented a GTL+ 'bridge' to stitch two GTL+ buses together, but I'd imagine that just makes things worse from a scalability POV.

        The K8, on the other hand, uses a point-to-point (PtP), serial-ish, packet-based transport, HyperTransport [hypertransport.org], to interconnect CPUs, and has onboard memory controller(s) (connected internally via HyperTransport links). A 4 CPU K8 machine looks like:

        K = K8 CPU
        HT = HyperTransport link

        RAM--MC-\             /-MC--RAM
        RAM--MC--K-----------K--MC--RAM
                 |           |
                 |           |
        RAM--MC--K-----------K--MC--RAM
        RAM--MC-/             \-MC--RAM

        Each of the lines out of a K is a HyperTransport link. Each MC is integrated into the die itself.

        Each CPU has 4 HT links: two to other CPUs, two to its (integrated on die) memory controller. For dual CPU setups, each CPU obviously needs only one link to the other CPU. Indeed, the difference between 2xx, 4xx and 8xx AMD Opteron CPUs is the number of HyperTransport links. In large multi-CPU (ie 8+) SMP setups one need not attach a memory controller to each CPU: one might choose to have a central "cross-bar" of fully-meshed K8s which then connect to peripheral K8s that have memory controllers and hence RAM. 'Tis all down to the board designers, I guess. And a bit of a fun computer science problem too, in terms of designing optimal 'networks' of interconnected nodes with the best compromise between maximum node-to-node distance and the number of required interconnects.

        The K8 is actually a ccNUMA (cache coherent, Non-Uniform Memory Architecture) machine in SMP configurations. Ie, different memory is at different distances from different CPUs; some memory is local, other memory is distant, and some memory may be more distant than other memory. Eg, for the top-left CPU, accessing RAM on its "local" MC is potentially far quicker, in terms of latency, than accessing "distant" RAM on another node, and accessing memory on an adjacent K8's memory controller will have lower latency than accessing memory allocated in the bottom-right CPU's RAM. A good OS aware of the issues can try to keep processes on the CPUs to which their memory is "local" and hence maximize performance, but it's quite a juggling act (Linux has some NUMA support).

        What AMD will do for multi-core we don't know. For certain, the individual cores will be connected by HyperTransport. Most likely AMD will give each core its own dedicated memory controller, which would make a multi-core SMP the exact same architecture as the current dual K8 (ie 2xx Opteron), and hence no different in terms of bandwidth contention than existing SMP Opterons.

        It will make large SMP machines a lot easier to build though. Eg
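
        (The "fun computer science problem" above is easy to poke at in code. Here is a toy Python sketch, assuming the hypothetical 4-CPU square topology from the diagram, that computes the worst-case hop count between nodes with a breadth-first search; swap in any link list to experiment with bigger meshes.)

            from collections import deque

            # Point-to-point links of the 4-CPU square above (nodes 0-3).
            links = [(0, 1), (0, 2), (1, 3), (2, 3)]
            n = 4

            adj = {i: [] for i in range(n)}
            for a, b in links:
                adj[a].append(b)
                adj[b].append(a)

            def hops_from(start):
                """BFS hop counts from one node to every reachable node."""
                dist = {start: 0}
                queue = deque([start])
                while queue:
                    u = queue.popleft()
                    for v in adj[u]:
                        if v not in dist:
                            dist[v] = dist[u] + 1
                            queue.append(v)
                return dist

            diameter = max(max(hops_from(i).values()) for i in range(n))
            print("worst-case hops:", diameter)
            # 2 for the square; a third HT link per node (full mesh) would make it 1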

        • by Paul Jakma ( 2677 )
          Self-correction: apparently it might be just _one_ memory controller per die, which may or may not itself be dual channel (I gather from other posts). Also, obviously each CPU potentially has additional HT links to connect to things like PCI bus controllers, AGP controllers, etc. (the basic block diagram for the Tyan S2885 dual K8 board shows the AMD-8151 AGP controller and the AMD-8131 PCI-X controller wired to CPU0).

    • by mp3LM ( 785954 ) on Monday June 14, 2004 @07:25PM (#9424880) Homepage
      Heat.

      Yes... the evil of all machines.
      The reason why, when the AC is not on in my house and it is 90 degrees outside, my computer resets.
      And of course... the reason why we're not going quad core.

      Well... at least that's my personal opinion. As for the real reason: probably profit...
    • This could ultimately lead to a reformulation of Moore's Law. Thus, I propose k4_pacific's hypothesis:

      The number of processor cores doubles every eighteen months.
      • That's funny. Moore's Law says that the number of transistors per area tends to double every 18 or 24 months (depending on which part of Gordon Moore's career you listen to). More cores per chip with better processes does nothing to stop this progression.
    • by HuguesT ( 84078 )
      The answer is that it doesn't make sense for a desktop machine.

      Windows Professional comes with a license for 1-2 CPUs. Above that you need to purchase one of the server editions, and it starts becoming *very* expensive.

      Soon 2 CPUs will be for the masses; in a way they already are, with hyperthreading. However, 4 and above really are for servers, multi-user environments, etc.

      Also while it is easy to exploit 2 CPUs in a desktop environment (roughly speaking 1 for the O/S, the other for the applications) there
      • Look at BeOS.

        The threading was planned from day one to support multiple processors without any special coding. It's been a few years, but I think I'm right.

        If Microsoft is smart, they'll implement something like this for Longhorn and whatever binary executables are used.
    • by NerveGas ( 168686 )

      Because the overall size of the die is a tremendous factor in the cost of a processor. Because of that, die sizes tend to stay relatively constant over the years.

      As manufacturers are able to squeeze the transistors in more tightly, then you see more circuitry appearing. As they move to 90-nanometer production, they're going to be able to pack on more transistors, and using dual cores becomes an economic possibility. However, throwing FOUR cores on would make the die large enough to be an economic di
  • by MarkWPiper ( 604760 ) on Monday June 14, 2004 @07:21PM (#9424836) Homepage
    linky linky! [anandtech.com]
  • by ruiner5000 ( 241452 ) on Monday June 14, 2004 @07:23PM (#9424852) Homepage
    You can find them all here. [amdzone.com] It seems news has gotten around, and that AMD's dual core will consume just about as much power as a single core CPU at 90nm.
  • by ackthpt ( 218170 ) on Monday June 14, 2004 @07:24PM (#9424860) Homepage Journal
    As the number of pins continues to increase, so does the mass; at some point processors will achieve such a large mass that they collapse in upon themselves.

    Actually, it'll probably be more like: the processor gets so big that you just clip things onto the outside of it and it takes the place of the motherboard.

  • by schwep ( 173358 ) on Monday June 14, 2004 @07:24PM (#9424864)
    I have seen some licensing schemes that charge per processor... 1 CPU = $1,000, 2 CPUs = $2,000, etc.

    How long will it take someone to argue that consumers with a dual core processor should pay 2x the price? I'm betting not long.
  • by bugnuts ( 94678 ) on Monday June 14, 2004 @07:26PM (#9424892) Journal
    They're making the first Desktop Fusion Unit!
  • Why would you wait a year?

    http://www.theregister.co.uk/2004/06/01/amd_939/

  • A year? (Score:5, Funny)

    by aardvarkjoe ( 156801 ) on Monday June 14, 2004 @07:29PM (#9424918)
    "Will this be socket 939 or should I try to hold out another year to buy?"

    You're planning on waiting more than a full year between computer upgrades? Are you sure you're on the right website?
  • by cyfer2000 ( 548592 ) on Monday June 14, 2004 @07:29PM (#9424919) Journal
    I can see a big future for the heatsink business in Intel's and AMD's plans.
  • by filledwithloathing ( 635304 ) on Monday June 14, 2004 @07:30PM (#9424927) Homepage Journal
    "Will this be socket 939 or should I try to hold out another year to buy?"

    You'll need a new motherboard.

    The DDR memory interface appears to wrap around both L2 caches, meaning that it looks like both cores have their own 128-bit memory interface; whether or not both memory controllers will be enabled is another thing, but if this is true we have a number of implications to talk about. If dual core Opterons do indeed have two memory controllers, the pincount of dual core Opterons will go up significantly - it will also make them incompatible with current sockets. AMD is all about maintaining socket compatibility so it is quite possible that they could only leave half of the memory controllers enabled, in order to offer Socket-940 dual core Opterons. AMD isn't being very specific in terms of implementation details, but these are just some of the options.

    • Read up on the Opteron die layout.

      This is NOT two ENTIRE Opteron processors plunked on the same die.

      AMD have designed the ability to connect TWO cores into the SysReq part of the processor from the beginning.

      The SysReq connects on the other side to a crossbar that connects in turn to the HyperTransport Controller and the Memory Controller.

      A dual core processor will still only have a 128-bit memory controller.

      AMD have stated that the processors will be socket compatible. This also suggests that S939
    • by MBCook ( 132727 ) <foobarsoft@foobarsoft.com> on Monday June 14, 2004 @08:00PM (#9425169) Homepage
      Nope. Sorry.

      I understand your reasoning, but according to this article [infoworld.com] (I found the link on Ace's Hardware [aceshardware.com]) the dual core chips will be compatible with current motherboards and sockets with as little as a BIOS flash (to recognise the new CPUID, I assume). The downside is that the two cores will SHARE the dual channel memory bus. But because the bus is so efficient, each core will probably STILL get more bandwidth than most P4s. At worst it shouldn't be much worse than having two single channel Athlon 64s (which also are often faster than the P4). I think this is FANTASTIC news. For one thing, it means you could put FOUR CORES in that dual Opteron SFF PC that was revealed a short while ago.

      Really, it only makes sense. A dual channel processor has 939 pins; a single channel has 754. So, allowing that some are power, you're looking at about 190 pins for the second memory channel. That would mean that to have two cores on one die with their own memory channels, you'd need 1120 pins or so. That's a LOT of pins.

      Instead of that engineering nightmare (you'd probably need 7-layer mobos to support it), we get drop-in replacements that meet the same thermal requirements. Just think: your dual Opteron not cutting the mustard any more? Buy two processors, drop 'em in, flash the BIOS, and now you've got FOUR processors without a new mobo or anything. All you'd have to worry about then is software licenses (unless of course you don't use any software that requires that, for example if you're all open source).

      So to answer the grandparent's question, I'd say buy now. That said, I'm not sure if socket 939 will get dual cores or if it's only for 940s. I assume 939 will get them too.

      Speculation: I'd like to know if the dual channel memory controller is shared by the two cores (some kind of cross-bar architecture like nVidia used to promote) or if each core gets exclusive access to one of the two channels. My guess is the former.

      More speculation: Will there be a socket 754 dual core? That'd be cool, and I don't think the performance would be too much of a problem memory wise, unless you were doing memory intensive tasks. For CPU bound tasks I think you'd be fine.
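
      (The pin arithmetic above is easy to check in Python, using the comment's own assumption that the 939-vs-754 difference is roughly one memory channel plus power pins:)

          # Rough pin budget from the comment above.
          dual_channel_pins = 939     # Socket 939/940-class part, dual channel
          single_channel_pins = 754   # Socket 754 part, single channel

          second_channel = dual_channel_pins - single_channel_pins   # ~185 pins
          two_cores_two_channels = dual_channel_pins + second_channel
          print(second_channel, two_cores_two_channels)   # 185 1124, ie "1120 pins or so"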

    • No you won't. Infoworld got it right. Anand should have researched before he put up his story.

      AMD's dual-core server processors will share a single memory controller, Weber said. This won't create a bottleneck because a server with two Opteron chips, and therefore two memory controllers, already has more than enough memory bandwidth required to run that system, he said.

      "It's always a juggling act to add a little more processing and a little more memory. Right now, we have plenty of memory and I/O bandw
  • by rewt66 ( 738525 ) on Monday June 14, 2004 @07:30PM (#9424937)
    is dilithium cores!
  • by polyp2000 ( 444682 ) on Monday June 14, 2004 @07:34PM (#9424963) Homepage Journal
    To be perfectly honest, it depends on how rich you are. When it comes to buy-now versus buy-later, the state of technology, generally speaking, is that in most cases (particularly with computer hardware) whatever you invest in becomes obsolete after only a short time.

    From my own personal point of view, my dual Athlon 1.5GHz is still holding out beautifully. When the cash comes my way I'm banking on a PowerBook; the truth is I don't need another desktop just yet. However, if I had a stupidly large disposable income, one that would predictably hold out till these dual cores come out, I'd probably get one now and another later.

    When I built this machine I bought the highest spec parts I could afford at the time, and I haven't upgraded in 2 or 3 years aside from the graphics card. The rule I live by is: get the best available that you can afford at the time, and it should keep you going for a good while.

    I'm running a Gentoo box; faster processors would be very nice for source compiles, but I gave up on churning out SETI blocks a while ago and don't have a massive reason for further processor power...
    • The rule I live by is get the best available that you can afford at the time and it should keep you going for a good while.

      Which is perhaps the most expensive way to get what you need.

      Taking a look at pricewatch [pricewatch.com] under "hard drives", here's the matrix:

      CAPACITY   PRICE   $/GB

      300 GB     $232    $0.77
      250 GB     $158    $0.63
      200 GB     $101    $0.51
      180 GB     $100    $0.56
      160 GB     $77     $0.48
      120 GB     $58     $0.48
      100 GB     $58     $0.58
       80 GB     $48     $0.60

      Notice that the price starts at a high of 77 cents per GB, then falls almost 40% in price per unit down to
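
      (The $/GB column is just price divided by capacity; a few lines of Python reproduce it and pick out the sweet spot from the 2004 prices quoted above:)

          # Reproduce the $/GB column and find the cheapest drive per GB.
          drives = [(300, 232), (250, 158), (200, 101), (180, 100),
                    (160, 77), (120, 58), (100, 58), (80, 48)]   # (GB, $)

          for gb, usd in drives:
              print(f"{gb:3d} GB  ${usd:3d}  ${usd / gb:.2f}/GB")

          best = min(drives, key=lambda d: d[1] / d[0])
          print(f"sweet spot: {best[0]} GB at ${best[1] / best[0]:.2f}/GB")
          # sweet spot: 160 GB at $0.48/GB (120 GB is a near-tie)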

  • by Vario ( 120611 ) on Monday June 14, 2004 @07:36PM (#9424987)
    Dual-core processors seem to me like a pretty good alternative to a dual processor system. You don't have the hassle of two huge coolers blowing out hot air, the mainboards don't have to be overpriced, and it's already supported by every OS.

    Some years ago I was thinking about getting a dual processor system. The motherboard alone was twice as expensive as a similar single-processor one, applications didn't support it at all, and so on. I hope newer applications are ready for dual cores. Quake III was the first game I know of that used two processors, and finally I can consider that animated desktop background.

    Is there a list of which applications can effectively use dual cores, besides obvious things like webservers?
    • by rebelcool ( 247749 ) on Monday June 14, 2004 @07:41PM (#9425029)
      Anything multithreaded. Which is just about any modern GUI app.
      • Only if the application is doing time-consuming stuff in at least two threads. You say any modern GUI app; so is Firefox rendering a page multithreaded? What about my DVD player software, games, TeX, Maple?
        • by NerveGas ( 168686 ) on Monday June 14, 2004 @08:45PM (#9425581)

          Multithreaded and multi-process.

          If Firefox is rendering a page, you've got Firefox doing the rendering, the GUI working with video drivers, disk drivers looking at/updating your browser's cache, kernel code managing disk cache, kernel code managing network activity, and perhaps even firewall code running.

          Whether you use Linux or Windows, there are a LOT of things running that you don't see in the normal process list.

          Now, will dual CPUs speed up that render time in Firefox? Not to any significant degree. But having used a LOT of dual-CPU systems, I can say that under heavy load the machine will be much more responsive. If that helps your workload, it might be worth it. If it doesn't, it's not worth it.

          As an example, at work I have a dual AthlonMP 1800+. At home, I have a single AthlonXP 3200+. For what I do at work, the single-proc chip would suck rocks. For what I do at home, the 1800+ would not compare to the 3200+. It's all about your usage.

          steve
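
          (A trivial way to see the effect described above is to push two CPU-bound jobs onto separate processes and compare against running them back to back. A minimal Python sketch; the busy-loop workload is made up purely for illustration:)

              import multiprocessing as mp
              import time

              def spin(n):
                  """Hypothetical CPU-bound workload: pure arithmetic, no I/O."""
                  total = 0
                  for i in range(n):
                      total += i * i
                  return total

              if __name__ == "__main__":
                  N, jobs = 5_000_000, 2

                  t = time.perf_counter()
                  for _ in range(jobs):          # one job after the other
                      spin(N)
                  serial = time.perf_counter() - t

                  t = time.perf_counter()
                  with mp.Pool(jobs) as pool:    # one worker process per core
                      pool.map(spin, [N] * jobs)
                  parallel = time.perf_counter() - t

                  # On a dual-CPU or dual-core box the parallel run should take
                  # roughly half the serial time; on a single core, about the same.
                  print(f"serial {serial:.2f}s, parallel {parallel:.2f}s")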
      • You also get benefits in multitasking. Sure your PC might be able to burn a CD, rip a DVD, play some MP3s, and run a ton of web browser windows now, but with two processors, things really seem smoother.

        You'd notice the most difference if you had one CPU-bound app and a ton of others that weren't. For example, you're running some big simulation or POV-Ray, and at the same time checking your e-mail and surfing the web. With two processors, even if the programs don't use them (they aren't SMP aware), as long as

  • Wait, socket 939 is real!? I thought the concept of a 939-pin CPU was some sort of hyperbolic joke!
    • Socket 939 is Socket 940 minus one pin.

      They are identical in features (dual channel) except that the 939 lacks one pin. That pin just happens (wink) to be a HyperTransport link that was removed, which means there are no longer enough links to support multi-processor setups. This is basically a marketing move to segment the workstation market from the desktop market. There is no (technical) reason they couldn't have used socket 940

  • by HotNeedleOfInquiry ( 598897 ) on Monday June 14, 2004 @07:45PM (#9425052)
    Now I'll have to pay SCO $1,149 instead of $699.

    Yeah, right
  • by foidulus ( 743482 ) * on Monday June 14, 2004 @07:47PM (#9425070)
    Just when I thought I had saved up enough money between upgrades to splurge on those fancy ramen noodles, you know, the ones with the dried peas, this comes along.
    Hey, Wal-Mart brand noodles are only 8 cents!
  • Longhorn (Score:5, Funny)

    by colonslashslash ( 762464 ) on Monday June 14, 2004 @07:54PM (#9425118) Homepage
    "We recently learned of Intel's plans to go dual-core in late 2005. Well it seems AMD has decided to follow suit."

    It's amusing to watch the chip manufacturers scramble desperately to meet the recommended specifications for Longhorn in time.

    Oh, c'mon don't look at me like that. A slashdot story without some kind of Microsoft snipe just wouldn't be the same now, would it?

    Alright, fine. I'll pick on SCO or AdTi next time. Sheesh. /me crawls back under his rock

    • What is even more amusing is that AMD began this work well before Longhorn was announced. Even more amusing is that AMD announced before Intel. You know, it isn't funny when it isn't even factually correct enough to be so.
  • by philipgar ( 595691 ) <pcg2&lehigh,edu> on Monday June 14, 2004 @08:03PM (#9425188) Homepage
    While the idea of dual core CPUs is really cool, and will take over shortly due in part to the fact that we need something to do with all those extra transistors, I wonder why the focus of the industry is on chip multi-processors (CMP).

    While CMP processors can give us roughly the same performance as a standard SMP system (somewhat faster due to interprocessor communication and shared memory, but also slower due to a larger memory bottleneck), I don't think a CMP system would compete with a simultaneous multi-threading (SMT) solution.

    While Intel's take on SMT (hyperthreading) has some benefits, its performance is rather lackluster; the reason has more to do with their particular implementation. In the initial research on SMT, an 8-way SMT processor was shown to outperform a 4-way CMP processor. Now, I must note that the 8-way SMT processor had more functional units than the cores in the 4-way CMP processor, but the overall area of the 8-way SMT processor would be much, much smaller (far fewer structures need to be duplicated for SMT as opposed to CMP). For more information, check out some of the papers at http://www.cs.washington.edu/research/smt/ .

    What I don't understand is the industry's insistence on using CMP first. From everything I've read, an 8-way SMT processor should take up less die space than a two-way CMP processor, even assuming the 8-way processor contains more functional units. It does make sense that a CMP processor is faster when there aren't enough threads to fully utilize an SMT processor (say, only 2 or 3 threads that want full CPU usage). I guess SMT is a big change in the model of programming and application development (I'm currently doing research on the subject, which is why I'm so interested in it). Is the reason to embrace CMP simply that there's less new technology to add (they "just" have to interconnect two cores, as opposed to adding the extra logic for SMT)?

    Does anyone else have any opinions on this, or any idea why no one seems to be fully embracing SMT's potential?

    Philip Garcia
    • by WeekendKruzr ( 562383 ) on Monday June 14, 2004 @08:35PM (#9425497)
      SMT is only needed if your execution units have trouble staying filled, which was the problem with the NetBurst architecture due to the huge hit it takes from the branch misprediction penalty. When a mispredict happens, the execution unit has to sit idling away and wait for the proper info to be re-fetched. With SMT, the unit simply switches over to one of the other threads waiting in the wings, which keeps the processor doing useful work instead of wasting cycles. This is why software has to be written multithreaded to take advantage of it: the processor needs other ready threads to switch to.

      Intel stuck SMT into the Pentium 4 in order to balance out some of the negative effects that go hand-in-hand with a processor that has a LONG pipeline. AMD has a much shorter pipeline (especially compared to the new Prescott) and therefore doesn't suffer much of a penalty when a mispredict happens. Also, if I remember correctly, the Athlon was already known to be extremely efficient in terms of resource allocation within the processor, since AMD can't afford to just dump tons of extra cache onto the chip.

      Both of these things taken together mean that using up extra real estate on the Athlon die to get SMT isn't really worth the performance it would bring. Even on the Pentium the benefits aren't all that hot, and it's only in specific types of code that you see any impressive speed gains.
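
      (The argument above can be put in rough numbers with a toy utilization model. The mispredict rate and stall penalties below are made-up illustrative values, not measured NetBurst or K8 figures:)

          # Toy model: fraction of cycles doing useful work when each mispredicted
          # branch stalls the pipeline, and idealized SMT refills those stall
          # cycles with work from another ready thread.
          def utilization(mispredicts_per_instr, penalty_cycles):
              stall_per_instr = mispredicts_per_instr * penalty_cycles
              return 1.0 / (1.0 + stall_per_instr)   # assumes a base CPI of 1

          for name, penalty in [("long pipeline (NetBurst-ish)", 20),
                                ("short pipeline (K8-ish)", 10)]:
              u = utilization(0.01, penalty)   # 1% mispredict rate, made up
              print(f"{name}: {u:.0%} busy; ~{1 - u:.0%} of cycles for SMT to reclaim")

          # The longer the pipeline, the more idle cycles SMT can win back,
          # which is why it paid off on NetBurst but would buy the
          # short-pipeline Athlon much less.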

    • You really can't figure out why they're focusing on CMP? It's not exactly tough. They don't have to design a new architecture. That saves a LOT of money in R&D, and (more importantly) cuts a LOT of time off of time-to-market. It's also VERY easy - especially with the Opterons. Copy the lithograph, connect the HT links, and you're done. To top it all off, it's something that will fetch a good price premium

      To summarize, it's easy, fast, and will (supposedly) make them more money. That's a lot
  • by ShatteredDream ( 636520 ) on Monday June 14, 2004 @08:09PM (#9425227) Homepage
    I will finally be able to run Linux in VMware with a VMware instance running Windows 98 running Bochs running BeOS emulating OS X with PearPC. Thank you AMD, you have guaranteed me alpha male status in the CS department for a semester.
  • The architecture, as I understand it, also creates the ability to moderate CPU temperature by switching between cores as the temperature rises too far. Both cores can run flat out if you have a great heatsink, but if things get too hot, through insufficient heat dissipation or heavy CPU usage, it is possible to switch a core 'off'. Of course, all this is controlled by the motherboard and CPU, leaving no opportunity for errors by the users.
  • by Nom du Keyboard ( 633989 ) on Monday June 14, 2004 @08:23PM (#9425375)
    Why not take an older processor (e.g. the i80486) that is already basically single-cycle execution -- or the Pentium, which has two execution pipes -- update it to modern geometry, which should increase speed and decrease power, and put as many as you can easily fit onto the die? After all, those older cores execute all the basic x86 code, including MMX, with far fewer transistors. How much do SSE, SSE2 and HT contribute versus a lot of cores just executing threads with little context switching?
  • by HuguesT ( 84078 ) on Monday June 14, 2004 @08:32PM (#9425454)
    This raises questions regarding stability and Windows.

    While I find that multiprocessor setups under Linux improve things to a significant degree (although there are still outstanding issues with NVidia proprietary drivers and SMP), I found the opposite true for Windows.

    The last time I tried, which was about 2-3 years ago, many drivers didn't seem to expect true concurrency under Win2k, and I was experiencing significantly more crashes on my dual P-III than when I forced the system to use only one of the CPUs. Yet it probably wasn't the hardware, because that same machine was very stable with Linux.

    With the advent of hyper-threading, have things improved markedly with WinXP?
  • by chrysrobyn ( 106763 ) on Monday June 14, 2004 @09:26PM (#9425862)

    I see lots of conversation comparing this generation of processors to space heaters, and wisecracks about Longhorn minimum systems (that article was actually about the predicted "average [slashdot.org]", not the minimum), but not much about actual multi-core. It's an interesting direction to go.

    Single core CPUs are basically running into the most they can do with XUs, MPUs, caches, etc. Sure, you can decrease the pipeline depth below the 18FO4 that the Pentium 4 supposedly has, and that can help with serial data paths and makes simple XUs, MPUs, etc. faster, but the branch mispredict penalty is still horrendous -- perhaps too high for the general purpose processor found in our PCs. More complicated logic is possible, but there's only so much you can do with the data and sub-Angstrom logic.

    Beyond the geek factor, multiple cores on a single die attack the same problems SMP did in the first place (plus a few race conditions that otherwise may have been very rare), allowing much less manpower to design a processor that is still much faster in the end. A single threaded application will seem slower, and that will place more burden on developers to see the light of multiple threads. Instead of an XU munging through a single thread at a time, which may be a misuse of an incredible resource (like a thread that says "go to the grocery store" when the XU is a race car), multiple cores have correspondingly multiple XUs, each with their own resources, so hard tasks can be spread across multiple cores, or simple ones can execute in parallel with others (one thread can take a Kia to the grocery store while another Kia goes to the Post Office). Of course, problems that cannot be divided into multiple threads see no advantage from multiple cores, but other tasks remain responsive without requiring a monster task to context switch.

    I've read about multiple cores sharing a single L2 outperforming multiple cores with dedicated L2s in specific tasks: under such a workload one core essentially acts as a prefetch engine and the second core reaps the benefits.
