
Intel's 128MB L4 Cache May Be Coming To Broadwell and Other Future CPUs

timothy posted about 5 months ago | from the now-read-some-old-prices-and-get-offa-my-lawn dept.


MojoKid writes "When Intel debuted Haswell this year, it launched its first mobile processor with a massive 128MB L4 cache. Dubbed "Crystal Well," this on-package (not on-die) pool of memory wasn't just a graphics frame buffer, but a giant pool of RAM for the entire core to utilize. The performance impact from doing so is significant, though the Haswell processors that utilize the L4 cache don't appear to account for very much of Intel's total CPU volume. Right now, the L4 cache pool is only available on mobile parts, but that could change next year. Apparently Broadwell-K will change that. The 14nm desktop chips aren't due until the tail end of next year but we should see a desktop refresh in the spring with a second-generation Haswell part. Still, it's a sign that Intel intends to integrate the large L4 as standard on a wider range of parts. Using EDRAM instead of SRAM allows Intel's architecture to dedicate just one transistor per cell instead of the 6T configurations commonly used for L1 or L2 cache. That means the memory isn't quite as fast but it saves an enormous amount of die space. At 1.6GHz, L4 latencies are 50-60ns which is significantly higher than the L3 but just half the speed of main memory."
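As a rough back-of-the-envelope illustration of that 1T-versus-6T die-space argument, here is the plain arithmetic (my own sketch based on the figures in the summary, not numbers from Intel; capacitors and array/controller overhead are ignored):

    #include <stdio.h>

    int main(void)
    {
        const long long bits = 128LL * 1024 * 1024 * 8;   /* 128MB of cache, in bits */
        const int sram_t_per_bit  = 6;                    /* classic 6T SRAM cell    */
        const int edram_t_per_bit = 1;                    /* 1T-1C eDRAM cell        */

        printf("128MB as 6T SRAM : %lld transistors\n", bits * sram_t_per_bit);
        printf("128MB as 1T eDRAM: %lld transistors\n", bits * edram_t_per_bit);
        /* roughly 6.4 billion versus 1.1 billion cell transistors */
        return 0;
    }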


Look at my buttnudeness! (-1)

Anonymous Coward | about 5 months ago | (#45500077)

Yes, it is true! I am the absolute 100% bootybuttsnapassbayernude naked extremist of the ages! Such a fuckin' thing, Slashdot!

Re:Look at my buttnudeness! (0)

Anonymous Coward | about 5 months ago | (#45500495)

You're weird, man. One of the sillier trolls on here. I wonder why you do it and what you're like IRL.

Re:Look at my buttnudeness! (0)

Anonymous Coward | about 5 months ago | (#45500521)

I don't know, man. I think it's the same guy with the fetid rectum. I'd stay away.

So in the real world? (0)

SimonTheSoundMan (1012395) | about 5 months ago | (#45500079)

I have a Retina MacBook Pro with this Crystal Well processor. What advantages does it really bring?

Unsure of any real world benchmarks compared to standard Haswell processors.

Re:So in the real world? (4, Informative)

SimonTheSoundMan (1012395) | about 5 months ago | (#45500111)

The only benchmarks I have found are from SiSoftware. http://www.sisoftware.co.uk/?d=qa&f=mem_hsw [sisoftware.co.uk]

But how is this going to affect Firefox, Photoshop, or video conversion?

Does it have an effect on battery life?

Re:So in the real world? (3, Interesting)

K. S. Kyosuke (729550) | about 5 months ago | (#45500145)

On laptops? Perhaps it could. I suspect that an eDRAM cache plus slower main memory could have lower total power consumption at the same performance level than faster main memory alone, especially if you have more of it. I believe that the major power-usage component for main-memory DRAMs is actually using the memory (as in, transferring the data).

Re: So in the real world (0)

Anonymous Coward | about 5 months ago | (#45503421)

Two points. Overclocking memory sometimes requires increasing the voltage, but store-bought faster memory usually uses no extra power; it could even use less, because current needs to flow for less time. And my own testing with CPUs having different amounts of L2 cache suggests that idle consumption increases with cache size. Speed also increases, but probably only with small cache sizes (going from 256KB to 512KB of L2 gave 3% quicker compiles with GCC; with modern cache sizes the difference would be negligible).

So I don't see where this potential improvement will be realized, because compiling is surely a heavy user of CPU and memory.

Re:So in the real world? (5, Informative)

muridae (966931) | about 5 months ago | (#45500301)

Photoshop? Considering the Adobe RGB or other color spaces, combined with the file sizes of some of the larger images coming out of cameras, your gains in latency would really depend on Photoshop and the OS being able to handle the L4 cache and keep the right part of the image in it. Video editing, with file sizes in the gigabyte range, would probably see no gains at all. Video conversion, with a program that keeps a reasonably sized buffer, should see a good performance gain; but it would require code that knows the L4 is available, or the OS to predict that L4 is a good place to put a 10-, 50- or 100MB buffer. The real gain will be in common things: playing a video, browsing the web (seen how much memory a bit of JavaScript or the JRE can eat up lately? Or Silverlight/Flash?) and email clients (cache all your email in L4 for faster searching).

As for battery life, I have no idea. It might use more power, since DRAM requires constant power to refresh data whereas SRAM is pretty stable; but the lower leakage of using a single transistor instead of 6 might prove to be a benefit. It would take a good bit of time and some pretty good test code to figure out the difference, I suspect.

Re:So in the real world? (2)

neokushan (932374) | about 5 months ago | (#45500483)

I'm not an expert by a long shot, but I'm pretty sure that modern day applications don't go anywhere near that low a level and instead leave memory management up to the system.

Re:So in the real world? (2)

fa2k (881632) | about 5 months ago | (#45500679)

Even if the whole files take up more than the cache, the filters and algorithms running on them may need to access only a part of the image/video (e.g., access a frame of the video multiple times). The benefit of caching is highly dependent on the algorithms used.

Re:So in the real world? (2)

Bengie (1121981) | about 5 months ago | (#45501101)

CPUs have had streaming instructions for a long time that can tell the CPU to load data directly from main memory into the L1 cache and not use the L2/L3/L4 caches at all. This reduces cache eviction for data that is transient.
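A minimal sketch of the sort of thing being described here, using the SSE non-temporal prefetch hint and streaming stores (the function name, prefetch distance, and buffer handling are made up for illustration; a real routine would also take care of destination alignment):

    #include <emmintrin.h>   /* SSE2: _mm_loadu_si128, _mm_stream_si128 */
    #include <xmmintrin.h>   /* SSE:  _mm_prefetch, _mm_sfence */
    #include <stddef.h>
    #include <string.h>

    /* Copy a buffer that will not be reused soon without polluting the caches:
       prefetch the source with the non-temporal hint and write the destination
       with streaming (write-combining) stores, so the copy does not evict the
       caller's working set from L2/L3 (or the L4, on a Crystal Well part).
       dst must be 16-byte aligned for the streaming stores. */
    static void copy_transient(void *dst, const void *src, size_t n)
    {
        const char *s = src;
        char *d = dst;
        size_t i;

        for (i = 0; i + 16 <= n; i += 16) {
            _mm_prefetch(s + i + 256, _MM_HINT_NTA);            /* hint: don't keep this line around */
            __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
            _mm_stream_si128((__m128i *)(d + i), v);            /* bypasses the cache hierarchy */
        }
        memcpy(d + i, s + i, n - i);                            /* plain copy for the tail */
        _mm_sfence();                                           /* order the streaming stores */
    }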

Re:So in the real world? (5, Informative)

fuzzyfuzzyfungus (1223518) | about 5 months ago | (#45500187)

At least as marketed, the main advantage is giving the GPU some RAM that isn't DDR3 stolen from the main system a couple of hops away (which has traditionally been one of the things that makes integrated graphics really suck, and that makes cheap discrete parts using DDR instead of GDDR, and/or an excessively narrow or slow memory bus, kind of suck).

Even Intel's marketing optimists don't say much about CPU performance. It's also a mobile-only feature (you can't even buy a non-BGA part expensive enough to have it), which would be unusual if it actually improved CPU performance enough to get enthusiasts worked up, but is downright sensible if the target market is laptops sufficiently size/power constrained not to have discrete GPUs, yet where pure shared memory was dragging GPU performance down.

Re:So in the real world? (1)

Bengie (1121981) | about 5 months ago | (#45501111)

128MB of L4 cache and Transactional Memory instructions will make it great for routers.

Re:So in the real world? (1)

SimonTheSoundMan (1012395) | about 5 months ago | (#45501553)

Interesting that the gigabit Ethernet controller on the latest Apple Macs has 512MiB of DDR3. Any idea what this is for?

Re:So in the real world? (2)

Bengie (1121981) | about 5 months ago | (#45501589)

No idea. I was under the impression that most NICs have just enough onboard memory to buffer potential bursts of data, but otherwise write to system memory via DMA and interrupt the CPU to notify it when the data is ready. 512MB sounds like a lot of buffer for just a 1Gb NIC.

Re:So in the real world? (2)

thejynxed (831517) | about 5 months ago | (#45503479)

Actually, that amount on a NIC would be a great boon in keeping all network processing on the NIC instead of having to offload to the CPU and system memory, especially when you turn on the bells and whistles like jumbo frames, etc. I can also see it helping out quite a bit when processing HD video packets during streaming, where it's pretty important to get them processed as quickly and efficiently as possible before passing them off to the main system. These packets tend to have a decent amount of overhead, and being able to process quite a few of them at once thanks to the increased RAM on the NIC should help quite a bit in smoothing out the entire process.

Re:So in the real world? (5, Interesting)

SuricouRaven (1897204) | about 5 months ago | (#45500429)

Cache performance impact is very heavily dependent upon application characteristics. Specifically, the size of the active memory set.

Best case, when you're working with an active set that's larger than L3 but under L4 - around 100MB or so - and you're accessing it on a repeating pattern, and the compiler hasn't found any tweaks to help, and you're not multitasking, and the OS isn't swapping you out every slice, and the stars are aligned in your favor... the theoretical maximum performance gain can be up to 2x. It's very rare you'll find a program that benefits that much, though. Closest I can think of is image processing.

So in the real world, anywhere from 'no benefit' to 'double the speed' depending on application.
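If you want to look for that knee yourself, a pointer-chasing harness along these lines (my own rough sketch, assuming a POSIX system; compile with optimization) reports the average dependent-load latency as the working set grows past each cache level:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Chase a pointer chain laid out as one random cycle through a buffer of the
       given size and report the average latency per access.  The random order
       defeats the hardware prefetchers, so the figure tracks whichever level of
       the memory hierarchy the working set fits in. */
    static double chase_ns(size_t bytes, long iters)
    {
        size_t n = bytes / sizeof(size_t);
        size_t *next = malloc(n * sizeof(size_t));
        if (!next) return -1.0;

        for (size_t i = 0; i < n; i++) next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {          /* Sattolo shuffle: one big cycle */
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec t0, t1;
        volatile size_t p = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long k = 0; k < iters; k++) p = next[p];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        free(next);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        return ns / iters;
    }

    int main(void)
    {
        size_t sizes_mb[] = { 4, 32, 96, 256 };       /* below L3, between L3 and L4, above L4 */
        for (int i = 0; i < 4; i++)
            printf("%3zu MB working set: %.1f ns/access\n",
                   sizes_mb[i], chase_ns(sizes_mb[i] << 20, 20000000L));
        return 0;
    }

On a Crystal Well part the 32MB and 96MB rows should land somewhere between L3 and DRAM latency; on anything else they jump straight to main-memory numbers.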

Re:So in the real world? (2)

K. S. Kyosuke (729550) | about 5 months ago | (#45502697)

Closest I can think of is image processing.

What you've quoted sounds more like a case for random accesses. Trees, graphs (!), and other complicated data structures, I'd guess. I believe that image processing can take care of itself most of the time by simple prefetching.

Re:So in the real world? (0)

Anonymous Coward | about 5 months ago | (#45500795)

Performance of scientific calculations per core clock is improved, if the Internet is to be believed. Since Haswell got widened, I wonder if the core is now so memory constrained that the cache really lets the cores "fly".

Re:So in the real world? (0)

Anonymous Coward | about 5 months ago | (#45502721)

It will have no impact on your Retina MacBook Pro, because Apple made damn sure you cannot upgrade it. You cannot even replace the battery.

first post (0)

Anonymous Coward | about 5 months ago | (#45500085)

because of the 128mb cache, i was fast enough to get 1st post!

Re:first post (0)

Anonymous Coward | about 5 months ago | (#45500297)

This is what happens when The First Post Game goes wrong!

If you screw with Broadwell (1)

smitty_one_each (243267) | about 5 months ago | (#45500089)

. . .that Broadwell broad, well, is a broad well into which you could throw your entire career.
Just say no, David.

Re:If you screw with Broadwell (0)

Anonymous Coward | about 5 months ago | (#45501437)

>is a broad well into which you could throw your entire career

I am throwing my career into it you insensitive clod!

"half the speed of main memory"? (3, Insightful)

Anonymous Coward | about 5 months ago | (#45500093)

"At 1.6GHz, L4 latencies are 50-60ns which is significantly higher than the L3 but just half the speed of main memory."

WTF? The correct phrasing would be, I think, half the latency of main memory...

Re:"half the speed of main memory"? (1)

Anonymous Coward | about 5 months ago | (#45500139)

Or double the speed.

Re:"half the speed of main memory"? (1)

SuricouRaven (1897204) | about 5 months ago | (#45501103)

It's also pretty poor, for a cache. That's why it's an L4 cache, rather than replacing the L3 or L2.

Re:"half the speed of main memory"? (0)

Anonymous Coward | about 5 months ago | (#45501265)

Speed isn't the same thing as latency. That won't change no matter how many people are confused between the two.

Half the speed of main memory? Why Bother? (2, Informative)

GiantRobotMonster (1159813) | about 5 months ago | (#45500105)

At 1.6GHz, L4 latencies are 50-60ns which is significantly higher than the L3 but just half the speed of main memory.

Hmmm. L4 cache runs at half the speed of main memory? That doesn't seem right. Why bother reading these summaries? The people posting them certainly don't.

Re:Half the speed of main memory? Why Bother? (1)

Anonymous Coward | about 5 months ago | (#45500211)

Umm, they are clearly using latency as their measure of speed, so yes, "half the speed of main memory" does seem right. Sure, it's not worded as well as it could be, but you should be able to understand and not bitch about it.

Re:Half the speed of main memory? Why Bother? (1)

Anonymous Coward | about 5 months ago | (#45500223)

Technical articles should be written carefully.

Re:Half the speed of main memory? Why Bother? (1)

GiantRobotMonster (1159813) | about 5 months ago | (#45500225)

Oh please! It should be twice the speed if it has half the latency, not half the speed. Speed and latency are related, but not interchangeable synonyms!
If a cache has "half the speed" of your uncached memory, you need to disable that cache ASAP!

Re:Half the speed of main memory? Why Bother? (1)

Bengie (1121981) | about 5 months ago | (#45501171)

It is actually 1/2 the latency and 2x the bandwidth. Some benchmarks have shown the L4 getting 42GB/s while current and prior gen CPUs were getting about 17GB/s out to main memory.
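For anyone who wants to reproduce that sort of figure, a crude sequential-read probe like this (my own sketch, not the benchmark being quoted; POSIX timing, and it needs -O2 or better so the read loop vectorizes) makes the contrast visible:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <time.h>

    /* Stream through a buffer repeatedly and divide bytes read by elapsed time.
       A buffer well under 128MB should be served largely out of the eDRAM on a
       Crystal Well part; a much larger one measures the DDR3 channels instead. */
    static double read_gb_per_s(size_t bytes, int passes)
    {
        size_t n = bytes / sizeof(uint64_t);
        uint64_t *buf = malloc(bytes);
        uint64_t sum = 0;
        if (!buf) return -1.0;
        for (size_t i = 0; i < n; i++) buf[i] = i;          /* touch and warm the buffer */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int p = 0; p < passes; p++)
            for (size_t i = 0; i < n; i++) sum += buf[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        volatile uint64_t sink = sum; (void)sink;           /* keep the loops from being optimized away */
        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        free(buf);
        return (double)bytes * passes / sec / 1e9;
    }

    int main(void)
    {
        printf(" 96 MB buffer: %.1f GB/s\n", read_gb_per_s((size_t)96  << 20, 50));
        printf("512 MB buffer: %.1f GB/s\n", read_gb_per_s((size_t)512 << 20, 10));
        return 0;
    }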

Re: Half the speed of main memory? Why Bother? (0)

Anonymous Coward | about 5 months ago | (#45500417)

50-60ns is an incredibly high latency. DDR3 is significantly faster than that.

Re: Half the speed of main memory? Why Bother? (1)

0123456 (636235) | about 5 months ago | (#45501371)

DDR3 is significantly faster than that.

I can't find any actual measurements with a quick web search, but I seem to remember that core-to-DDR3 latency is on the order of 80-100ns.

Re: Half the speed of main memory? Why Bother? (2)

m.dillon (147925) | about 5 months ago | (#45502355)

DDR3 (dram in general) can burst quickly but full random accesses are very slow in comparison.

-Matt

Re:Half the speed of main memory? Why Bother? (1)

Sponge Bath (413667) | about 5 months ago | (#45500625)

Why bother reading these summaries?

It's a puzzle. Is the summary wrong because of stupidity, or is it crafted that way for click bait?

Seems Intel is getting desperate.... (-1)

gweihir (88907) | about 5 months ago | (#45500131)

I can only attribute this making sense to a really bad memory interface.

Re:Seems Intel is getting desperate.... (0, Insightful)

Anonymous Coward | about 5 months ago | (#45500353)

I can only attribute this making sense to a really bad memory interface.

A CPU with a 1.6GHz clock will definitely have high memory latency, so how exactly does that point to a bad memory interface? I also assume that since it's a mobile part, the memory probably isn't super fast and low latency to begin with. You sound like an AMD FANBOY to me. Intel has no need at all to be desperate; they could take a year off and still be in the lead.

Re:Seems Intel is getting desperate.... (0)

gweihir (88907) | about 5 months ago | (#45503541)

Actually it is the other way round: the slower the CPU, the faster the memory interface in comparison, and the less need for caches. What Intel does here makes no sense unless they are covering up an architectural problem. Memory clock is _not_ tied to CPU clock in a sane architecture. Otherwise you would need to buy memory by CPU clock. You do not need to do that for Intel or AMD.

Also, desperation engineering-wise has not necessarily any connection to business-desperation. And remember that Intel messed up very badly before, just look at the Pentium IV that was born out of deep engineering desperation, was slow, excessively power-hungry and could really only be scrapped later on. But I guess thought on this level is beyond you.

Why only 128 MB? (1)

Anonymous Coward | about 5 months ago | (#45500171)

Broadwell represents a miniaturization step from 22 to 14 nm structures. Why do they keep the capacity of the Crystalwell L4 cache at 128 MB? They could put twice that memory onto a die with the same area as the 22 nm Crystalwell version. Is the Crystalwell die for the Haswell CPUs so large and expensive that they have to reduce its size?

It's not 14nm, that's why (1)

dutchwhizzman (817898) | about 5 months ago | (#45500451)

It's in the same package, but not made in the same silicon or process. The package contains several pieces of silicon. Look at it as a miniature circuit board with several individual chips on it.

Re:Why only 128 MB? (5, Informative)

Kjella (173770) | about 5 months ago | (#45500591)

Broadwell represents a miniaturization step from 22 to 14 nm structures. Why do they keep the capacity of the Crystalwell L4 cache at 128 MB? They could put twice that memory onto a die with the same area as the 22 nm Crystalwell version. Is the Crystalwell die for the Haswell CPUs so large and expensive that they have to reduce its size?

From Anandtech's article on Crystalwell [anandtech.com]:

There's only a single size of eDRAM offered this generation: 128MB. Since it's a cache and not a buffer (and a giant one at that), Intel found that hit rate rarely dropped below 95%. It turns out that for current workloads, Intel didn't see much benefit beyond a 32MB eDRAM; however, it wanted the design to be future proof. Intel doubled the size to deal with any increases in game complexity, and doubled it again just to be sure. I believe the exact wording Intel's Tom Piazza used during his explanation of why it's 128MB was "go big or go home". It's very rare that we see Intel be so liberal with die area, which makes me think this 128MB design is going to stick around for a while.

I get the impression that the plan might be to keep the eDRAM on an n-1 process going forward. When Intel moves to 14nm with Broadwell, it's entirely possible that Crystalwell will remain at 22nm. Doing so would help Intel put older fabs to use, especially if there's no need for a near term increase in eDRAM size. I asked about the potential to integrate eDRAM on-die, but was told that it's far too early for that discussion. Given the size of the 128MB eDRAM on 22nm (~84mm^2), I can understand why. Intel did float an interesting idea by me though. In the future it could integrate 16 - 32MB of eDRAM on-die for specific use cases (e.g. storing the frame buffer).
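The sizing logic in that quote is easy to play with in a toy model. The sketch below (entirely made-up workload parameters, nothing to do with Intel's traces) simulates a direct-mapped cache of 64-byte lines over a synthetic access stream with a ~24MB hot set; the hit rate flattens out well before 128MB, which is the "didn't see much benefit beyond 32MB" observation in miniature:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define LINE 64

    static uint64_t rnd(void)                      /* ~30 random bits regardless of RAND_MAX */
    {
        return ((uint64_t)rand() << 15) ^ (uint64_t)rand();
    }

    /* Direct-mapped cache model: 95% of accesses hit a small hot set, the rest
       scatter over 1GB.  Returns the hit rate in percent for a given cache size. */
    static double hit_rate(size_t cache_bytes, size_t hot_bytes, long accesses)
    {
        size_t lines = cache_bytes / LINE;
        uint64_t *tag = calloc(lines, sizeof(uint64_t));
        long hits = 0;
        if (!tag) return -1.0;

        for (long i = 0; i < accesses; i++) {
            uint64_t addr = (rnd() % 20) ? rnd() % hot_bytes
                                         : rnd() % ((uint64_t)1024 << 20);
            uint64_t line = addr / LINE;
            size_t set = (size_t)(line % lines);
            if (tag[set] == line + 1) hits++;      /* +1 so that 0 can mean "empty" */
            else tag[set] = line + 1;
        }
        free(tag);
        return 100.0 * hits / accesses;
    }

    int main(void)
    {
        size_t mb[] = { 16, 32, 64, 128 };
        for (int i = 0; i < 4; i++)
            printf("%3zu MB cache: %.1f%% hit rate\n",
                   mb[i], hit_rate(mb[i] << 20, (size_t)24 << 20, 2000000L));
        return 0;
    }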

intel bloHard next generation CHIP revealed (0)

deysOfBits (2198798) | about 5 months ago | (#45500173)

I know for a fact that the next generation BloHard chip from Intel will have a nuclear reactor in the chip itself. It will glow in the dark and you will be able to steam your vegetables with it. What more can you ask for?

Win95? (3, Interesting)

mwvdlee (775178) | about 5 months ago | (#45500205)

With this 128MB cache, shouldn't this CPU be able to run an OS like Win95 or an older Linux without additional memory?

Re:Win95? (2)

jones_supa (887896) | about 5 months ago | (#45500231)

Not necessarily. It's not just a shadow copy of RAM, but some kind of multipurpose pool. We don't exactly know what the CPU does with it.

Re:Win95? (0)

Anonymous Coward | about 5 months ago | (#45500289)

That's right, there is no way to know exactly what goes on inside the cache.

The Crystal Well is magic!

Re:Win95? (2)

muridae (966931) | about 5 months ago | (#45500313)

If you are writing the OS and your code is down at the machine level, you do know what's going on in the different cache pools. You can abstract it away and trust your compiler to get it right, or you can fiddle the bits yourself; it isn't magic contained in the blue smoke of ICs.

Re:Win95? (0)

Anonymous Coward | about 5 months ago | (#45500405)

OK, you do all that so an operating system can run off L4 memory... lemme know when you find a consumer motherboard that posts without actual system memory installed... oh wait, you can't find one? Well then you'll have to code your own BIOS or UEFI firmware, too.

Re:Win95? (3, Interesting)

Dr. Spork (142693) | about 5 months ago | (#45501131)

You're right that motherboards won't post without memory sticks, but I don't see a good technical reason why that should be. UEFI could be written so that it posts by using only the resources of the processor and its cache, if it detects no usable memory. I mean, never mind 128MB of L4. Even the 6MB of L3 that modern processors have is larger than the entire system memory of our parents' first computers. It should be more than enough to run something as simple as UEFI.

It would also be rather useful. Instead of issuing you beeps as it fails to boot, a motherboard with a correctly written UEFI implementation could post without working ram and run diagnostics on exactly which systems are working and which are not, and what exactly is going wrong. I really think this would increase everyone's system-building confidence and give the manufacturers who make it happen a leg up in the market.

Classic game consoles as a comparison point (2)

tepples (727027) | about 5 months ago | (#45501331)

Even the 6MB of L3 that modern processors have is larger than the entire system memory of our parents' first computers.

A 6 MB L3 cache is bigger than the RAM in the PlayStation, Nintendo 64, or Nintendo DS. A 128 MB L4 cache would surpass the RAM in a PlayStation 2 and an original Xbox combined. You don't need a lot of DDR [wikipedia.org] to play DDR [wikipedia.org], even if you live in the former DDR [wikipedia.org].

Re:Win95? (1)

SimonTheSoundMan (1012395) | about 5 months ago | (#45501681)

I'm an Apple user and have only ever really looked into EFI on a Mac. EFI has its own memory space; current Macs have 512MiB of DDR3 separate from the main memory. They also have 512MiB of DDR3 for networking.

Re:Win95? (0)

Anonymous Coward | about 5 months ago | (#45500883)

It's at the interface level, not the machine level. And Intel controls the proprietary interface; the API doesn't include ALL strings/functions, to be sure.

Re:Win95? (1)

Anonymous Coward | about 5 months ago | (#45500335)

Maybe they will get it right next time with the L5 cache.

L4 used to be main RAM (1)

tepples (727027) | about 5 months ago | (#45501391)

Formerly, L4 cache was main memory, a cache for the L5 (disk) and L6 (network). This new L4 cache pushes main memory, disk, and network out to L5, L6, and L7 respectively.

Re:Win95? (0)

Anonymous Coward | about 5 months ago | (#45500729)

It does what Intel wants it to - like microcode processor updates or hidden-to-user BIOS replacement, whatever they want.

Re:Win95? (0)

Anonymous Coward | about 5 months ago | (#45501275)

The eDRAM is transparent to the OS and is primarily used in high-bandwidth graphics flows. It can be turned off and on to meet a power budget when graphics bandwidth is not needed. The L4 cache (eDRAM) is used as a victim cache (so software doesn't need to know about it). When a cache line gets evicted from the L3 cache, it may be stored in eDRAM depending on the cache attributes set. The graphics engine and core can "share" the eDRAM or completely ignore it, depending on the transaction.
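For what it's worth, the victim-cache arrangement described above looks roughly like this in miniature (a toy model with made-up sizes and a direct-mapped policy; the real hardware policies aren't public):

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define L3_SETS (1 << 10)
    #define L4_SETS (1 << 14)

    static uint64_t l3[L3_SETS];      /* each set stores line+1; 0 means empty */
    static uint64_t l4[L4_SETS];

    typedef enum { HIT_L3, HIT_L4, MISS_DRAM } where_t;

    /* A line lives in "L3" while hot; when its slot is reclaimed, the evicted
       line drops into the much larger "L4" instead of vanishing, so the next
       miss on it is a cheap L4 hit rather than a trip out to DRAM. */
    static where_t access_line(uint64_t line)
    {
        uint64_t v = line + 1;
        size_t s3 = (size_t)(line % L3_SETS);
        size_t s4 = (size_t)(line % L4_SETS);
        where_t res;

        if (l3[s3] == v) return HIT_L3;

        res = (l4[s4] == v) ? HIT_L4 : MISS_DRAM;
        if (res == HIT_L4) l4[s4] = 0;                        /* promoted back into L3 */

        if (l3[s3])                                           /* the L3 victim falls into L4 */
            l4[(size_t)((l3[s3] - 1) % L4_SETS)] = l3[s3];
        l3[s3] = v;
        return res;
    }

    int main(void)
    {
        access_line(7);              /* line 7 now sits in L3 */
        access_line(7 + L3_SETS);    /* conflicts with it; line 7 is evicted into L4 */
        printf("second access to line 7: %s\n",
               access_line(7) == HIT_L4 ? "L4 hit (victim cache)" : "DRAM miss");
        return 0;
    }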

Re:Win95? (1)

aiadot (3055455) | about 5 months ago | (#45500937)

Win95? My first laptop had 128MB of RAM and was capable of running XP.

But the answer is no. The OS just wasn't designed to use that function of the processor in such a way. Maybe if you wrote a VM that emulated RAM on the L4 cache, but the only purpose of this approach would be to satisfy curiosity.

Windows XP on 128 MB of RAM (2)

tepples (727027) | about 5 months ago | (#45501829)

Did your 128 MB laptop continue to run Windows XP well even after having installed the service packs that increased how much RAM it uses? Even under Windows 2000, printing certain documents filled RAM on my old 128 MB desktop PC.

Re:Win95? (1)

ericloewe (2129490) | about 5 months ago | (#45500941)

Yeah*, but what's the point?

*Assuming the OS doesn't freak out - which will definitely happen. Let's just say there's no technical barrier to overcome.

Re:Win95? (1)

archen (447353) | about 5 months ago | (#45502991)

Depends on what you're doing with it. I have a laptop (Pentium 3 / 128MB RAM) with FreeBSD 10 on it. It works well, but the application options running in X are limited unless you want to go into swap. A huge portion of what people consider regular computer usage is "browse the internet". Good luck doing that these days with 128MB of RAM.

Ours goes to 11 (0)

Anonymous Coward | about 5 months ago | (#45500263)

I want 1 gig of L1, L2, L3 cache!

Memory is dirt cheap right now. Let's cram it into every space we have!

Re:Ours goes to 11 (4, Insightful)

muridae (966931) | about 5 months ago | (#45500403)

Let's see: the tiny amount of L1/2/3 cache currently is dictated by the energy budget of the CPU. Looking at the energy budget of the 4900MQ and the 4960HQ chips, you can do some wild-arse guessing and conclude that the 2 megs of L3 cache sacrificed got back enough power to run the 128 megs of L4. Then consider that there is only 64K (yes, kilobytes) of L1 and 256K of L2 per core on the Haswell chips, and that 3.9GHz desktop chips are looking at 84 watts of power dissipated... you can start to work out how much of that is due to leakage current from the six-transistor L1/2/3 cache design.

Let's face it: SRAM isn't tiny, it leaks amps like a sieve at the tiny process sizes everything is done at nowadays, and its main advantages are that it doesn't take a controller to access, it's bloody fast, and the bandwidth can be pretty sizable. A gig of SRAM on die would, I suspect, heat a small room; that much DRAM per core would slow the cores down due to the inherent latency of accessing DRAM.

So, sure, DRAM chips may be cheap, but putting them on the CPU die would be horrid. And SRAM still isn't cheap; either in die space, energy budget, or dollars!

1T-SRAM (1)

tepples (727027) | about 5 months ago | (#45503345)

Let's face it, SRAM isn't tiny, it leaks amps like a sieve at the tiny process size that everything is done at now days, and it's main advantage is that it doesn't take a controller to access and it's bloody fast and the bandwidth can be pretty sizable.

Then perhaps MoSys had the right idea: make a bunch of small, independent DRAM blocks and a front-end controller with as much SRAM as one block to hold cached results while waiting for the corresponding DRAM row to refresh.

128 MB L4 cache (1)

onyxruby (118189) | about 5 months ago | (#45500463)

This is making me feel old as I recall how happy I was to have once maxed a board with 32 MB of RAM, a previous one with 8 MB, another with 4 MB and so on. I love that about technology, it pretty much always gets better until DRM and politics get into the mix...

/get off my lawn

//not really

Re:128 MB L4 cache (3, Interesting)

Gravis Zero (934156) | about 5 months ago | (#45500523)

You can revisit those nostalgic 8MB and 4MB days again with the latest AMD chips [wikipedia.org] as L2 cache. :)

Just use a modified version of coreboot [coreboot.org] to bypass those silly POST tests and load Windows 3.11 directly into the CPU cache. :)

Re:128 MB L4 cache (2)

neokushan (932374) | about 5 months ago | (#45500525)

I got the same feeling when I got my first Android phone. 576MB of RAM...in a phone. I've recently upgraded and my new device has 3GB of RAM. It feels like only recently that I hit that amount in a desktop computer, now I have it in a device that fits in my pocket - never mind the quad-core CPU or 64GB of internal storage.

10 years ago, that would have been a reasonably powerful desktop machine.

Re:128 MB L4 cache (2)

fisted (2295862) | about 5 months ago | (#45500943)

It's still [like] a reasonably powerful desktop machine, if you avoid running a bloated OS on it.

Re:128 MB L4 cache (1)

Shinobi (19308) | about 5 months ago | (#45501515)

It would still be a "reasonably powerful desktop machine" if your use case is still the same as 10 years ago.

However, contrary to what many geeks think, people don't just browse, do email, watch YouTube, etc. A fair number of non-geeks do CAD, image/video editing, 3D graphics, music creation and so on with their desktop machines, and routinely have workloads that would bring that 10-year-old computer into thrashing hell...

In fact, I think the whole "oh, ordinary people just need enough power to browse, email, etc." idea is a state that has been created by geeks who have no interest in helping their family members or friends by enabling a hobby, so it's become self-perpetuating. Sure, there are ordinary users who just need that. But there are also ordinary users who have interests beyond that, but have no geek interest in computers, operating system geek wars, programming, etc.

Re:128 MB L4 cache (2)

fisted (2295862) | about 5 months ago | (#45502271)

It would still be a "reasonably powerful desktop machine" if your use case is still the same as 10 years ago.

Yeah, I was a hard- and software developer 10 years ago; my use case didn't shift too much. OTOH, I happen to /do/ CAD on this nearly 10-year-old computer (single-core and all that, can you believe it?), so my information is first-hand.

[meaningless windows-centric gibberish excusing bloatware]

Whatever.

Re:128 MB L4 cache (2)

surd1618 (1878068) | about 5 months ago | (#45502899)

So true. Nerds who are not computer nerds often have the highest computing needs. I can do everything I usually want to do with an ancient laptop because I don't do graphic design or record music or make 3D models. Shoot, if I am going to play a video game it's probably Doom or Starcraft. A computer that plays youtube videos reliably will do anything I want in IDLE or Emacs and even runs small VMs okay. The only thing I'd want a modern desktop for would be video format conversion or bloated Processing code.

Re:128 MB L4 cache (2)

ShanghaiBill (739463) | about 5 months ago | (#45502303)

This is making me feel old

If your youthful recollections are about memory measured in MEGAbytes, then you are not old. Back in the 1970s, I worked on a controller board with a Z80, 256 bytes of ROM and ZERO RAM. All the state information had to be kept in registers (but, fortunately, a Z80 has two register banks). No RAM means no stack, so to call a subroutine, you had to save the return address in a register, so subroutines couldn't nest. As I recall, it just had to monitor a voltage and dial a phone number if it dropped too low, or something like that.

/get off my lawn

Back in my day, we didn't have lawns. Just a prairie for the buffalo.

not on die (5, Informative)

Gravis Zero (934156) | about 5 months ago | (#45500467)

128MB L4 cache. [...] on-package (not on-die) pool of memory

what this means is the memory is not on the same piece of silicon as the CPU, just stuffed in the same chip package. This means they have to be connected by a lot of tiny wires instead of being integrated directly; the downside is that the bandwidth between the L4 memory and the CPU is more limited and it uses more power. Like AMD's first APUs, which were just two ICs on the same chip, I don't think this will result in a drastic performance improvement, but I'm unsure of the power savings. If AMD gets wise, they will beat Intel to the punch. Then again, if AMD is really smart, they would put out ARMv8 chips not just for servers (/desktops?) but for smartphones/tablets and laptops.

Re:not on die (5, Informative)

lenski (96498) | about 5 months ago | (#45500783)

what this means is the memory is not on the same piece of silicon as the CPU, just stuffed in the same chip package.

Which allows the designers to count on carefully controlled impedances, timings, seriously optimized bus widths and state machines, and all the other goodies that come with access to internal structures not otherwise available.

Such a resource could, if used properly, be a significant contributor to performance competitiveness.

Re:not on die (1)

Salgat (1098063) | about 5 months ago | (#45501649)

Not on die means they have more control over quality and costs, as you don't need to scrap both the L4 cache and CPU if either die is bad. I personally love SoC and want to see more of it. One day we may see much of the motherboard all internalized on the same package as the CPU; this L4 cache could be just the first step to eventually internalizing RAM.

Fact Check Please (0)

Anonymous Coward | about 5 months ago | (#45500519)

I'll admit that I don't keep up with all the latest and greatest chipsets and specs but something seems wrong here. I remember back in the 1990's that you could get FPM (and later EDO) DIMMS that were 60ns. This article is saying that this new L4 cache is 60ns implying that the latest DDR3-whatever has a latency of 120ns. (Assuming half the speed means half the latency).

Re:Fact Check Please (1)

Bengie (1121981) | about 5 months ago | (#45501227)

100MHz DDR3 isn't that much faster in latency than 66MHz EDO, but it does have more bits to charge up and takes a bit longer to find the correct bits. The external bus is a lot faster, but the internal processing speed is not.

Re:Fact Check Please (2)

TechyImmigrant (175943) | about 5 months ago | (#45501491)

Round trip time for an old-school EDO DIMM is not the same thing as the burst cycle time of a synchronous DRAM. It takes about the same time to get going, but the data bursts faster and wider.

Oh noes (1)

Anonymous Coward | about 5 months ago | (#45500569)

All my algorithm development so far assumes small local caches.
Now I can start over again.
Aaaahhh!!!

Intel (0)

Anonymous Coward | about 5 months ago | (#45500607)

Add moar cache to fix cpu problems

Trade off makes sense.. (1)

Travis Repine (2861521) | about 5 months ago | (#45500705)

I may not get the speed out of the caches, but when you consider how much RAM is utilized in your laptop, smartphone, etc., this is actually a smart move. More room means a better way to utilize the RAM, allowing other opportunities to exist.

Don't you mean... (1)

Type44Q (1233630) | about 5 months ago | (#45500837)

At 1.6GHz, L4 latencies are 50-60ns which is significantly higher than the L3 but just half the speed of main memory.

Don't you mean "but less than half the latency of main memory?"

Cool (1)

fisted (2295862) | about 5 months ago | (#45500921)

So as soon as I get one of these, I won't need any DRAM anymore, since 128MB is way more than my typical memory footprint (including kernel and X11).

I do look forward to this.

Re:Cool (0)

Anonymous Coward | about 5 months ago | (#45501185)

Yeah, when I saw this I was thinking that this would make a very impressive low-power router.

Again? (0)

Anonymous Coward | about 5 months ago | (#45500983)

first they added more cores, now they're adding more cache. What's next? Integrated chipset or DRAM?

All these are cheap and worthless improvements! We need faster CPUs - 8GHz for single-core this year and 16GHz next year!

Re:Again? (1)

Rockoon (1252108) | about 5 months ago | (#45503143)

All these are cheap and worthless improvements! We need faster CPUs - 8GHz for single-core this year and 16GHz next year!

eDRAM may be many things, but cheap isn't one of them.

Half speed L4 cache, are we back in the P2 era? (0)

Anonymous Coward | about 5 months ago | (#45501001)

Come on Intel, the last time you did this to us was with the off-die Pentium 2 L2 cache, and only the Xeon had the full-speed cache. Then you made the Celerons not have the cache, and the performance gap between the three was substantial.

I'm not saying we shouldn't have it, but I am saying that history repeats itself. The first chips that have it will have it, and then either it will go away or it will be integrated into the CPU die, assuming we can get another die shrink (it's predicted that 14nm or 12nm may be where there's no longer any ROI on die shrinking with Si).

Eh, it's been done. (2, Informative)

Anonymous Coward | about 5 months ago | (#45501301)

POWER8, anyone? With actual SMT instead of flaky HT, and lots more threads, and so on, and so forth.

Too bad they're unobtainium and, if not, cost too much. But otherwise... anything Intel does has basically been done better before. Except process. That is the only thing they really lead with. The rest isn't half as interesting as most of the world makes it out to be.

Re:Eh, it's been done. (0)

Anonymous Coward | about 5 months ago | (#45503825)

POWER8 only runs in the lab right now, but POWER7 has eDRAM caches, and I can't remember whether POWER6 had them or not. Besides that, IBM's eDRAM seems to have a much lower latency; they use it for L3 cache. I remember an IBM document admitting that it's a bit slower than SRAM, but the trade-off is that the cache is much bigger and the hit rate higher, so that it ends up being a win for many applications.

Available today, but uncommon (0)

Anonymous Coward | about 5 months ago | (#45501351)

You can get this today, but it's not as flexible as you might wish:
- You currently can only get it in a high-end i7 laptop. Desktop and low-end laptop i7 chips don't have it.
- It's only active when the GPU is not used, so you need a discrete GPU in your system, and it has to be on all the time.
- You can't use this as system memory or whatever (as some of the other comments were hoping for.) All it ever stores are the flushed misses from the L3 cache.
- It massively increases the working set that can stay cache-resident, which can benefit some algorithms (e.g. physics simulations, software H.265 encoding) enormously. See graph over at Anandtech:
http://images.anandtech.com/doci/6993/latency.png [anandtech.com]
The high-end desktop chip is predictably left in the dust in the 8MB-128MB range. Whether this trumps its other advantages is probably only true for a few algorithms.

I actually have it active in the Haswell laptop I'm typing this on, but it's an uncommon setup. To get one, go to the Apple store, select the highest-end Retina MacBook Pro (the only one that still has discrete graphics) and click the processor upgrade to 2.6GHz so that you end up with an i7-4960HQ (the 2.3GHz chip might have it too, not sure). Then go to the Energy Saver Preferences and turn off Automatic Graphics Switching so that the discrete GPU is on all the time.

Pretty obvious evolution (1)

m.dillon (147925) | about 5 months ago | (#45502249)

With Intel's 14nm so close, and 10nm production in another year or so, they need to use all that chip area for something that doesn't necessarily generate a ton of heat. RAM is the perfect thing. Not only is the power consumption relatively disconnected from the size and density of the cache, but not having to go off-chip for a majority of memory operations means that the external dynamic ram can probably go into power savings mode for most of its life, reducing the overall power consumption of the device.

-Matt

50-60ns is FAST?!? (0)

Anonymous Coward | about 5 months ago | (#45502673)

According to the summary, L4 cache has 50-60ns latency, and is half the latency of main memory (presumably 100-120ns).

The summary is bad, because it gives the impression that the 70ns static-column RAM that comprised the main system RAM on an Amiga was almost as fast as today's slower cache RAM, and had almost double the performance of DDR3.

The truth is that the 50-60ns latency (vs 70ns, vs 100-120ns) is "time to fetch first arbitrary byte at some arbitrary address". However, at best, that tells (statistically) less than 25% of the story, because CPUs don't fetch single arbitrary bytes from single arbitrary locations. At the very least, they're usually going to grab at least 4 sequential bytes, if not WAY more. And that's where the difference comes in. The smallest meaningful benchmark would be more like, "how many nanoseconds does it take to fetch 16 or 32 consecutive bytes from an arbitrary address in ram" (4 or 8 bytes for the opcode, 4 or 8 bytes for the argument, assuming one pair that actually does something related to fetching/storing/arithmetic, and another that makes a branch decision based upon it(*)).

Going by the "fetch 16 or 32 bytes" benchmark, even slow 120ns DDR3 is going to completely smoke 70ns SCR, because SCR read bytes 2 through 32 just 8 (later, 16 or 32) bits at a time with a clock rate of (at best) 32MHz. In contrast, the 120ns DDR3 transfers the sequential bytes at a rate equivalent to a 32, 64, or 128-bit bus with 100 or 133MHz clock rate (as I understand it, the 800mhz, 1600mhz, and higher insane-level speeds came about because they reduced the number of physical traces and serialized 8 bits into a pair of LVD traces (so 800mhz is roughly equivalent to 8x100MHz), then started doing "Atari Math" (deciding that 4 800MHz serial links are "3200MHz" by "doing the math" and adding them up to get a bigger number).

That said, from what I recall, the performance of system ram on mainstream PCs has basically stagnated since DDR. It's gotten enormously cheaper to IMPLEMENT, but a modern workstation-class PC (say, Dell Precision or higher) with DDR3 can physically fetch 4 arbitrary blocks of 32 bytes from main system ram in *maybe* 70-80% of the time it took a comparable workstation back in the DDR era (at least, from the perspective of a single-threaded app... obviously, dual/3-channel memory, multi-core norms, and SMP-aware software could change the equation a bit if you're talking about OVERALL system performance).

(*)Yes, I know realmode opcodes aren't 4 bytes... but realmode opcodes are almost irrelevant to anything compiled for x86 or AMD64 under Windows or Linux, anyway.
