Researchers Claim 1,000 Core Chip Created

Posted by CmdrTaco on Monday January 03, 2011 @02:12PM from the eat-the-seeds dept.

eldavojohn writes "Remember a few months ago when the feasibility was discussed of a thousand core processor? By using FPGAs, Glasgow University researchers have claimed a proof of concept 1,000 core chip that they demonstrated running an MPEG algorithm at a speed of 5Gbps. From one of the researchers: 'This is very early proof-of-concept work where we're trying to demonstrate a convenient way to program FPGAs so that their potential to provide very fast processing power could be used much more widely in future computing and electronics. While many existing technologies currently make use of FPGAs, including plasma and LCD televisions and computer network routers, their use in standard desktop computers is limited. However, we are already seeing some microchips which combine traditional CPUs with FPGA chips being announced by developers, including Intel and ARM. I believe these kinds of processors will only become more common and help to speed up computers even further over the next few years.'"

This discussion has been archived. No new comments can be posted.

Programmable CPU's (Score:3, Interesting)

by kge ( 457708 ) writes: on Monday January 03, 2011 @02:20PM (#34745638)

How long will it be before we will see the first motherboards with FPGA emerge?
Then you can download the CPU type of your choice:
-- naah, I don't like this new Intel core, I will try the latest AMD instead...

- Re:Programmable CPU's (Score:5, Informative)
  
  by Hal_Porter ( 817932 ) writes: on Monday January 03, 2011 @02:28PM (#34745726)
  
  A desktop CPU in an FPGA will always cost more and perform worse (i.e. slower clock rate) than a full custom chip from Intel or AMD. Mind you, I've seen embedded designs where a microcontroller, RAM, ROM and custom logic are implemented in a $10 FPGA - especially where volumes are too low for an ASIC.
  On the other hand I could definitely see programmable logic inside Intel or AMD CPUs, a sort of super SSE. Then again, even there you'd probably be better off using GPU-like custom hardware for the heavy lifting. In fact I can see CPU/GPU hybrids being very common in low-end machines. Full custom logic is always going to have a performance per $ advantage over FPGAs unless FPGA technology changes drastically.
  
  - Re: (Score:3)
    
    by RKBA ( 622932 ) writes:
    
    FPGAs have been dynamically reprogrammable for years. You could load one with whatever special "hardware" custom instructions you wanted on the fly. Yes, custom logic is faster, but is inflexible.
    - Re: (Score:2)
      
      by wagnerrp ( 1305589 ) writes:
      
      They're dynamically reprogrammable, but it's not like you can just instantly flip to another ROM. These things take time to switch to another configuration. They are much better suited for batch operation, running one task completely before moving onto the next, than multitasking.
  - Re: (Score:3)
    
    by Lumpy ( 12016 ) writes:
    
    I'd like to see an FPGA x1 PCI Express daughterboard and an open, well-defined interface so that software can reconfigure and then use the FPGAs on the daughterboard for useful PC tasks....
    A game uses it for high speed calculations, then DVD Fab uses it to crack BluRay encryption faster, video encoding, audio encoding, then the browser uses it for encryption, etc....
    A nice open standard without greed attached so everyone can use it in their software. Although in the world of many cores not being use
    - Re: (Score:1)
      
      by conspirator57 ( 1123519 ) writes:
      
      you'll want >=8x PCIe since most interesting applications (especially those that distributed computing is worst at) are IO bound.
      problem: FPGA parts big enough to have PCIe and DDR interfaces and still do interesting stuff with are expensive on their own @ $600
      http://avnetexpress.avnet.com/store/em/EMController/FPGA/Xilinx/XC5VLX50T-1FF1136C/_/R-4696910/A-4696910/An-0?action=part&catalogId=500201&langId=-1&storeId=500201&listIndex=-1 [avnet.com]
      http://www.em.avnet.com/evk/home/0,1707,RID%253D0%2526CI [avnet.com]
    - Re: (Score:2)
      
      by mrmeval ( 662166 ) writes:
      
      You should jump on some of the newer non-volatile FPGAs that can run a microcontroller core. I found one from Lattice who had it on sale for 29.95 with a JTAG programmer. Now they are 50. There are other brands and I'm always looking for cheap dev kits. I think there's a devkit for OMAP from TI that's open source and not too expensive but I'm not finding it.
    - Re: (Score:2)
      
      by guruevi ( 827432 ) writes:
      
      They're called FPGA accelerators and they already exist. You just won't find them in your general desktop as the entry level cards cost about as much as a high-end workstation.
  - Re: (Score:2)
    
    by ByOhTek ( 1181381 ) writes:
    
    given your last comment - I think pre-defined hardware such as AMD/Intel desktop chips will always be faster than FPGA for a pre-specified set of individual operations. It's only when you get to operation combinations not defined at manufacture time, but used frequently, that FPGA has an advantage.
    The current CPU design will stay for most of the work, and an FPGA attachment would handle the specialty work that isn't needed most of the time, and can be dropped.
    This issue is reprogramming time and multi-thread
    - Re: (Score:3)
      
      by durrr ( 1316311 ) writes:
      
      A non-FPGA AMD/Intel CPU will always be faster doing general CPU business than an FPGA-implemented one doing the same.
      It is however a stupid approach, a CPU is built to do general purpose calculations to allow for all software to exist without specialized hardware. An FPGA on the other hand is made to configure into specialized hardware in order to... well, I guess not having to build a lot of prototypes for hardware testing was its original purpose. But its uses go far beyond that in that it could turn into
      - Re: (Score:2)
        
        by SuricouRaven ( 1897204 ) writes:
        
        The typical home user rarely needs to do any really heavy number-crunching - the closest they get is physics in games. It has definite use in scientific computing and analytics, though - especially as it allows the engineers to constantly improve the programs without needing to get new silicon manufactured. It's a niche into which GPGPU has settled quite happily, though - and it does such a good job, only the most extremely demanding workloads may justify the expense of FPGAs and people with the skills t
        
        Re: (Score:2)
        
        by vertinox ( 846076 ) writes:
        
        The typical home user rarely needs to do any really heavy number-crunching - the closest they get is physics in games.
        For the past 5 to 8 years there has been a "rasterization vs. ray tracing" [google.com] debate in the game developing and graphics community (with ray tracing in real time in games only being a theoretical pipe dream until recently).
        If someone were to make ray tracing feasible, cheap, and practical for either a console or desktop PC, then yes... Home users will need that number crunching as Ray Tracing i
  - Re: (Score:2)
    
    by Lennie ( 16154 ) writes:
    
    Intel already has a line of Atom processors with an FPGA for I/O operations.
  - Re: (Score:2)
    
    by Dogtanian ( 588974 ) writes:
    
    How long will it be before we will see the first motherboards with FPGA emerge?
    A desktop CPU in an FPGA will always cost more and perform worse (i.e. slower clock rate) than a full custom chip from Intel or AMD.
    Sure, but no-one's going to do that anyway- if the OP thought that, then he missed the potential of his own idea.
    
    I thought up something similar a few years back, and realised that, yes, the performance would obviously be horribly uncompetitive and pointless if you simply tried to reproduce (e.g.) an x86 chip's circuitry with an FPGA. The obvious idea (or rather, my idea, which I suspect countless other people also figured out independently) is that the FPGA *circuit* implemented in hardware replaces the *
- Re: (Score:1)
  
  by pantherace ( 165052 ) writes:
  
  You can get an AMD motherboard with a HyperTransport link brought out, and then an FPGA to go into it.
  (Just be careful before you look at the prices. They suffer from being a very niche market.)
- Re: (Score:2)
  
  by wildzeke ( 191754 ) writes:
  
  Like this?
  http://www.xilinx.com/products/devkits/HW-V5-ML510-G.htm [xilinx.com]
- Re: (Score:1)
  
  by conspirator57 ( 1123519 ) writes:
  
  nice idea, but it will be dirt slow and 10x as expensive.
  btw: welcome to 2003 when Xilinx released the Virtex II Pro.
- Re: (Score:2)
  
  by melstav ( 174456 ) writes:
  
  How does last year sound? http://hardware.slashdot.org/story/10/11/23/0642238/Intel-Launches-Atom-CPU-With-Integrated-FPGA?from=rss [slashdot.org] Granted, it might be a while before they are commonly found on commercially available boards. And as others have pointed out, if you do it in *real* hardware, it will be faster than if you did it in an FPGA. This is more like a customizable coprocessor to the Atom. You could even use it to replace the motherboard chipset, conceivably.
- Re: (Score:1)
  
  by cronicthebadger ( 597816 ) writes:
  
  How long will it be before we will see the first motherboards with FPGA emerge? Then you can download the CPU type of your choice...
  I suspect we shall see them by the year 2002!
  Motherboard containing FPGAs combined with dedicated hardware, allows downloadable FPGA cores to emulate CPU models, "chipsets" and architectures: http://en.wikipedia.org/wiki/C-One [wikipedia.org]
1,000 cores (Score:2)

by Low Ranked Craig ( 1327799 ) writes:

or 1,000 logic blocks? Are they equivalent? Aren't FPGAs common and generally contain multiple logic blocks?
- Re: (Score:3)
  
  by Muad'Dave ( 255648 ) writes:
  
  My bet is 1,000 very simple cores - most decent-sized FPGAs contain 10's or 100's of thousands of 'logic blocks'. The Spartan 6 [xilinx.com] series has between 3,840 and 147,443 logic blocks.
  - Re: (Score:2)
    
    by yurtinus ( 1590157 ) writes:
    
    Tens of thousands of blocks, but how many do they spend implementing their CPU cores? Could be using multiple FPGAs or a very very simple CPU core... I'm more intrigued by the blurb about Intel and ARM developing CPU/FPGA chips - could be a lot of fun with (hopefully) a lot lower cost than a Virtex.
  - Re: (Score:1)
    
    by ace of death ( 462104 ) writes:
    
    "By creating more than 1,000 mini-circuits within the FPGA chip, the researchers effectively turned the chip into a 1,000-core processor - each core working on its own instructions."
    This is entirely feasible, but the 'cores' would have to be very very simple. Looking at the data sheet for the Xilinx Virtex 6 FPGA, it contains 118,560 Configurable Logic Blocks, which are equivalent to four Look Up Tables, and 8 flip-flops. If you wanted to create an 8-bit instruction set processor, it would require at minim
    - Re: (Score:2)
      
      by Muad'Dave ( 255648 ) writes:
      
      I agree, hence the "very simple" in my reply. I bet they are extremely limited, but fast. Other brands/models of FPGAs have different definitions of 'complex' - Altera has some pretty smokin' FPGAs, too.
  - Re: (Score:2)
    
    by durrr ( 1316311 ) writes:
    
    The Xilinx Virtex-7 series supposedly contains up to two million logic blocks. If I've got it right, the Spartan is the Xilinx hobby lineup whereas the Virtex is their industry lineup.
- Re: (Score:2)
  
  by blair1q ( 305137 ) writes:
  
  FPGAs can be programmed to emulate any logic hardware (logically, though not usually electrically, so power and timing will not be accurate though the logical results will be identical). Many CPU cores have been rendered as library modules that can be programmed into an FPGA. Put 1,000 of them in your FPGA (or big array of FPGAs in this case) and route them together, and you can claim you have a 1,000-core CPU.
  Of course, it takes more than one FPGA chip to do this, so you can't in any sense claim a 1,000-
Star Bridge Systems (Score:2)

by Talinom ( 243100 ) * writes:

It may be too late, but perhaps someone could talk with Viva Computing, LLC who now owns the assets of Star Bridge Systems [starbridgesystems.com]. It was not specified in the news release if they also own the intellectual property.
Took long enough... (Score:3)

by Crudely_Indecent ( 739699 ) writes: on Monday January 03, 2011 @02:26PM (#34745706) Journal

This story was already submitted two times before eldavojohn managed to get it to the front page in a little over an hour...
http://tech.slashdot.org/submission/1432844/University-of-Glasgow-pioneers-1000-core-processor [slashdot.org]
http://tech.slashdot.org/submission/1432512/1000-core-processors- [slashdot.org]

- Re: (Score:3)
  
  by seifried ( 12921 ) writes:
  
  Those two submissions are poorly written and have no real detail compared to this one (which is no gem, but is better).
- Re: (Score:1)
  
  by korgitser ( 1809018 ) writes:
  
  There also was some news about a monkey with three asses...
Does anyone have a link... (Score:3)

by John Hasler ( 414242 ) writes: on Monday January 03, 2011 @02:26PM (#34745710) Homepage

...to a paper that assumes that the reader already knows what a cpu is? This article is content-free.

Life Cycle (Score:5, Interesting)

by glueball ( 232492 ) writes: on Monday January 03, 2011 @02:27PM (#34745718)

I think this is a great development. I've been using FPGAs in medical imaging for about 15 years. The groups that use the GPUs are getting great performance--definitely--but seeing as how MRI and CT machines are placed and need to run for 10, 15, 20 years, I don't see how the GPUs will survive that time. One large OEM was pushing the GPUs for their architecture and I can't believe it will be successful if success is measured on the longevity scale. I'm sure the service sales guy will clean up.
Why do GPUs fail? I'm not sure of the exact modes of failure but the amount of heat has got to have something to do with it. FPGAs will run much cooler and in the FLOPS/Watt game, will win.

- Re: (Score:2)
  
  by ByOhTek ( 1181381 ) writes:
  
  If they make the GPU replaceable, it's not such a big deal.
  If they underclock the GPU to reduce heat, again, not such a big deal.
  A GPU might have an expected 5-10 year lifetime at full throttle, but if you knock it back to 25%, you probably will get a much better survival rate.
- Re: (Score:1)
  
  by ace of death ( 462104 ) writes:
  
  The drawback with using FPGAs compared to commodity processors is that the FPGA market currently does not support using the bleeding edge processes that CPUs are manufactured with. Typically a competitively priced FPGA will be at least one generation behind a CPU. In HPC, FPGAs are a plausible improvement, but at a smaller scale the development costs for incorporating a custom firmware for an FPGA into an application are significant. It all really rests on what demand is out there for a particular algorithm
  - Re: (Score:1)
    
    by Apocros ( 6119 ) writes:
    
    Actually, usually FPGAs are on the bleeding edge of manufacturing processes. Intel may have beat everyone to 28/32nm, but expect to see 28nm FPGAs from Altera and Xilinx (from TSMC and/or Samsung) around the same time as 28/32nm ASICs from AMD or nVidia. Intel rolls their own, but everyone else is using the same foundries...
  - Re: (Score:2)
    
    by ChrisMaple ( 607946 ) writes:
    
    FPGAs are much slower and less efficient and bigger than a dedicated design because even the simplest gate is actually a block that can be controlled to perform many different functions. That block consists of several latches and a complex gate, perhaps a hundred transistors in all, whereas a 2-input nand gate consists of four transistors. So it's 25 times bigger (area), and the distance to the next gate is increased by 5x (linear). The complexity makes the block inherently slower than a simple gate, and th
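A quick check of the scaling arithmetic in the comment above, taking its rough figures (about a hundred transistors per FPGA block, four for a 2-input NAND) at face value. The assumptions that area is proportional to transistor count and that distance grows with the square root of area are simplifications for illustration, not measured results.

```python
import math

# Rough figures quoted in the comment above (not measured values).
FPGA_BLOCK_TRANSISTORS = 100  # "perhaps a hundred transistors in all"
NAND2_TRANSISTORS = 4         # a 2-input NAND gate


def area_ratio(block: int = FPGA_BLOCK_TRANSISTORS,
               gate: int = NAND2_TRANSISTORS) -> float:
    """Area penalty, assuming area is proportional to transistor count."""
    return block / gate


def linear_ratio(block: int = FPGA_BLOCK_TRANSISTORS,
                 gate: int = NAND2_TRANSISTORS) -> float:
    """Distance penalty: side length grows as the square root of area."""
    return math.sqrt(area_ratio(block, gate))


if __name__ == "__main__":
    print(f"area: {area_ratio():.0f}x, distance: {linear_ratio():.0f}x")
```

With the quoted figures this gives the 25x area and 5x distance mentioned in the comment.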
  - Re: (Score:2)
    
    by Bassman59 ( 519820 ) writes:
    
    FPGAs are more difficult to program.
    You don't program FPGAs.
    FPGA development is synchronous digital logic design. Verilog and VHDL are hardware description languages; they are not programming languages. Having a software-engineering or programming background does not mean you can simply learn Verilog and start doing FPGA design.
- Re:Life Cycle (Score:4, Interesting)
  
  by SuricouRaven ( 1897204 ) writes: on Monday January 03, 2011 @04:41PM (#34747208)
  
  I don't see why an MRI machine processor can't be made fault-tolerant. If a GPU burns out, it could just be disabled and a fault warning indicated - and then the machine can carry on working, even if it does take significantly longer to produce an image. Then you call tech support, they come around and pull the faulty part and slot in a new one. The only concern then is making sure parts are available in twenty years - and I imagine any machine that expensive has to come with a long-term support contract which will oblige the manufacturer to ensure a supply of compatible boards in years to come.
  
  - Re: (Score:2)
    
    by glueball ( 232492 ) writes:
    
    Two things--if there's a failure, then there's a problem. The machines for years used to use military grade hardware. Machines that were designed in 1992, sold in 1994 are still running strong today. Then to cut costs, the OEMs switched to more commodity hardware and they've effectively sucked in uptime since. You make it sound like it's no big deal to call tech support. It is a big deal. To put it in dollar terms, we had a machine go down for technically 4 hours. The tech was there, made the diagnos
- Re: (Score:2)
  
  by shadow_slicer ( 607649 ) writes:
  
  Are you serious? There's no way FPGAs beat GPUs in FLOPS/watt.
  FPGAs have so much more overhead both in space and power due to programmability, whereas GPUs are pure processing. Further the algorithms necessary for CT and MRI are practically the same algorithms GPUs were designed for, so if you were to use an FPGA, your design would end up with a similar architecture anyway. Further, while low end commercial GPUs (like those you and I use for gaming), may only last 3-4 years, the high end scientific computin
  - Re: (Score:2)
    
    by glueball ( 232492 ) writes:
    
    I am serious and you are wrong. I don't have a clear idea what you mean about space and power due to programmability. FPGAs are soft coded hardware. If by the nature of being able to code it and change it you mean "overhead" then fine. But even with that overhead, they are still more efficient. You might be thinking of raw speed instead of FLOPS/Watt.
    From "A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation"
    "In this paper, we describe th
    - Re: (Score:2)
      
      by SpazmodeusG ( 1334705 ) writes:
      
      What's most surprising is that the research was on matrix dot products, something that graphics cards do in 3D operations. The FPGA beat the graphics card at its own game in both performance and performance per watt.
      I'm impressed. Perhaps we'll see graphics cards made up of nothing but programmable FPGAs in the future. Instead of loading and running a CUDA kernel we'll be loading and running an FPGA core.
- Re: (Score:1)
  
  by Siffy ( 929793 ) writes:
  
  A $3 Million MRI machine can't afford to have 10 $100 redundant backup GPUs inside it? Of course commodity hardware isn't medical grade. Anyone trying to shove an off the shelf GTX 580 WTF FTW suck-my-balls-off edition card into such an expensive device is cutting some huge corners instead of requesting industrial/medical grade units from any of the potential manufacturers. So what if that part costs $50k and is equivalently powerful as a $50 card at Best Buy.
- Re: (Score:3)
  
  by raftpeople ( 844215 ) writes:
  
  Not all problems map well to current GPU offerings. I have a problem that would benefit from parallel processing but due to a branchy algorithm and very random access for read/write, I can't really take advantage of GPU's to the extent some algorithms can (note: I have coded and run it on GPU's so this is more than just theory, additionally I have coded it to run on a network of computers and unfortunately the calc time vs network transmission time ratio for each cycle is not favorable enough for that to
- Re: (Score:2)
  
  by Bassman59 ( 519820 ) writes:
  
  What are the practical differences between targeting an FPGA on a computing platform and targeting more ubiquitous massively-parallel programmable pipelines in modern GPUs? Also, what are the fundamental differences? Could my GPU already contain FPGAs?
  The main difference is that you don't program FPGAs. You do synchronous digital logic design which is implemented in the FPGA fabric. Thinking that you can program them like you program a sequential-execution processor is a recipe for failure. And, yeah, C-to-gates tools are a joke.
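One way to see the difference described above is a toy model of a register swap. This is plain Python standing in for the two mental models, not Verilog or VHDL, and the function names are made up for illustration: in sequential software each statement sees the result of the previous one, while in synchronous logic every register samples its input at the clock edge and all of them update together.

```python
from typing import Tuple


def sequential_step(a: int, b: int) -> Tuple[int, int]:
    """Software semantics: statements run one after another."""
    a = b
    b = a  # sees the already-updated a, so the old a is lost
    return a, b


def clocked_step(a: int, b: int) -> Tuple[int, int]:
    """Synchronous-logic semantics: compute every register's next
    value from the current state, then update all of them at once."""
    next_a = b
    next_b = a
    return next_a, next_b


if __name__ == "__main__":
    print(sequential_step(1, 2))  # (2, 2)
    print(clocked_step(1, 2))     # (2, 1)
```

The same two assignments swap the values in the clocked model and clobber one of them in the sequential model, which is roughly why sequential habits do not carry over to logic design.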
Disappointment (Score:5, Funny)

by TheL0ser ( 1955440 ) writes: on Monday January 03, 2011 @02:33PM (#34745776)

The story's been up for 20 minutes and no one's tried to imagine a Beowulf cluster of them yet? This is a great sadness.

YouTube Algorithm (Score:2)

by digitaldc ( 879047 ) * writes:

The researchers then used the chip to process an algorithm which is central to the MPEG movie format – used in YouTube videos – at a speed of five gigabytes per second: around 20 times faster than current top-end desktop computers.

20x speed is getting closer to what I need before I can even ATTEMPT to build my very own Holodeck.

http://en.wikipedia.org/wiki/Holodeck [wikipedia.org]
- Re: (Score:2)
  
  by SuricouRaven ( 1897204 ) writes:
  
  This is slashdot. Do you really think anyone here doesn't know what a holodeck is? Half the users have probably tried to design one.
FPGAs ... (Score:2)

by Bassman59 ( 519820 ) writes:

Yawn. Seriously.
(says the guy who does FPGA design for a living.)
- Re: (Score:2)
  
  by Jepler ( 6801 ) writes:
  
  Indeed. 1000 simple CPUs will fit in a FPGA, though it might require one near the top of the line. (e.g., picoblaze reportedly needs 96 "slices" and 1.5 "block RAMs"; the biggest Virtex-7 FPGA has more than 1400x as many block RAMs and 3100x as many "slices") There's little doubt that you could program a DCT for a picoblaze, if you wanted to.
  It's hard to tell what 5.0GBps refers to -- the bitrate of the incoming, uncompressed, RGB video data? If so, that's maybe about 800FPS of 1080P video. In a circa
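The back-of-the-envelope numbers in the comment above can be reproduced as follows. The frame size assumes uncompressed 1080p RGB at 3 bytes per pixel, and the per-core figures are the "reportedly" quoted PicoBlaze numbers; both are taken from the comment rather than from a data sheet. Since the story summary says 5Gbps while the quoted article says gigabytes, the sketch computes the frame rate for both readings.

```python
# Uncompressed 1080p RGB frame: 1920 x 1080 pixels, 3 bytes per pixel.
FRAME_BYTES = 1920 * 1080 * 3  # 6,220,800 bytes


def frames_per_second(bytes_per_second: float,
                      frame_bytes: int = FRAME_BYTES) -> float:
    """Frames per second that a given raw byte rate could carry."""
    return bytes_per_second / frame_bytes


def picoblaze_budget(cores: int,
                     slices_per_core: int = 96,
                     brams_per_core: float = 1.5):
    """Total slices and block RAMs for `cores` simple cores,
    using the per-core figures quoted in the comment."""
    return cores * slices_per_core, cores * brams_per_core


if __name__ == "__main__":
    print(round(frames_per_second(5.0e9)))      # 5 gigabytes/s -> 804
    print(round(frames_per_second(5.0e9 / 8)))  # 5 gigabits/s  -> 100
    print(picoblaze_budget(1000))               # (96000, 1500.0)
```

Read as gigabytes, the rate matches the "about 800FPS" estimate; read as gigabits, it is closer to 100 frames per second.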
first (Score:2)

by iamacat ( 583406 ) writes:

We first need to break a lock of x86 instruction set and the operating system that requires it. CPUs already try to execute multiple x86 instructions in parallel, but this is severely limited by sequential instruction set design. There needs to be a way to express computation A and B using different sets of virtual registers and let hardware execute them sequentially or in parallel depending on its capabilities, or vectorize/parallelize multiple iterations of a loop. If software, including operating systems
- Re: (Score:2)
  
  by VortexCortex ( 1117377 ) writes:
  
  We first need to break a lock of x86 instruction set
  Yep. All hail ARM.
  There's a reason why embedded devices use ARM over x86. The x86 instruction set has a lot of instructions that no compilers (and therefore hardly anyone) ever use. Those unused instructions are just sitting there in the silicon, charged up with electrons, draining power, generating heat, and making it harder to create smaller & faster x86 chips. Some of these "deprecated" instructions are microcoded, but that just means they're slower and even less likely to be used by an optimizin
  - Re: (Score:3)
    
    by smallfries ( 601545 ) writes:
    
    Sigh. Multi-way branching was already old when ARM implemented it. What you fail to explain (understand?) is that there is a cost associated with either choice. As with most of engineering there is not a simple proposal that wins. In the case that branch prediction is perfect, the predicted execution is cheaper. In the case that the prediction is terrible the multi-way execution wins. In real life branch prediction is neither perfect, nor is it that terrible, so engineers have to balance the likelihood that
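The trade-off described above can be put into a small expected-cost model. This is only a sketch: the cycle counts and the misprediction penalty are hypothetical inputs, and real pipelines are far messier than two formulas.

```python
def predicted_cost(p_correct: float, path_cycles: float,
                   mispredict_penalty: float) -> float:
    """Expected cycles when the hardware guesses one path and pays
    a penalty whenever the guess turns out wrong."""
    return path_cycles + (1.0 - p_correct) * mispredict_penalty


def both_paths_cost(taken_cycles: float, not_taken_cycles: float) -> float:
    """Cycles when both sides are executed and one result is discarded."""
    return taken_cycles + not_taken_cycles


def break_even_accuracy(other_cycles: float,
                        mispredict_penalty: float) -> float:
    """Prediction accuracy at which the two strategies cost the same.

    path + (1 - p) * penalty = path + other  =>  p = 1 - other / penalty
    """
    return 1.0 - other_cycles / mispredict_penalty


if __name__ == "__main__":
    # Hypothetical numbers: 4-cycle paths, 16-cycle misprediction penalty.
    print(break_even_accuracy(4, 16))   # 0.75
    print(predicted_cost(0.95, 4, 16))  # prediction wins above break-even
    print(both_paths_cost(4, 4))        # 8
```

Above the break-even accuracy prediction is cheaper, below it executing both paths is cheaper, which is the balance the comment says engineers have to strike.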
      - Re: (Score:2)
        
        by smallfries ( 601545 ) writes:
        
        Well.... no. A few percent is a small deal. A larger percentage would be a bigger deal.
        You've made a critical failure here: the x86 *instruction* cache stores x86 instructions after they've been decoded into simpler RISCy form
        Yes - after they've been transferred across the bottleneck from memory. So at the point where it matters (the cache fetching lines from memory) the code is in a dense form because of the CISC encoding.
        It's really quite simple: RISC is an advantage where the cost of decoding dominates because it simplifies the decoder circuitry. CISC is an advantage where the cost of transferring instructions (and the space that they occup
- Re: (Score:2)
  
  by Lemming Mark ( 849014 ) writes:
  
  Admittedly slightly tangential to your discussion of virtual machines ... but part of the point of Intel's IA64 instruction set was to address this kind of thing. The compiler's job was to specify groups of instructions that could be executed safely in parallel, then the CPU would execute these according to its capabilities.
  But a higher-level virtual instruction set with just-in-time compilation is admittedly more insulated against future technology and more amenable to the code being run on a variety of a
good grief - maybe rediscover integration as well? (Score:1)

by Anonymous Coward writes:

"However, we are already seeing some microchips which combine traditional CPUs with FPGA chips being announced by developers, including Intel and ARM."
welcome to 2004.
Xilinx Virtex II, includes internal PPC 405GP
Security issues (Score:3)

by bluefoxlucid ( 723572 ) writes: on Monday January 03, 2011 @02:52PM (#34745960) Homepage Journal

A programmable hardware platform would provide amazing computing power because of hardware specialization: rather than emulating a proper CPU, you would download core architecture into the FPGA to accelerate tasks such as REGEX processing or H.264 decoding. You could compile the entire logic of a program into a gate array with various logical operators and flip-flop circuits for unlimited (albeit slow) registers (L2 registers) as well as including standard registers and SRAM cache (L1).
Although the FPGA runs slower than a regular CPU, direct programming rather than instructional programming (that is logic blocks that perform programmatic functions, rather than logic blocks that interpret discrete instructions to follow programmatic functions) would shorten the overall hardware logic path. In short, the chip would follow fewer clock cycles and instead just "do things." The CPU would be slow, but optimized for your workload. The main performance bottleneck would be the context switch: replacing the logic gate configuration with a new program every time you switch. Other than that, dynamic program expansion could be utilized: inlining operations like multiplication, addition, etc, or breaking them out if space constraints make it hard to load the whole program onto the FPGA that way.
The obvious, major issue we see is, of course, a security issue. You can now reprogram the CPU. This makes it difficult to prevent a program from bypassing any and all hardware security measures. This is solved by implementing a completely new security design on the chip, by which the CPU itself (the FPGA) is under control of external security mechanisms (paging etc handled in the MMU, outside the FPGA space, would largely mitigate most of this); it's not impossible to deal with, it's just an issue that needs to be raised.
In short, this sucks for "download the new Intel CPU into your BIOS/bootloader." This sucks for whatever general purpose CPU you can think of. For an entirely new programmatic platform, however, this would provide some interesting performance possibilities, and some interesting challenges.
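The context-switch bottleneck mentioned above can be sketched as an amortization problem. The 20x raw speedup echoes the figure in the quoted article; the 0.1 second reconfiguration time and the task lengths are made-up values for illustration, not measurements of any real part.

```python
def useful_fraction(work_seconds: float, reconfig_seconds: float) -> float:
    """Share of wall-clock time spent computing rather than reloading
    the configuration, for one load followed by one run."""
    return work_seconds / (work_seconds + reconfig_seconds)


def effective_speedup(raw_speedup: float, cpu_seconds: float,
                      reconfig_seconds: float) -> float:
    """Speedup over a CPU once the reload is charged to the task."""
    fpga_seconds = cpu_seconds / raw_speedup + reconfig_seconds
    return cpu_seconds / fpga_seconds


if __name__ == "__main__":
    # A 10 ms time slice: the reload dominates and the FPGA loses.
    print(effective_speedup(20, 0.01, 0.1))
    # A 100 s batch job: the reload is noise and most of the 20x survives.
    print(effective_speedup(20, 100.0, 0.1))
```

Under these assumptions short time slices come out slower than the CPU while long batch jobs keep nearly the full speedup, which fits the earlier remark in the thread that FPGAs suit batch operation better than multitasking.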

- Re: (Score:2)
  
  by smallfries ( 601545 ) writes:
  
  This sounds reminiscent of the hype around reconfigurable computing ten years ago. A lot of the hype has died down now that people have tried and discovered that what you've described is wrong.
  First point: a specialised hardware circuit will always be faster than a generic circuit.
  Second point: generic circuits require a lot more interconnect than specialised circuits which impacts how many of them you can fit on a die relative to specialised circuits.
  Third point: a CPU is a set of specialised circuits bein
  - Re: (Score:2)
    
    by bluefoxlucid ( 723572 ) writes:
    
    So here is the basic problem. If the target application is made of steps that exist as specialised circuits in the CPU then selecting which of those circuits to apply in sequence will be faster than a generic circuit because the specialised circuit uses the space on the die more effectively and is clocked at a much higher frequency.
    If the target application is made of steps which are very unlike the circuits provided on the CPU then the generic design will win. For everything in-between it is a trade-off. Not as many things win as FPGA designs and there is ten years of literature showing marginal improvements.
    Encryption is a lot of things in CPU that are faster in hardware because it's a single clock cycle to do things that take 30,000 clock cycles on the CPU.
    Regex calculation, faster in a specialized hardware chip.
    Codec decoding, we use an off-board CPU that has a microkernel and a small program; it benefits from just not running an OS and being a dedicated RISC processor, but in no other way.
    GPU, specialized instruction set. Not dedicated to a specific task, but dedicated to a type of task. WAY faster t
    - Re: (Score:2)
      
      by smallfries ( 601545 ) writes:
      
      It's odd that you pick crypto as I've spent a little time implementing crypto primitives on weird and exotic hardware. Sure - division is quite slow, that is why most primitives avoid the need for it, or only perform reductions in a specialised field rather than a full division. Multiplication on the other hand is fast and tends to be used a lot.
      AES is quite a bad example for FPGAs. The very latest AES extensions from Intel can compute a round of AES in under three clock cycles. Performing the full cipher t
      - Re: (Score:2)
        
        by bluefoxlucid ( 723572 ) writes:
        
        AES is quite a bad example for FPGAs. The very latest AES extensions from Intel can compute a round of AES in under three clock cycles. Performing the full cipher takes less than twenty clock cycles (on a processor running in excess of 3Ghz). No FPGA in the world can keep up with that performance.
        "AES Extensions" means that Intel put a dedicated instruction pipeline in the processor to compute AES. That means you now have a specialized purpose hardware encryption chipset built into your CPU, tada. Just like an FPU.
        Try the same Intel CPU with IA-32 instructions implementing AES, you won't do the whole cipher in 20 cycles. If you implement the exact same instruction architecture on an FPGA, it'll run at the slower clock of the FPGA, but still do it in 20 cycles. This means when you want to run
        
        Re: (Score:2)
        
        by smallfries ( 601545 ) writes:
        
        Your original point was that a reconfigurable processor would be more efficient at most tasks than a specialised processor, and that the big issue would be handling security. Why resort to car analogies when your entire argument can be summarised so concisely?
        I have made a simple enough argument that you seem to keep missing - while it is a nice theory that we can reconfigure chips to be more efficient for a particular task the actual practice doesn't live up to expectations. Reconfigurable architectures ha
        
        Re: (Score:2)
        
        by bluefoxlucid ( 723572 ) writes:
        
        My point is everyone responding when this was first posted had this idea that you can just "reconfigure your FPGA to be a new Intel CPU" by some magic, and it'll work better. This is a dumb and short-sighted idea; if you have reconfigurable hardware, you have the ability to ad-hoc create specialized gate hardware rather than run software on generic instruction set architectures.
        As for lard-cycles, somebody pointed out modern FPGAs clock at 1.5GHz; I'm more interested in what someone else said about the l
- Re: (Score:2)
  
  by Sulphur ( 1548251 ) writes:
  
  Once upon a time, there was a writable control store computer from TI.
  Someone wrote a Super Compiler that produced microcode directly instead of producing instructions as usual.
  Perhaps something like this can come back.
- Re: (Score:2)
  
  by owlstead ( 636356 ) writes:
  
  I could see this being used through a driver model. A generic driver is present that is able to reprogram the FPGA. Specialized or even derived drivers use the (now static) set of functionality. This could allow you to create general-purpose CPUs that can still be tweaked for certain tasks. It would also allow for upgrades of the algorithms being implemented. Symmetric cryptography and encoding/decoding would be obvious choices.
  If updating the FPGA is really slow, I would not try and let applications change t
- Re: (Score:1)
  
  by kyuubi ( 1355069 ) writes:
  
  I had the same idea, but from my time playing with them at university, I remember that they have a very limited number of write cycles. You can reprogram them enough to do your development and bugfixes, but you can't reprogram them every time you run a different type of application. Well, that is now. I suppose if someone can prove that it will make consumers happy, the engineers can find a way to increase the write limit.
- Re: (Score:2)
  
  by Muad'Dave ( 255648 ) writes:
  
  Yes. [uclinux.org]
A New Chip (Score:2)

by b4upoo ( 166390 ) writes:

Now we need a chip that can take any given problem and divide it into one thousand parts so we can feed it into these processors. -Gives me a headache!
- Re: (Score:2)
  
  by FeepingCreature ( 1132265 ) writes:
  
  Now we need a chip that can take any given problem and divide it into one thousand parts so we can feed it into these processors. -Gives me a headache!
  It's called a "programmer".
  - Re: (Score:2)
    
    by fpgaprogrammer ( 1086859 ) writes:
    
    Now we need a chip that can take any given problem and divide it into one thousand parts so we can feed it into these processors. -Gives me a headache!
    It's called a "programmer".
    It's called a "fpgaprogrammer"
    get it right.
- Re: (Score:2)
  
  by blair1q ( 305137 ) writes:
  
  For that you need a 1001-cpu chip.
Wow, this will end useful software! (Score:2)

by darthwader ( 130012 ) writes:

Software developers have barely figured out how to write single threaded algorithms without crashing. Now we are seeing more multithreaded algorithms with race conditions, deadlocks and other data-sharing bugs.
Can you imagine what will happen if every desktop machine has one or two FPGAs available for programs to use as needed?
PHB says "Hey, I've heard that you can make the program faster if you program custom hardware on the motherboard's FPGA. Get the new intern to write some FPGA code for our algorithm
- Re: (Score:2)
  
  by Nadaka ( 224565 ) writes:
  
  Multi-threaded computing is not rocket science. Most bad multi-threaded programming is bad because a lot of so-called "software developers" just plain suck.
  - - Re: (Score:2)
      
      by Nadaka ( 224565 ) writes:
      
      Sorry, but to make parallelism painless, you have to restrict the language in ways that make a lot of other things painful.
      A language where every method call is a perfect closure is easily made parallel, the only question left is what granularity of parallelism will produce a gain when considering the overhead of managing threads. It also introduces a lot of overhead for constantly copying data on methods you are not going to be making parallel and rendering it slower for some to many applications when comp
And, to think... (Score:3)

by GodfatherofSoul ( 174979 ) writes: on Monday January 03, 2011 @04:24PM (#34747036)

Ten years ago some young 6-digit ID Slashdotter was getting modded down for suggesting a Beowulf cluster of cores. Who's laughing now, mods?!?!?

huh (Score:2)

by buddyglass ( 925859 ) writes:

Without digging for any additional information, it bugs me that this chip has 1000 cores and not 1024.
What about an all core chip? (Score:3)

by ka9dgx ( 72702 ) writes: on Monday January 03, 2011 @05:38PM (#34747828) Homepage Journal

The ultimate end to this trend is to build a system that is just core processing logic, with logic and memory all fused as closely as possible. I call it the BitGrid... it consists of 4-bit look-up tables hooked into an orthogonal grid. Because every single table can be used simultaneously, there is no von Neumann bottleneck to worry about.
Petaflops... here we come.... !
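
For readers who have not met a look-up table as a logic element, here is a minimal sketch in Python. It assumes a 4-input, 1-output table stored as a 16-bit integer; the BitGrid's actual cell layout is not given in this comment, and the names `make_lut4`, `PARITY_TABLE`, `parity` and `and4` are made up for illustration.

```python
def make_lut4(truth_table: int):
    """Build a 4-input, 1-output look-up table (illustrative model only).

    truth_table is a 16-bit integer; bit i holds the output for the
    input pattern whose binary value is i.
    """
    if not 0 <= truth_table < (1 << 16):
        raise ValueError("truth table must fit in 16 bits")

    def lut(a: int, b: int, c: int, d: int) -> int:
        # Pack the four input bits into an index from 0 to 15.
        index = (a & 1) | ((b & 1) << 1) | ((c & 1) << 2) | ((d & 1) << 3)
        # The "computation" is just a table read.
        return (truth_table >> index) & 1

    return lut


# "Programming" the table means choosing its 16 bits.
# Parity: output 1 when an odd number of inputs are 1.
PARITY_TABLE = sum(1 << i for i in range(16) if bin(i).count("1") % 2 == 1)
parity = make_lut4(PARITY_TABLE)

# 4-input AND: only the all-ones pattern (index 15) outputs 1.
and4 = make_lut4(1 << 15)
```

The same cell becomes a parity gate or an AND gate depending only on the 16 bits loaded into it, which is the sense in which such a grid is reconfigurable.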

- Re: (Score:2)
  
  by Jepler ( 6801 ) writes:
  
  You've just described the FPGA. Large areas of an FPGA are devoted to thousands of almost-identical functional blocks ("slices" in Xilinx parlance). For instance, in one Xilinx family, a slice contains a 4-input LUT, a flip-flop (1 bit of memory, called an FF), and other specific gates that help implement things like carry chains, shift registers, and some 5+input functions the chip designers thought were commonly encountered.
  Other areas contain "block RAMs" and "DSP cores" (basically, dedicated multiplie
But can you play Doom on it? (Score:1)

by cephus440 ( 828210 ) writes:

Someone had to say it, be kind.
- Re: (Score:2)
  
  by Dachannien ( 617929 ) writes:
  
  Actually, I think the correct overused meme would be, "Imagine a Beowulf cluster of those!"
That's nothing. (Score:2)

by wcrowe ( 94389 ) writes:

Mine goes to 1011.
Amdahl's Law (Score:1)

by zildgulf ( 1116981 ) writes:

I think this is fantastic that a 1000-core processor is in development.
I hate to be the devil's advocate but at what point will Amdahl's Law take hold fully and adding more cores to a processor will prove to be a fruitless endeavor?
- Re: (Score:1)
  
  by Siffy ( 929793 ) writes:
  
  Depends on the problem(s) and the processor design, which is entirely the point of why a 1k core CPU is a big deal. If you can have enough independent problems or programs running all the time and design a system that lessens contention and fighting for resources, Amdahl's Law can be avoided almost indefinitely. It won't last forever of course.
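
To put rough numbers on the Amdahl's Law question, here is a minimal Python sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The function name `amdahl_speedup` and the sample value p = 0.95 are illustrative assumptions, not figures from the article.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's Law.

    parallel_fraction: share of the runtime that parallelises (0.0 to 1.0).
    cores: number of cores, at least 1.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 95% of the runtime is parallelisable.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

With 5% of the work serial, the speedup can never exceed 1 / 0.05 = 20 however many cores are added, so most of the 1,000 cores only pay off when the workload is many independent jobs, as the reply above suggests.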
