
Tilera To Release 100-Core Processor

timothy posted about 5 years ago | from the and-then-they-stopped-counting dept.

Hardware | 191 comments

angry tapir writes "Tilera has announced new general-purpose CPUs, including a 100-core chip. The two-year-old startup's Tile-GX series of chips is targeted at servers and appliances that execute Web-related functions such as indexing, Web search and video search. The Gx100 100-core chip will draw close to 55 watts of power at maximum performance."


This is great ! (5, Interesting)

ls671 (1122017) | about 5 years ago | (#29870063)

I can't wait to see the output of:

cat /proc/cpuinfo

I guess we will need to use:

cat /proc/cpuinfo | less

When we reach 1 million cores, we will need to rearrange the output of cat /proc/cpuinfo to eliminate redundant information ;-))

By the way, I just typed "make menuconfig" and it will let you enter a number up to 512 in the "Maximum number of CPUs" field, so the Linux kernel seems ready for up to 512 CPUs (or cores; they are handled the same way by Linux, it seems) as far as I can tell from this simple test. Entering a number greater than 512 gives the "You have made an invalid entry" message ;-(

Note: You need to turn on "Support for big SMP systems with more than 8 CPUs" flag as well.
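
For the curious: that 512 ceiling lives in the kernel's own Kconfig files as the NR_CPUS entry. Roughly what the x86 entry of that era looks like - a sketch from memory, since the exact ranges and defaults vary by kernel version and architecture:

config NR_CPUS
        int "Maximum number of CPUs"
        depends on SMP
        range 2 8 if X86_32 && !X86_BIGSMP
        range 2 512
        default "8"

Raise the range and rebuild, and menuconfig will happily accept more; whether the resulting kernel actually scales is another question.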

 

obligatory (2, Funny)

wisty (1335733) | about 5 years ago | (#29870089)

... and just imagine a Beowulf cluster of them.

Re:obligatory (2, Insightful)

fractoid (1076465) | about 5 years ago | (#29870253)

It IS a Beowulf cluster.

Obligatory Princess Bride quote:
Miracle Max: Go away or I'll call the brute squad!
Fezzik: I'm ON the brute squad.
Miracle Max: [opens door] You ARE the brute squad!

Re:obligatory (1)

AHuxley (892839) | about 5 years ago | (#29870349)

... and just imagine AT&T upgrading to them.

but we already have... (1)

AliasMarlowe (1042386) | about 5 years ago | (#29870715)

...a Beowulf cluster of stale memes.

Re:This is great ! (4, Informative)

MrMr (219533) | about 5 years ago | (#29870135)

The 'stock' kernel is ready for 512 CPUs. SGI had a 2048-core single-image Linux kernel six years ago.

Re:This is great ! (2, Funny)

mrops (927562) | about 5 years ago | (#29870901)

But the more important question is...

Will it run Windows 7?

I know, I know, it's the wrong question, but the answer to the other one is always "yes".

Re:This is great ! (5, Insightful)

BadAnalogyGuy (945258) | about 5 years ago | (#29870149)

By the way, I just typed "make menuconfig" and it will let you enter a number up to 512 in the "Maximum number of CPUs" field, so the Linux kernel seems ready for up to 512 CPUs (or cores; they are handled the same way by Linux, it seems) as far as I can tell from this simple test. Entering a number greater than 512 gives the "You have made an invalid entry" message

Whoa. If you change the source a little, you can enter 1000000 into the Maximum number of CPUs field! Linux is ready for up to a million cores.

If you change the code a little more, when you enter a number that's too high for menuconfig, it says "We're not talking about your penis size, Holmes"

Re:This is great ! (0)

ls671 (1122017) | about 5 years ago | (#29870257)

Whoa! When I change the Apache httpd server code, it says "Microsoft IIS server" or anything I want when I type "httpd -v". I guess it's the same for anything you have the source code for ;-))

More seriously, do you have any reference for "Linux is ready for up to a million cores"?

Sources are always appreciated when you tell us something. I googled a little without finding anything on what you are talking about.

Thanks !

Re:This is great ! (1)

BadAnalogyGuy (945258) | about 5 years ago | (#29870527)

More seriously, do you have any reference for "Linux is ready for up to a million cores"?

There was an article on Wikipedia that said so. And my local copy of the Linux kernel source has a comment that says so.

Re:This is great ! (1)

ls671 (1122017) | about 5 years ago | (#29870837)

Come on! Quit being such a tough guy and let us know where it says so...

grep -r "1,000,000" /usr/src/linux
/usr/src/linux/drivers/net/qlge/qlge_ethtool.c: * We do this by using a basic thoughput of 1,000,000 frames per
/usr/src/linux/kernel/cpuset.c: * per msec it maxes out at values just under 1,000,000. At constant

grep -ri "one million" /usr/src/linux
/usr/src/linux/arch/x86/math-emu/README:found at a rate of 133 times per one million measurements for fsin.
/usr/src/linux/arch/x86/math-emu/README:was obtained per one million arguments. For three of the instructions,

Re:This is great ! (1)

rbanffy (584143) | about 5 years ago | (#29870543)

"More seriously, do you have any reference for "Linux is ready for up to a million cores" ?"

SGI has 4096-core monsters, as MrMr pointed out.

Do you have a million-core machine we can use to invalidate this hypothesis?

Re:This is great ! (1)

TheRaven64 (641858) | about 5 years ago | (#29870951)

Those 4096-core SGI machines are clusters of 4-core machines with a very fast interconnect. Each cluster node runs its own local software with some quite evil stuff (a custom memory controller and some extra logic in the VM subsystem for cache coherency across nodes) to handle distributed shared memory and process migration. These are not SMP machines and, although most of the relevant code is in the mainstream kernel sources, it is so tied to SGI's architecture that it is almost completely useless from the point of view of supporting other architectures. Compare this to something like a 64-processor Sun machine, which really is an SMP machine, and you get very different performance characteristics.

When people describe them as single system image, they mean that they appear to userspace as being single machines, not that they are running a single instance of the kernel.

Re:This is great ! (4, Informative)

fluch (126140) | about 5 years ago | (#29870557)

Sources are always appreciated when you tell us something.

Here is the source: http://www.kernel.org/ [kernel.org]

Re:This is great ! (1)

Trepidity (597) | about 5 years ago | (#29870311)

And if you change the code a little more, it takes single-threaded tasks and automatically finds an efficient parallelization of them, distributing the work out to those million cores!

Re:This is great ! (2, Insightful)

am 2k (217885) | about 5 years ago | (#29870339)

Actually, some algorithms (like fluid simulation and a very large neural net) are not that hard to parallelize to run on a million cores.

Re:This is great ! (1)

Trepidity (597) | about 5 years ago | (#29870411)

Yes, but taking an arbitrary single-threaded algorithm and automatically figuring out what the parallelization is is the hard part. =]

Re:This is great ! (1)

am 2k (217885) | about 5 years ago | (#29870483)

Well, you could analyze the data dependencies, put them into a dependency graph, and then figure out what can be parallelized without too much synchronization overhead. However, that's probably something for a theoretical scientific paper, and I'd be surprised if you could parallelize most algorithms to more threads than you could count on one hand...

As soon as you're doing linear I/O (like network access), you've hit a barrier anyways.
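
For what it's worth, the graph part is easy to sketch. Here's a minimal C toy (the six-task graph and all the names are invented for illustration): compute each task's longest dependency chain, and tasks that land on the same "wave" have no mutual dependencies, so they could in principle run concurrently.

/* Level-scheduling sketch: dep[i][j] != 0 means task i must finish
 * before task j starts. Everything here is a made-up example. */
#include <stdio.h>

#define NTASKS 6

static const int dep[NTASKS][NTASKS] = {
    /* 0 feeds 1 and 2; 1 and 2 feed 3; 3 feeds 4 and 5 */
    [0][1] = 1, [0][2] = 1,
    [1][3] = 1, [2][3] = 1,
    [3][4] = 1, [3][5] = 1,
};

int main(void)
{
    int level[NTASKS] = {0};
    int changed = 1;

    /* Relax levels until fixed point (fine for a small acyclic graph) */
    while (changed) {
        changed = 0;
        for (int i = 0; i < NTASKS; i++)
            for (int j = 0; j < NTASKS; j++)
                if (dep[i][j] && level[j] < level[i] + 1) {
                    level[j] = level[i] + 1;
                    changed = 1;
                }
    }

    for (int i = 0; i < NTASKS; i++)
        printf("task %d -> wave %d\n", i, level[i]);
    /* The number of distinct waves is the critical-path length; that,
     * not the core count, is what bounds the speedup. */
    return 0;
}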

Re:This is great ! (2, Insightful)

dkf (304284) | about 5 years ago | (#29870431)

Actually, some algorithms (like fluid simulation and a very large neural net) are not that hard to parallelize to run on a million cores.

Building the memory backplane and communication system (assuming you're going for a cluster) to support a million CPUs is non-trivial. Without those, you'll go faster with fewer CPUs. That's why supercomputers are expensive: the cost isn't in the processors, but in the rest of the infrastructure to support them.

Yep (5, Informative)

Sycraft-fu (314770) | about 5 years ago | (#29870989)

Unfortunately these days the meaning of supercomputer gets a bit diluted by many people calling clusters "supercomputers". They aren't really. As you noted what makes a supercomputer "super" isn't the number of processors, it is the rest, in particular the interconnects. Were this not the case, you could simply use cheaper clusters.

So why does it matter? Well, certain kinds of problems can't be solved by a cluster, just as certain ones can't be solved by loosely connected machines. To help understand how that might work, take something more people are familiar with: the difference between a cluster and just a bunch of computers on the Internet.

Some problems are extremely bandwidth non-intensive. They need no inter-node communication, and very little communication with the head node. A good example would be the Mersenne prime search, or Distributed.net. The problem is extremely small; the structure of the program is larger than the data itself. All the head node has to do is hand out ranges for clients to work on, and the clients only need to report the results, affirmative or negative. As such, it is suited to working over the Internet. The nodes can be low-bandwidth, they can drop out of communication for periods of time, and it all works fine. Running on a cluster would gain you no speed over the same group of computers on modems.

However, the same is not true for video rendering. You have a series of movie files you wish to composite into a final production, with effects and so on. This sort of work is suited to a cluster. While the nodes can work independently (the work of one node doesn't depend on the others), they do require a lot of communication with the head node. The problem is very large; the video data can be terabytes. The result is also not small. So you can do it on many computers, but the bandwidth needs to be pretty high, with low latency. Gigabit Ethernet is likely what you are looking at. Trying to do it over the Internet, even broadband, would waste more time in data transfer than you'd gain in processing. You need a cluster.

OK, well, supercomputers are the next level of that. What happens when you have a problem where you DO have a lot of inter-node communication? The results of the calculations on one node are influenced by the results on all the others. This happens in things like physics simulations. In this case, a cluster can't handle it. You can saturate your bandwidth, but worse, you have too much latency. You spend all your time waiting on data, and thus computation isn't any faster.

For that, you need a supercomputer. You need something where nodes can directly access the memory of other nodes. It isn't quite as fast as local memory access, but nearly. Basically you want them to play like they are all the same physical system.

That's what separates a true supercomputer from a big cluster. You can have lots of CPUs, and that's wonderful; there are a lot of problems you can solve with that. But it isn't a supercomputer unless the communication between nodes is there.
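
To make the tightly-coupled case concrete, here's a toy MPI sketch (a made-up 1-D diffusion, not code from any real system): every rank has to swap its edge cells with both neighbours every single timestep before it can advance, so over thousands of steps it's the interconnect latency, not the FLOPS, that sets the pace.

#include <mpi.h>
#include <stdio.h>

#define CELLS 1024   /* cells owned by this rank */
#define STEPS 1000

int main(int argc, char **argv)
{
    double u[CELLS + 2] = {0};   /* +2 ghost cells for the halos */
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;
    if (rank == 0) u[1] = 100.0;   /* heat one end */

    for (int t = 0; t < STEPS; t++) {
        /* Two latency-bound halo exchanges per step, every step */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[CELLS + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[CELLS], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 1; i <= CELLS; i++)   /* toy in-place smoothing */
            u[i] = 0.25 * u[i-1] + 0.5 * u[i] + 0.25 * u[i+1];
    }

    if (rank == 0) printf("u[1] = %f\n", u[1]);
    MPI_Finalize();
    return 0;
}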

Re:This is great ! (1)

xtracto (837672) | about 5 years ago | (#29870921)

Actually, some algorithms (like fluid simulation and a very large neural net) are not that hard to parallelize to run on a million cores.

That may be because "Fluid simulation" can be done with a particle system, where each particle can be controlled by one core.

Similarly when developing artificial neural networks you could potentially put one "artificial neuron" in each core.

Another interesting distributed system paradigm is multi-agent systems. You could potentially put one "agent" (small program) in each core, with well defined rules of interaction and processing.
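
For the particle case, a minimal C/OpenMP sketch (the per-particle "physics" is a made-up toy update, not a real fluid solver): each iteration touches only its own element, so the runtime is free to spread the loop across 4 cores or 100 without the code changing.

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static float pos[N], vel[N];   /* static: too big for the stack */
    const float dt = 0.01f;

    /* Each iteration is independent, so the runtime can scatter the
     * particles across however many cores exist. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        vel[i] += -9.81f * dt;     /* toy: gravity only */
        pos[i] += vel[i] * dt;
    }

    printf("ran on up to %d threads\n", omp_get_max_threads());
    return 0;
}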

Re:This is great ! (1)

asaul (98023) | about 5 years ago | (#29871059)

How many cores does it take to run a parallel algorithm?

100 - 1 to do the processing, 1 to fetch the data and 98 to calculate an efficient way to make the whole thing run in parallel.

Re:This is great ! (0)

Anonymous Coward | about 5 years ago | (#29870629)

Could you send the patch?

Re:This is great ! (2, Funny)

tomhath (637240) | about 5 years ago | (#29871239)

Whoa. If you change the source a little, you can enter 1000000 into the Maximum number of CPUs field! Linux is ready for up to a million cores.

640K cores is more than anyone will ever need.

Re:This is great ! (2, Funny)

jellomizer (103300) | about 5 years ago | (#29871341)

No, you really need 16,711,680 cores, so you have one core for every cell in a standard Excel 2003 sheet. (Yeah, I know, 2007 finally gave us more space.)

So 65,536 rows by 255 columns. A CPU for each cell, processing its own value. Excel may almost run fast.

 

Re:This is great ! (4, Informative)

Bert64 (520050) | about 5 years ago | (#29870535)

The information in cpuinfo is only redundant like that on x86/amd64...
On Sparc or Alpha, you get a single block of text where one of the fields means "number of cpus", example:

cpu : TI UltraSparc IIi (Sabre)
fpu : UltraSparc IIi integrated FPU
prom : OBP 3.10.25 2000/01/17 21:26
type : sun4u
ncpus probed : 1
ncpus active : 1
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0Bogo : 880.38
Cpu0ClkTck : 000000001a3a4eab
MMU Type : Spitfire

Number of CPUs active and number of CPUs probed (the latter includes any which are inactive)... a million CPUs wouldn't present a problem here.

Re:This is great ! (4, Insightful)

TheRaven64 (641858) | about 5 years ago | (#29870975)

And this is one of the reasons why Linux is such a pain to program for. If you actually want any of this information from a program, you need to parse /proc/cpuinfo. Unfortunately, every architecture decides to format this file differently, so porting from Linux/x86 to Linux/PowerPC or Linux/ARM requires you to rewrite this parser. Contrast this with *BSD, where the same information is available in sysctls, so you just fire off the one that you want (three lines of code), don't need a parser, and can use the same code on all supported architectures. For fun, try writing code that will get the current power status or number and speed of the CPUs. I've done that, and the total code for supporting NetBSD, OpenBSD, FreeBSD and Solaris on all of their supported architectures was less than the code for supporting Linux/x86 (and doesn't work on Linux/PowerPC).
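
To illustrate the contrast, a rough C sketch (error handling trimmed, and the /proc parser deliberately naive; note that on Linux, sysconf(_SC_NPROCESSORS_ONLN) is the sane shortcut when all you want is the count):

#include <stdio.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
#include <sys/param.h>
#include <sys/sysctl.h>

static int ncpus(void)
{
    int mib[2] = { CTL_HW, HW_NCPU }, n;
    size_t len = sizeof(n);
    sysctl(mib, 2, &n, &len, NULL, 0);   /* same call on every arch */
    return n;
}
#else /* Linux: scrape /proc/cpuinfo, whose layout varies per arch */
#include <string.h>

static int ncpus(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[256];
    int n = 0;
    while (f && fgets(line, sizeof(line), f))
        if (!strncmp(line, "processor", 9))  /* x86 layout only! */
            n++;
    if (f) fclose(f);
    return n;
}
#endif

int main(void) { printf("%d CPUs\n", ncpus()); return 0; }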

Re:This is great ! (0)

Anonymous Coward | about 5 years ago | (#29870585)

cat /proc/cpuinfo | less

I guess you can also use

less /proc/cpuinfo

Re:This is great ! (1)

ls671 (1122017) | about 5 years ago | (#29870707)

Nah, I'm lazy... when I realize the file is too big, it's faster for me to add the pipe at the end of the line than to edit the beginning of the line... ;-)

Re:This is great ! (1)

somersault (912633) | about 5 years ago | (#29870779)

that's what the 'home' key is for :p

Re:This is great ! (1)

1s44c (552956) | about 5 years ago | (#29870615)

cat /proc/cpuinfo | less

That gets modded interesting these days? The use of a pipe?

If that's not too basic to be considered interesting, then moderators have got an odd idea of what interesting actually means.

Re:This is great ! (1)

RichardJenkins (1362463) | about 5 years ago | (#29870669)

Pipes: Not just for hitting any more.

Re:This is great ! (1)

TheRaven64 (641858) | about 5 years ago | (#29870979)

It's interesting that even in 2009 on a site for geeks, many people seem not to know about cat abuse and would still rather spawn two processes to do the job of one.

Re:This is great ! (1)

glgraca (105308) | about 5 years ago | (#29870711)

When we reach 1 million cores, we'll probably be able to ask the computer what's on its mind...

Re:This is great ! (0)

Anonymous Coward | about 5 years ago | (#29871031)

Get with the times - I was doing that a year ago with psrinfo on a Sun T5240 (128 threads). Haven't got my hands on a T5440 yet, though... 256 threads.

imagine a beowulf... (0, Redundant)

gandhi_2 (1108023) | about 5 years ago | (#29870079)

...cluster of natalie pormemes.

OOOoooo! BABY LIGHT MY FIRE !! (0)

Anonymous Coward | about 5 years ago | (#29870091)

Yeah, baby !! That's a LOT OF POWER to turn my knobs !!

Awfully generous with the term "core" (2, Insightful)

BadAnalogyGuy (945258) | about 5 years ago | (#29870097)

Yes, I suppose technically any FPGA could be considered a "core" in its own right, but it's a far cry from the CPU cores that you typically associate with the term.

Putting a stock on a semi-automatic rifle makes it an "assault weapon", but c'mon. It's still a pea shooter.

You obviously know nothing, Shultzie !! (-1, Offtopic)

Anonymous Coward | about 5 years ago | (#29870735)

You are one confused freetard!! Every rifle has a stock. What you add is a full-auto switch, a 10+ round clip, and throw in a pistol grip for fun. Then you could have made sense. But from a freetard, I didn't expect you to make sense.

Lock n load, but then unlock you dumb mofo, mofo !! You can't fire with a locked bolt, dumb mofo, mofo !!

Re:Awfully generous with the term "core" (0)

Anonymous Coward | about 5 years ago | (#29871191)

There's also another potential problem: each of these 100 "cores" has only an extremely small amount of cache.

32K L1i cache, 32K L1d cache, 256K L2 cache per tile

When does a CPU become the CPU? (5, Interesting)

LaurensVH (1079801) | about 5 years ago | (#29870107)

It appears from the article that it's a new, separate architecture to which the kernel hasn't been ported yet, so these are add-on processors that can help reduce the load on the actual CPU, at least for now. So, um, two things: 1. How exactly does that work without kernel-level support? They claim to have ported individual apps (MySQL, memcached, Apache), which might suggest a generic kernel interface and userspace scheduling. 2. How does this address the fact that the apps they ported are mostly IO-bound in a lot of cases, so 99% of the cores will still just be sitting there picking their noses?

Re:When does a CPU become the CPU? (4, Interesting)

broken_chaos (1188549) | about 5 years ago | (#29870125)

How does this address the fact that the apps they ported are mostly IO-bound in a lot of cases, so 99% of the cores will still just be sitting there picking their noses?

Loads and loads of RAM/cache, possibly?

Re:When does a CPU become the CPU? (4, Informative)

drspliff (652992) | about 5 years ago | (#29870823)

The Register [channelregister.co.uk] goes into more detail than this article, as usual.

The Tile-Gx chips will run the Linux 2.6.26 kernel and add-on components that make it an operating system. Apache, PHP, and MySQL are being ported to the chips, and the programming tools will include the latest GCC compiler set. (Three years ago, Tilera had licensed SGI's MIPS-based C/C++ compilers for the Tile chips, which is why I think Tilera has also licensed some MIPS intellectual property to create its chip design, but the company has not discussed this.)

So it seems pretty standard, and they're using existing open and closed source MIPS toolchains. However, there's still "will" and "are being" in that sentence, which brings a little unease...

Re:When does a CPU become the CPU? (0)

Anonymous Coward | about 5 years ago | (#29871085)

This company is probably on its deathbed. Engineering cannot save it, but business sense and market hype may. The chip is a viable bit of technology. The revolution in its design is that it can have massive parallelism efficiently. Originally, however, the company was aiming at scientific computing; that is, they were looking to replace clusters and similar things. The problem is that their chips were less capable of grinding through the math. With several processors, however, taking a few more cycles to do each multiply is fine if you can do dozens of them at the same time. It did not sell me, and I went with a traditional cluster solution.

I am not sure if the new generation of chips solves the math problem, but this tidbit sounds like they are just dodging it. The chip may be useful as some kind of VPN or HTTPS accelerator, but those already exist on the market.

Custom ISA? (4, Insightful)

Henriok (6762) | about 5 years ago | (#29870111)

Massive numbers of cores are cool and all that, but if the instruction set isn't any standard type (i.e. x86, SPARC, ARM, PowerPC or MIPS), chances are it won't see light outside highly customized applications. Sure, Linux will probably run on it. Linux runs on anything. But it won't be put in a regular computer other than as an accelerator of some sort, like GPUs, which are massively multicore too. Intel's Larrabee, though...

Re:Custom ISA? (2, Informative)

EsbenMoseHansen (731150) | about 5 years ago | (#29870195)

In general, new instruction sets are mostly interesting in the custom software and the open source software areas. But the latter is quite a large chunk of the server market, so I suppose they could live with that.

They would need to get support into gcc first, though.

Re:Custom ISA? (4, Informative)

stiggle (649614) | about 5 years ago | (#29870265)

From a quick Google - it's based on the ARM core (an easily licensable CPU core)

Re:Custom ISA? (1)

bertok (226922) | about 5 years ago | (#29870377)

From a quick Google - it's based on the ARM core (an easily licensable CPU core)

Must be a coincidence, but just a week ago I was wondering why nobody's tried to make a many-core CPU by doing a cookie-cutter job and replicating a simple ARM core a bunch of times... looks like someone has!

Re:Custom ISA? (5, Informative)

ForeverFaithless (894809) | about 5 years ago | (#29870447)

Wikipedia claims [wikipedia.org] it's a MIPS-derived VLIW instruction set.

Re:Custom ISA? (1)

taniwha (70410) | about 5 years ago | (#29870493)

64-bit VLIW instructions, 2 ALUs, 1 load/store unit (3 ops/clock). I'm going to guess 32 registers (a la MIPS). That means 3+3+2 = 8 register fields at log2(32) = 5 bits each, so 40 bits to encode registers, leaving 8+8+8 = 24 bits to encode opcodes, which seems like maybe too many. Perhaps 64 registers instead: 48 bits of registers and 16 bits of opcodes?

no FPU though sadly

Re:Custom ISA? (1)

rbanffy (584143) | about 5 years ago | (#29870551)

You can always offload your number crunching to a GPU with OpenCL...

Re:Custom ISA? (0)

Anonymous Coward | about 5 years ago | (#29870875)

no FPU though sadly

I imagine with 100 cores, allocating a handful of them as a SoftFPU would not be a major problem.

Re:Custom ISA? (1)

Locutus (9039) | about 5 years ago | (#29870765)

Good one. I browsed the article for the arch and was expecting ARM, but didn't see it stated. ARM makes sense, and the 40nm process has me wondering if it's Cortex-A5 or A9 based.

how about those in some netbooks and a beowulf cluster of those? ;-)

LoB

Re:Custom ISA? (3, Informative)

Narishma (822073) | about 5 years ago | (#29870879)

Why was this modded Informative? Can we have any links? Because both the article here as well as Wikipedia and an old Ars Technica story claim that it's based on MIPS.

Re:Custom ISA? (3, Insightful)

complete loony (663508) | about 5 years ago | (#29870297)

1. LLVM backend
2. Grand central
3. ???
4. Done.

Seriously though, this is exactly what Apple have been working towards recently in the compiler space. You write your application and explicitly break up the algorithm into little tasks that can be executed in parallel, using a syntax that is lightweight and expressive. Then your compiler toolchain and runtime JIT manages the runtime threads and determines which processor is best equipped to run each task. It might run on the normal CPU, or it might run on the graphics card.
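
Since Grand Central Dispatch keeps coming up, here's roughly what that model looks like from C (a trivial sketch; the squared-index "task" is a placeholder, and it needs clang with blocks support, as on Mac OS X):

#include <stdio.h>
#include <dispatch/dispatch.h>

#define N 16

static double out[N];   /* file scope, so the block can write it freely */

int main(void)
{
    dispatch_queue_t q =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    /* The runtime, not the programmer, picks the thread count and
     * which core runs which chunk of indices. */
    dispatch_apply(N, q, ^(size_t i) {
        out[i] = (double)(i * i);    /* placeholder "task" */
    });

    for (size_t i = 0; i < N; i++)
        printf("%zu -> %.0f\n", i, out[i]);
    return 0;
}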

FreeBSD and GCD (3, Interesting)

MacTechnic (40042) | about 5 years ago | (#29870407)

Although I don't expect Apple to release an Apple Server edition with a Tilera multicore processor, I would be interested to see a version of FreeBSD running with Grand Central Dispatch on a Tilera multicore chip. It would give a good idea of how effective GCD would be in allocating cores for execution. Any machine with 100 cores must have a considerable amount of RAM, perhaps 8GB+, even with large caches.

Apple has been very active in developing LLVM compilers, and has recently added a Clang front end instead of GCC. I don't think Apple has open-sourced all their work yet, but check llvm.org for the current details. The real trick is breaking an algorithm into blocks, using OpenCL to organize your code for execution. I mean, how different is a 100-core multi-CPU chip from a multicore GPU accelerator!

Re:FreeBSD and GCD (1)

TheRaven64 (641858) | about 5 years ago | (#29871001)

Grand Central is nice and buzzwordy, but it's still based on threads and shared memory, so it works best when you have shared cache, or you will end up wasting a lot of time with cache coherency protocols. Erlang or OpenCL are much better fits for this kind of architecture.

Oh, and the version of clang that Apple ships as 1.0 is a branch from the main tree from a few weeks before the official 1.0 release was branched. Apple puts a lot of developer effort into clang, but so do other people (including myself). This work is all open source and developed in a public repository, it is not some super secret Apple project.

Re:Custom ISA? (1)

Nursie (632944) | about 5 years ago | (#29870895)

"Seriously though, this is exactly what Apple have been working towards recently in the compiler space. You write your application and explicitly break up the algorythm into little tasks that can be executed in parallel. Using a syntax that is light weight and expressive. Then your compiler tool chain and runtime JIT manages the runtime threads and determines which processor is best equipped to run each task."

AAAAAAAAHHHHH!!!! It's the iPod all over again! Apple did not invent the thread pool! I'm sure Grand Central is great, but FFS!

"Seriously though, this is exactly what Software Engineers have been working with for years in the thread pool pattern. You write your application and explicitly break up the algorithm into little tasks that can be executed in parallel. Using the language of your choice. Then your Operating System manages the runtime threads and determines which processor is best equipped to run each task.

FTFY. Thread pools are not new. Hell, I wrote a thread pool implementation 10 years ago and it wasn't new then.

Re:Custom ISA? (1)

complete loony (663508) | about 5 years ago | (#29870923)

No, it wasn't that new. But what is new is a common low-level language representation that can be optimised in that form before being targeted to the different architectures present in the same machine. It also helps that there is a single machine-level daemon managing the tasks that run on those threads.

Re:Custom ISA? (1)

Linker3000 (626634) | about 5 years ago | (#29870299)

"...if the instruction set isn't any standard type..."

No problem; use the processor for a 'speak and spell'-type toy, a drug store reusable digital camera or a scientific calculator and someone will hack a decent Linux kernel onto it over a weekend.

Re:Custom ISA? (1)

V!NCENT (1105021) | about 5 years ago | (#29870325)

GPUs are not massively multicore! That's marketing speak...

Re:Custom ISA? (1)

rbanffy (584143) | about 5 years ago | (#29870553)

They have a C compiler. That's all we need.

55 Watts! (-1, Offtopic)

conureman (748753) | about 5 years ago | (#29870241)

I guess I gotta RTFA; Man it's past my bedtime.

100? (2, Insightful)

nmg196 (184961) | about 5 years ago | (#29870247)

Wouldn't it have been better to make it a power of 2? Some work is more easily divided when you can just keep halving it. 64 or 128 would have been more logical, I would have thought. I'm not an SMP programmer though, so perhaps it doesn't make any difference.

Re:100? (0)

Anonymous Coward | about 5 years ago | (#29870283)

If it's ported to Apache it could be interesting.

Re:100? (0)

Anonymous Coward | about 5 years ago | (#29870285)

It boils down to how much space you have on the die, which is usually a square or a rectangle whose width is twice its length. Perhaps it's 100 cores, and the cache and interconnects take up about as much space as 28 cores. Just a wild-ass guess.

Re:100? (5, Funny)

Fotograf (1515543) | about 5 years ago | (#29870289)

It does if you are carefully starting applications in powers of two and designing your applications to use a power-of-two number of threads.

Re:100? LOL (1)

CFD339 (795926) | about 5 years ago | (#29870633)

Wish I had mod points today. I wonder how many people will get just how funny this fantastically sarcastic and totally on-target comment was. Bravo.

Re:100? (4, Informative)

harry666t (1062422) | about 5 years ago | (#29870327)

SMP FAQ.

Q: Does the number of processors in a SMP system need to be a power of two/divisible by two?

A: No.

Q: Does the number of processors in a SMP system...

A: Any number of CPUs/cores that is larger than one will make the system an SMP system*.

(* except when it's an asymmetrical architecture)

Q: How do these patterns (power of 2, divisible by 2, etc) of numbers of cores affect performance?

A: Performance depends on the architecture of the system. You cannot judge by simply looking at the number of cores, just as you can't simply look at MHz.

Re:100? (5, Funny)

glwtta (532858) | about 5 years ago | (#29870573)

Their plan is to eventually confuse consumers by advertising "X KiloCores! (* KC = 1000 cores)" when everyone expects a KiloCore to be 1024 cores.

Re:100? (1)

TheRaven64 (641858) | about 5 years ago | (#29871009)

It doesn't need to be a power of two, but being a square number helps for this kind of design because you want a regular arrangement that can fit into a regular grid on the die.

crossbars (0)

Anonymous Coward | about 5 years ago | (#29870255)

in the article it is mentioned that Tilera is able to avoid the use of crossbars:

For faster data exchange, Tilera has organized parallelized cores in a square with multiple points to receive and transfer data. Each core has a switch for faster data exchange. Chips from Intel and AMD rely on crossbars, but as the number of cores expands, the design could potentially cause a gridlock that could lead to bandwidth issues, he said.

Does anybody here know how this actually works?

Re:crossbars (1)

TheRaven64 (641858) | about 5 years ago | (#29871023)

I vaguely remember reading about their design a while ago, and I seem to recall that they basically use a store-and-forward design. Each core only talks to the cores close to it directly, and these relay requests to further away ones (a lot like how AMD chips work, having copied the design from the Alpha). This adds a little bit of complexity to scheduling, because you want to keep processes that share memory on cores that are close together for best performance.
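
For a flavour of what that means, here's a toy C sketch of dimension-ordered ("XY") routing on a 10x10 grid - the textbook scheme for meshes like this, though whether Tilera's interconnect does exactly this isn't established here. Hop count is just Manhattan distance, which is why the scheduler wants chatty processes on adjacent tiles.

#include <stdio.h>
#include <stdlib.h>

#define SIDE 10   /* 10 x 10 = 100 tiles */

/* Hops between two tiles under XY routing: walk horizontally to the
 * destination column, then vertically to the destination row. */
static int hops(int from, int to)
{
    int dx = abs(from % SIDE - to % SIDE);
    int dy = abs(from / SIDE - to / SIDE);
    return dx + dy;
}

int main(void)
{
    /* Corner-to-corner is the worst case: 9 + 9 = 18 hops */
    printf("tile 0 -> tile 99: %d hops\n", hops(0, 99));
    printf("tile 44 -> tile 45: %d hops\n", hops(44, 45));
    return 0;
}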

Sounds Like (1)

Nerdfest (867930) | about 5 years ago | (#29870345)

Sounds like something that might be useful in a video game console ...

But does it run linux? (1)

conureman (748753) | about 5 years ago | (#29870655)

TFA sez Apache has been ported to it. Might be useful.

What happened to powers of 2? (1)

Godefricus (1575165) | about 5 years ago | (#29870387)

... Am I the only one who gets mildly suspicious when reading 100-core instead of 128-core?

Re:What happened to powers of 2? (1)

atilla filiz (1402809) | about 5 years ago | (#29870427)

I think it's all about how many they can squeeze into a single chip, considering cost/power/performance. You don't need 2^n cores just to fill out an address space.

Re:What happened to powers of 2? (0)

Anonymous Coward | about 5 years ago | (#29870451)

I assume not, but it's a silly response. Personally, I find 128 cores strange, since you can't lay them out on a square die like you can 100 (10x10) or 64 (8x8).

Re:What happened to powers of 2? (1)

Shikaku (1129753) | about 5 years ago | (#29870577)

11x11 + 2x2 + 1x1 + 1 layers of cpus.

The third dimension called, they are suing flatland for prior art and copyright infringement.

Re:What happened to powers of 2? (1)

Culture20 (968837) | about 5 years ago | (#29871025)

The third dimension called, they are suing flatland for prior art and copyright infringement.

The fourth dimension called, they already have (wioll haven) the judgment from the lawsuit, and flatland stands (willan on-stand) on parody.

Re:What happened to powers of 2? (1)

marquis111 (94760) | about 5 years ago | (#29871113)

Doctor Dan Streetmentioner called, and he wants royalties for your use of his tenses!

Re:What happened to powers of 2? (0)

Anonymous Coward | about 5 years ago | (#29871117)

11x11 + 2x2 + 1x1 + 1 layers of cpus.

apparently unclear on how 1x1 works. 11x11+2x2+1x1+1 cores is 127 cores.

Re:What happened to powers of 2? (1)

Skapare (16644) | about 5 years ago | (#29871093)

Where's the law that says the core layout, or even the die itself, has to be square? Square, or nearly square, might be the most convenient for minimum paths and such. Still, you need to have space somewhere for "between core" control circuits. Even if you lay out the die in a nice square grid, you don't have to make each cell be a core. Getting data lines into the cores in the middle can be an interesting challenge. But then, 100 cores trying to load a word from different locations in RAM all at the same time might be a bit congested. I'd suggest some internal RAM in place of some cores.

Re:What happened to powers of 2? (1)

JasterBobaMereel (1102861) | about 5 years ago | (#29870653)

100 cores plus some room on the chip for management, connections, global cache etc ....

Plus, if you say 100 cores and put 128 cores on the chip, then 28 can fail before you have to bin the chip as a dud ....

Re:What happened to powers of 2? (1)

Rockoon (1252108) | about 5 years ago | (#29871111)

If it was an attempt at 128 cores, some of them would come off the fab with no defects and would be sold as 128s...

They aren't going to intentionally roast up to 28 cores on every unit just to hit their advertised number.

Power of two is not at all necessary (1)

Sycraft-fu (314770) | about 5 years ago | (#29870939)

It is done only out of convenience, really. You have your regular 1-core processor (2^0), and the next step up is a second core (2^1). From there, an easy step is to simply duplicate your dual-core setup: you make a second copy and put it on the same chip, giving you 4 cores (2^2). This is as far as most chips go; more than 4 cores is not yet common. However, you might notice we have a very small sample set: we've only covered 3 powers of two, two of them by necessity. The trend isn't there because computers require it; it just works out that way.

So, if you sniff around, you discover that AMD does indeed make 3-core processors, called the Phenom X3. Basically, they designed a quad-core chip but are having yield problems. Often enough, one of the cores fails testing while the others work, so they disable that core and sell a 3-core product. The end result works great: the OS sees 3 CPUs and uses them.

OSes don't care about the specific number of cores. Power-of-two core counts are just the way it has worked out in many chips so far, because we aren't dealing with large numbers. That is going to go away quickly, though: Intel will introduce a 6-core chip next year. We are heading towards a market with processors that have whatever number of cores is convenient. What "convenient" means will depend on a lot of factors, but the divisibility of the number won't be one of them.

We may well start to see more odd-numbered core counts. If you design something with 100 individual units, it is much easier to disable the parts that don't work. We might see 96-, 97-, 98-, 99- and 100-core varieties, or something like that: all the same chip, just with units disabled if they fail.

GPUs have been doing this for years. They are highly parallel, and often when a new high-end part comes out there'll be a slightly lower-end part with a somewhat lower clock and one or two of the pipelines disabled. This allows parts that won't pass all the tests, but still mostly work, to be sold rather than thrown out.

Am I *actually*... (0)

Anonymous Coward | about 5 years ago | (#29870521)

...the first person to ask if this can run "Crysis"?

Re:Am I *actually*... (0)

Anonymous Coward | about 5 years ago | (#29870793)

It might run Crysis. Just.
But to actually run Windows and Crysis and not need to kill IE first, you might need 4 of these.

What ISA? (1)

abdulla (523920) | about 5 years ago | (#29870641)

Are these x86/x86-64 CPUs? It wasn't particularly clear to me.

Re:What ISA? (2, Informative)

Narishma (822073) | about 5 years ago | (#29870917)

No, they are derived from the MIPS architecture.

Been there, done that, got the T-Shirt... (5, Interesting)

Anonymous Coward | about 5 years ago | (#29870661)

OK, so big disclaimer: I work for Sun (not Oracle, yet!)

The Sun Niagara T1 chip came out over 3 years ago, and it did 32 threads on 8 cores.
And drew something around 50W (200W for a fully-loaded server). And under $4k.

The T2 systems came out last year, do 64 threads/CPU for a similar power budget. And even less $/thread.

The T3 systems likely will be out next year (I don't know specifically when, I'm not In The Know), and the threads/chip should double again, with little power increase.

Of course, per-thread performance isn't equal to anything like a modern "standard" CPU, though it's now "good enough" for most stuff - the T2 systems have per-thread performance roughly equal to the old Pentium III chips. I would be flabbergasted if this GX chip had per-core performance anywhere near that.

I'm not sure how Intel's Larrabee is going to show (it's still nowhere near release), but the T-series chips from Sun are cheap, open, and available now. And they run Solaris AND Linux. So unless this new GX chip is radically more efficient, higher-performance, or less costly, I don't see this company making any impact.

-Erik

It would be clever (2, Insightful)

rbanffy (584143) | about 5 years ago | (#29870763)

Since a) developing a processor is insanely expensive and b) they need it to run lots of software ASAP, it would be very clever if they spent a marginal part of the overall development budget on making sure every key Linux and *BSD kernel developer gets some hardware they can use to port the stuff over. Make it a nice desktop workstation with cool graphics and it will happen even faster.

They are going up against Intel... The traditional approach (delivering a faster processor with lower power consumption at a lower price) simply will not work here.

I think Movidis taught us a lesson a couple years back. Users will not move away from x86 for anything less than a spectacular improvement. Even the Niagara SPARC servers are a hard sell these days...

Re:It would be clever (0)

Anonymous Coward | about 5 years ago | (#29870835)

They already have a Linux port for this, along with a GCC that targets it...

Chips target tasks (1)

Decameron81 (628548) | about 5 years ago | (#29870775)

The two-year-old startup's Tile-GX series of chips is targeted at servers and appliances that execute Web-related functions such as indexing, Web search and video search.

Can someone explain to me how a chip can be targeted at much higher-level tasks like these?

I realize there are surely technical means to achieve this goal; I just can't imagine what those means could be.

Re:Chips target tasks (1)

TheRaven64 (641858) | about 5 years ago | (#29871055)

There is not really such a thing as a general-purpose CPU. Any CPU with a few features (add, conditional branch) can run any algorithm, but can't necessarily run it fast. Different applications have different instruction mixes. The kind of code that GPUs are designed to run, for example, places very high demands on memory throughput and floating-point performance, but is relatively sparse in terms of branches and integer operations. On average, most programs have a branch every 7 instructions, but GPU code typically runs for a few hundred instructions between conditional branches. Web serving generally uses no floating-point instructions, is throughput- as opposed to latency-sensitive, and requires a high degree of concurrency. A processor that makes these trade-offs (e.g. Sun's T series) will serve web pages very well, but will do a lot worse at, for example, CAD.

Re:Chips target tasks (1)

Skapare (16644) | about 5 years ago | (#29871121)

An associative memory requirement could be better served by a custom high-core-count CPU... if it has sufficient memory on board (e.g. sufficient total memory bus bandwidth).

hmm... (3, Funny)

Skizmo (957780) | about 5 years ago | (#29870973)

100 cores... that means that my cpu will never go beyond '1% busy'

And... (-1, Redundant)

travbrad (622986) | about 5 years ago | (#29871155)

it still can't run Crysis at 60FPS :p