
Australia's CSIRO To Launch CPU-GPU Supercomputer

timothy posted more than 4 years ago | from the ready-for-duke-nuken-forever dept.

Supercomputing

bennyboy64 contributes this excerpt from CRN Australia: "The CSIRO will this week launch a new supercomputer which uses a cluster of GPUs [pictures] to gain a processing capacity that competes with supercomputers over twice its size. The supercomputer is one of the world's first to combine traditional CPUs with the more powerful GPUs. It features 100 Intel Xeon CPU chips and 50 Tesla GPU chips, connected to an 80 Terabyte Hitachi Data Systems network attached storage unit. CSIRO science applications have already seen 10-100x speedups on NVIDIA GPUs."


82 comments



lollero (0)

Anonymous Coward | more than 4 years ago | (#30200316)

Why is it "more powerful" than "traditional" CPUs?

And why is it not under my hood already if it is superior technology?

Stating the obvious, but... (3, Informative)

Sockatume (732728) | more than 4 years ago | (#30200404)

Graphics processing, the technically demanding part of PC gaming, uses GPUs essentially exclusively. Physics processing, the runner-up, can already be offloaded to technically similar PPUs, or even to actual GPUs working as physics processors. The reason most apps run on the CPU is that it's easier to write for, not that most apps actually run better on it for some fundamental reason.

Re:Stating the obvious, but... (5, Interesting)

Sockatume (732728) | more than 4 years ago | (#30200418)

Okay, that's not quite true: most tasks do fine piddling about on the CPU, but demanding tasks would be better off running on something faster and more specialised. The barrier is that it's harder to write GPGPU code.

Re:Stating the obvious, but... (4, Insightful)

Stratoukos (1446161) | more than 4 years ago | (#30200732)

The reason most apps run on the CPU is that it's easier to write for, not that most apps actually run better on it for some fundamental reason.

Well, that's not exactly true. Of course, frameworks for writing programs that utilize the GPU are still in their infancy, but that doesn't mean all problems are suited to the GPU. Problems that are best solved by the GPU are problems that can be parallelised. I'm not exactly sure what you mean by most apps, but if you're talking about apps typically found on a desktop, that simply isn't true.

The fundamental reason is that GPUs are really good at doing the same thing to different sets of data. For example, you can send an array of 1000 ints and tell the GPU to calculate and return their squares, or something similar. The reason is that when GPUs are used for graphics they usually have to perform the same operation on every pixel on the screen, and they evolved to be good at that. I can't see how this is useful for desktop applications, especially if you consider the massive cost of accessing data in main memory from the GPU.
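
For concreteness, a minimal CUDA-style sketch of that "array of 1000 ints" example; the kernel name and launch configuration are illustrative choices of mine, not anything from the article.

    // Each thread squares one element of the array.
    __global__ void squareAll(int *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)                                      // guard the tail
            data[i] = data[i] * data[i];
    }

    // Hypothetical launch for n = 1000 elements, 256 threads per block:
    //   squareAll<<<(1000 + 255) / 256, 256>>>(d_data, 1000);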

Conceded (0)

Sockatume (732728) | more than 4 years ago | (#30200826)

There are indeed tasks that don't parallelise well. My brain's filed them as unimportant, but that's likely due to the difficulty of doing computational work that parallelises poorly rather than some fundamental deficiency. A better way of putting it would be to say that most hard-core research computing is done in a manner that's very similar to hard-core gaming computing, so it's actually a very sensible transition.

Re:Stating the obvious, but... (0)

Anonymous Coward | more than 4 years ago | (#30200756)

Informative my ass. GPUs work well on problems that don't need to share large amounts of memory between tasks, and where each workload can fit inside the local high-speed RAM of the graphics/Tesla card.

Most apps run on general purpose CPUs because it's much cheaper due to mass production.

NVIDIA hopes it can make enough money selling "Tesla" cards at a few thousand dollars a pop to make up for the extra cost of horrible yields on huge graphics chips. Essentially they are making gamers pay for their research into high performance computing until that happens.

Well... (0)

Anonymous Coward | more than 4 years ago | (#30200770)

It depends.

Mod me up, pls.


Re:lollero (4, Informative)

JorDan Clock (664877) | more than 4 years ago | (#30200444)

You already have the technology in your box. The difference is that, in the past couple of years, GPUs that were once used exclusively to speed up rendering have become more and more generalized at the hardware and instruction-set level, to the point where they are a very attractive way of speeding up things other than rendering. Physics simulations, such as fluid dynamics, are much faster on a GPU than on a CPU. I currently run the GPU client of Folding@Home and it outperforms the CPU client by orders of magnitude.

The hardware has been around for quite some time, but now we're realizing all the things a GPU can do besides run pretty games faster.

Re:lollero (3, Informative)

XDirtypunkX (1290358) | more than 4 years ago | (#30200740)

It's only traditional on very particular workloads that are very parallel, use a lot of floating point and have a largely coherent execution pattern/memory access. The CPU is still the king of general computing tasks that have lots of incoherent branches and indirection and that require serialized execution.

Re:lollero (1)

XDirtypunkX (1290358) | more than 4 years ago | (#30200752)

*only more powerful than traditional...

Can someone explain... (1, Interesting)

bluesatin (1350681) | more than 4 years ago | (#30200330)

Can someone explain exactly what the benefits and drawbacks of using GPUs for processing are?

It would also be nice if someone could give a quick run down of what sort of applications GPUs are good at.

Re:Can someone explain... (5, Informative)

SanguineV (1197225) | more than 4 years ago | (#30200366)

Can someone explain exactly what the benefits/drawbacks of using GPUs for processing?

GPUs are massively parallel, handling hundreds of cores and tens of thousands of threads. The drawbacks are that they have limited instruction sets and don't support a lot of the arbitrary jumping, memory loading, etc. that CPUs do.

It would also be nice if someone could give a quick run down of what sort of applications GPUs are good at.

Anything that is massively parallelisable and processing-intensive. The usual bottleneck with GPU programming in normal computers is the overhead of loading from RAM to GPU RAM. Remove this bottleneck in a custom system and you can get enormous speed-ups in parallel applications once you compile the code down to GPU instructions.

Greater detail I will leave to the experts...
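
A hedged sketch of where that RAM-to-GPU-RAM overhead sits in a typical CUDA program: the two cudaMemcpy calls below are the bottleneck being described, while the kernel launch is the massively parallel part. Names and sizes are invented for illustration.

    #include <cuda_runtime.h>

    __global__ void scale(float *d, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;
    }

    void scaleOnGpu(float *host, int n)
    {
        float *dev = nullptr;
        size_t bytes = n * sizeof(float);

        cudaMalloc(&dev, bytes);
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // RAM -> GPU RAM (the usual bottleneck)
        scale<<<(n + 255) / 256, 256>>>(dev, n);                // runs across thousands of threads
        cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);   // GPU RAM -> RAM
        cudaFree(dev);
    }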

SIMD (1)

reporter (666905) | more than 4 years ago | (#30200464)

SanguineV (1197225) wrote, "GPUs are massively parallel, handling hundreds of cores and tens of thousands of threads. The drawbacks are that they have limited instruction sets and don't support a lot of the arbitrary jumping, memory loading, etc. that CPUs do."

In other words, the GPU is a single-instruction-multiple-data (SIMD) device. It is well matched to simple, regular computations like those found in digital signal processing, image processing, computer-generated graphics, etc.

The modern-day GPU is the difference between "Asteroids" (a video game from the 1980s) and Unreal Tournament 2004 (an intense 3D-graphics game of the 21st century).

Re:SIMD (1)

megrims (839585) | more than 4 years ago | (#30200510)

The modern-day GPU is the difference between "Asteroids" (a video game from the 1980s) and Unreal Tournament 2004 (an intense 3D-graphics game of the 21st century).

Sorry, you need to tie that comparison to something. What did you mean?

Re:SIMD (0)

Anonymous Coward | more than 4 years ago | (#30200686)

A CPU is like a car, a GPU is like a train. If you need lots of data to go through the same code path, a train works way better. If every piece of data follows a different path, the CPU works better.

Re:SIMD (5, Funny)

MrNaz (730548) | more than 4 years ago | (#30200724)

You lost me there, your car analogy contains a train, which threw me off track.

Re:SIMD (1)

turing_m (1030530) | more than 4 years ago | (#30201010)

Nice.

Re:SIMD (1)

ChienAndalu (1293930) | more than 4 years ago | (#30200828)

Well, at least in Half Life I could always select "Software" as a rendering method. It wasn't nice, but it didn't look like "Asteroids".

Re:Can someone explain... (3, Interesting)

Anonymous Coward | more than 4 years ago | (#30200526)

The main drawback of using GPUs for scientific applications is their poor support for double precision floating point operations.

Using single precision mathematics makes sense for games, where it doesn't matter if a triangle is a few millimetres out of place or the shade of a pixel is slightly wrong. However, in a lot of scientific applications these small errors can build up and completely invalidate results.

There are rumours that Nvidia's next-generation Fermi GPU will support double-precision mathematics at the same speed as single precision. If that's the case then they will be incredibly popular within the scientific community, and I would expect the top500 supercomputer list to become dominated by machines built around GPUs rather than traditional CPUs. (Of course, this really depends on the Fermi GPU's FLOPS-per-watt performance, which is impossible to gauge before they are released.)
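
To make the error build-up mentioned above concrete, a small host-side sketch (plain C++, nothing GPU-specific): summing the same small value ten million times in single and in double precision gives noticeably different totals, which is exactly the kind of drift that can invalidate a long simulation.

    #include <cstdio>

    int main()
    {
        float  fsum = 0.0f;
        double dsum = 0.0;

        // 0.1 is not exactly representable in binary; the rounding error compounds.
        for (int i = 0; i < 10000000; ++i) {
            fsum += 0.1f;
            dsum += 0.1;
        }

        // The float total drifts visibly from the expected 1000000;
        // the double total stays very close to it.
        printf("float:  %f\n", fsum);
        printf("double: %f\n", dsum);
        return 0;
    }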

Re:Can someone explain... (1, Interesting)

Anonymous Coward | more than 4 years ago | (#30200726)

Note that doubles take... er... double the bandwidth of the equivalent single-precision values. GPUs are, AFAIK, pretty bandwidth-intensive, so even if the operations are supported natively you will notice the hurt of the increased bandwidth. Note that the bandwidth is only a problem for the actual inputs and outputs of the program; the intermediate values should be in GPU registers. But then again, double registers take twice as much space as single registers, so with the same space you get half as many of them, supporting less complex shaders. I'd say doubles will cost at least half the performance even if they are supported natively.

I don't know how Fermi GPUs handle IEEE compliance. Usually GPUs have really relaxed compliance, as their target applications don't require it. I'm talking about how denormals, overflows, underflows, etc. are treated. Comparing single-precision FLOPS to double-precision FLOPS is already quite unfair... and if we are also comparing relaxed IEEE handling of infinities, denormals and the like with a fully compliant implementation it becomes even worse (and I repeat, I don't know the actual compliance of either of the two machines being compared).

Re:Can someone explain... (1)

PitaBred (632671) | more than 4 years ago | (#30203396)

Most of them are somewhat relaxed about IEEE doubles anyway. They don't do the full 80-bit long double; they typically only do the 64-bit double. There are times when having those 80-bit calculations is important, especially when you start running into huge data sets.

Re:Can someone explain... (0)

Anonymous Coward | more than 4 years ago | (#30204102)

Hardware-level double precision *anything* is a net win over faking it in single precision with software. Consider the simplest possible case: adding two 64-bit ints natively versus emulating it with 32-bit ints:

32-bit: the first 64-bit int is simulated by two 32-bit ints (call them B A) and the second by another two (call them D C). So this is four loads into registers, then A + C stored in C, check for overflow, and if so add 1 to B or D, then add B + D and store in D, then store C and D. Worst case: 4 loads, 3 adds with a branch-on-overflow, 2 stores. Best case, due to caching and/or parallelism, it's only 1 load, 2 adds, 1 store.

64 bit: load two 64 bit ints, add them, store result. Worst case, two loads, one add, two stores. Best case, 1 load, 1 add, 1 store.

Note that you're moving exactly the same amount of data across the bus in both cases: 128 bits to the CPU or GPU, 64 bits back to memory. And you're taking up the same amount of register space (128 bits). But on the native 64-bit hardware, you're doing as few as 1/3 of the operations. And the instruction sequence is longer in the 32-bit-fakery version due to the branch.
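
A rough C-level sketch of the 32-bit emulation described above (my own idiom; a CPU's add-with-carry instruction, mentioned in a reply below, collapses this into far fewer steps):

    #include <cstdint>

    // Add two 64-bit integers represented as (hi, lo) pairs of 32-bit halves.
    void add64_emulated(uint32_t a_hi, uint32_t a_lo,
                        uint32_t b_hi, uint32_t b_lo,
                        uint32_t *r_hi, uint32_t *r_lo)
    {
        uint32_t lo = a_lo + b_lo;               // may wrap around
        uint32_t carry = (lo < a_lo) ? 1u : 0u;  // wrap-around means a carry occurred

        *r_lo = lo;
        *r_hi = a_hi + b_hi + carry;             // propagate the carry into the high halves
    }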

Faking higher-precision floating point in lower-precision hardware is WORSE. See the standard here: http://en.wikipedia.org/wiki/Double_precision_floating-point_format [wikipedia.org] . 1 sign bit, 11-bit exponent, 52-bit fraction. To do it the hard way using ints, you'd need two 64-bit registers or three 32-bit registers. If your numbers are coming and going from main memory packed in the actual 64-bit FP format, you also need to unpack and repack them, which means more operations with AND bit-masking and shifting, which means more registers used. Ugly stuff.

Re:Can someone explain... (1)

petermgreen (876956) | more than 4 years ago | (#30205104)

check overflow, if so add 1 to B or D, then add B + D store in D
Most CPUs have an "add with carry" instruction that reduces this sequence of steps to one instruction.

Faking higher-precision floating point in lower-precision hardware is WORSE.
Agreed, FAR worse.

Re:Can someone explain... (3, Informative)

TheKidWho (705796) | more than 4 years ago | (#30201140)

The next-gen Fermi is supposed to do ~600 double-precision GFLOPS. It also has ECC memory, a threading unit built in, and a lot more cache.

http://en.wikipedia.org/wiki/GeForce_300_Series [wikipedia.org]

Re:Can someone explain... (1)

freak132 (812674) | more than 4 years ago | (#30202830)

Many GPUs do in fact support double precision. It's not IEEE-standard double-precision floating point yet, but that's going to be a feature of the next generation or two. My source is ATI: anything marked with a superscript '1' does not support double-precision maths, everything else does. ATi StreamSDK requirements [amd.com]

I thought it was Single Instruction Multiple Data (1)

Colin Smith (2679) | more than 4 years ago | (#30200774)

GPUs are massively parallel handling hundreds of cores and tens of thousands of threads

Eh? Massively parallel, yes. The rest?

It's more to do with a single instruction performing the same operation on multiple pieces of data at the same time. AKA vector processors. Great for physics/graphics processing where you want to perform the same process on lots of pieces of data.

 

Re:I thought it was Single Instruction Multiple Da (1)

blueg3 (192743) | more than 4 years ago | (#30201840)

Sort of. NVIDIA's definition of a "thread" is different from a CPU thread -- it's closer to the instructions executed on a single piece of data in a SIMD system. You're not required to write data-parallel code for the GPU, but data-parallel code is certainly the easiest to write and visualize.

On NVIDIA chips, at least, there are a number of independent processors. The processors execute vector instructions (though all the vector instructions can be conditionally executed, so that, e.g., they only affect some of the data). Optimally, they have many instruction flows in flight at the same time -- there is a built-in zero-cost thread context switch, and computation in one set of threads is used to hide memory-access time for the other threads.

Re:Can someone explain... (1)

EdZ (755139) | more than 4 years ago | (#30200374)

Benefits: blindingly fast when running massively parallel computations (think several hundred thousand threads).
Drawbacks: trying to program something to take advantage of all that power requires you to scale up to several thousand threads. Not always that easy.

Re:Can someone explain... (0)

Anonymous Coward | more than 4 years ago | (#30200376)

Heavy mathematics - anything that is readily parallelised. GPUs are basically massive vector arithmetic units, capable of crunching through lots and lots of floating point calculations at very high speed. As usual, Wikipedia [wikipedia.org] has an informative article.

Re:Can someone explain... (1)

Sockatume (732728) | more than 4 years ago | (#30200382)

GPUs are fast but limited to very specific kinds of instructions. If you can write your code using those instructions, it will run much quicker than it would on a general-purpose processor. They're also ahead of the curve on things like parallelisation compared to desktop chips: the idea of writing graphics code for a 12-pipe GPU was mundane half a decade ago, while software support for multiple CPU cores is still scant.

Re:Can someone explain... (3, Interesting)

Anonymous Coward | more than 4 years ago | (#30200420)

I can take a stab. GPUs traditionally render graphics, so they're good at processing vectors and mathsy things. Now think of a simulation of a bunch of atoms. The forces between the atoms are often approximated with Newtonian laws of motion for computational-efficiency reasons; this is especially important when dealing with tens of thousands of atoms, and it's called molecular dynamics (MD). So the maths used for graphics-intensive computer games is the same maths as classical MD. The problem hitherto is that MD software has never really been compiled for GPU architectures, just Athlons and Pentiums.

I should mention that I use the CSIRO CPU cluster. It's quite good already, but I'm still waiting weeks to simulate a microsecond of 10,000 atoms using 32 processors. My new side project will be trying it out on the GPUs. 100x faster, they reckon; that'll be a game changer for me.

Re:Can someone explain... (1)

Sockatume (732728) | more than 4 years ago | (#30200650)

On that subject, I just read a great paper on methane hydrates (trapping methane in ice) which wouldn't have been possible without some truly enormous computing horsepower. Studies over a microsecond timescale (which is an eternity for molecules in motion) were needed because of the rarity of the events they were trying to model. Good luck: you're opening up a whole new generation of computational chemistry.

GPUs are good if (4, Informative)

Sycraft-fu (314770) | more than 4 years ago | (#30200498)

1) Your problem is one that is more or less infinitely parallel in nature. A GPU's method of operation is a whole bunch of parallel pathways, so your problem needs to be one that can be broken down into very small parts that execute in parallel. A single GPU these days can have hundreds of parallel shaders (the GTX 285 has 240, for example).

2) Your problem needs to be fairly linear, without a whole lot of branching (see the sketch after this list). Modern GPUs can handle branching, but they take a heavy penalty doing it. They are designed for processing data streams where you just crunch numbers, not a lot of if-then logic. So your problem should be fairly linear to run well.

3) Your problem needs to be solvable using single-precision floating-point math. This is changing; new GPUs are getting double-precision capability and better integer handling, but almost all of the ones on the market now are only fast with 32-bit FP. So your problem needs to use that kind of math.

4) Your problem needs to be able to be broken down into pieces that fit in the memory on a GPU board. This varies: it is typically 512MB-1GB for consumer boards and as much as 4GB for Teslas. Regardless, your problem needs to fit in there for the most part. The memory on a GPU is very fast, 100GB/sec or more of bandwidth for high-end parts, while the link back to the system over PCIe is usually an order of magnitude slower. So while you certainly can move data to main memory and to disk, it needs to be done sparingly. For the most part, you need to be cranking on data that is already in the GPU's memory.
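
As a hypothetical illustration of point 2: in the first kernel below, even and odd threads within every warp take different branches, so the hardware executes both paths one after the other; the second kernel keeps every thread on the same path and pays no such penalty.

    // Divergent: threads in the same warp disagree on the condition,
    // so the two paths are serialised (the penalty described above).
    __global__ void divergent(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        if (i % 2 == 0)
            data[i] = data[i] * 2.0f;
        else
            data[i] = data[i] * 0.5f;
    }

    // Coherent: every thread in a warp follows the same path.
    __global__ void coherent(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        data[i] = data[i] * 2.0f;
    }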

Now, the more your problem meets those criteria, the better a candidate it is for acceleration by GPUs. If your problem is fairly small, very parallel, very linear and all single precision, you will see absolutely massive gains over a CPU, 100x or so. These are indeed the kinds of gains you see in computer graphics, which is not surprising given that's what GPUs are made for. If your problem is very single-threaded, has tons of branching, requires hundreds of gigs of data and such, then you might find offloading to a GPU slower than running it on a CPU. The system might spend more time just moving the data around than doing any real work.

The good news is, there's an awful lot of problems that nicely meet the criteria for running on GPUs. They may not be perfectly ideal, but they still run plenty fast. After all, if a GPU is ideally 100x a CPU, and your code can only use it to 10% efficiency, well hell you are still doing 10x what you did on a CPU.

So what kinds of things are like this? Graphics would be the most obvious one; that's where the design comes from. You do math on lots of matrices of 32-bit numbers. This doesn't just apply to consumer game graphics, though: material shaders in professional 3D programs work the same way, and indeed you'll find those can be accelerated with GPUs. Audio is another area that is a really good candidate. Most audio processing is the same kind of thing: you have large streams of numbers representing amplitude samples, and you need to run various simple math functions over them to add reverb or compress the dynamics or whatever. I don't know of any audio processing that uses GPUs, but they'd do well at it. Protein folding is another great candidate; Folding@Home runs WAY faster on GPUs than on CPUs.

At this point, GPGPU is still really in its infancy. We should start to see more and more of it as more people have GPUs that are useful for GPGPU apps (pretty much DX10 or better hardware: nVidia 8000 series or higher and ATi 3000 series or higher). Better APIs are also starting to appear. nVidia's CUDA is popular, but proprietary to their cards. MS has introduced GPGPU support in DirectX, and OpenCL has come out and is being supported. As such, you should see more apps slowly start to be developed.

GPUs certainly aren't good at everything, I mean if they were, well then we'd just make CPUs like GPUs and call it good. However there is a large set of problems they are better than the CPU at solving.

Re:GPUs are good if (2, Insightful)

MichaelSmith (789609) | more than 4 years ago | (#30200512)

TFA doesn't talk about specific applications, but I bet the CSIRO wants this machine for modelling. Climate modelling is a big deal here in Australia: predicting where the water will and will not be. At this time of year bush fires are a major threat. I bet that with the right model and the right data you could predict the risk of fire at high resolution and in real time.

Re:GPUs are good if (1)

machine321 (458769) | more than 4 years ago | (#30200694)

No, this is to hide the spaceships from public view. You're on Slashdot, so you've obviously watched Stargate. We have these giant space-faring ships, but we can't let the public know about them. They're obviously going to cover the sky with giant LCD monitors so amateur astronomers can't see what's going on. Let's just hope they remember to scrape off the logo.

What? That's not what they meant by "launch"? Oh.

Re: TFA doesn't talk about specific applications (1)

neonsignal (890658) | more than 4 years ago | (#30200906)

modelling how to split the beer atom?

Re: TFA doesn't talk about specific applications (1)

MichaelSmith (789609) | more than 4 years ago | (#30206122)

All you need for that is a chisel and a back shed to work in.

Re:GPUs are good if (1)

afidel (530433) | more than 4 years ago | (#30202642)

The problem of resolution is normally one of data, not modelling power. The reason forecasts aren't much good past 7-10 days is that the gaps between data-collection stations lead to too much future randomness that no amount of additional processing power will eliminate. There ARE other fields that can take advantage of every bit of processing power you can find: molecular chemistry and proteomics, among others.

Re:GPUs are good if (0)

Anonymous Coward | more than 4 years ago | (#30203268)

They mean long-term forecasting, not typical forecasts. We're suffering from a ten-year drought that has no end in sight and a massive hole in our ozone layer. Our hydrology departments analysing catchment areas could use this, as could many others.

Climate analysis is the logical motivation, it's about the only scientific reason Australia needs massively parallel systems.

Re:GPUs are good if (1)

Rockoon (1252108) | more than 4 years ago | (#30200882)

You touched on it, but I think you missed the #1 biggest winner for high-end GPUs as far as most GPGPU work goes: convolution.

It is not an exaggeration to call these things super-convolvers. They excel at doing large-scale pairwise multiply-and-adds on arrays of floats, which can be leveraged to do more specific things like large matrix multiplication in what amounts to (in practice) sub-linear time. A great many problem sets can be expressed as a series of convolutions, including neural networks, Navier-Stokes solvers, Fourier transforms, signal filtering, cross-correlation, and so on.
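
A bare-bones sketch of the kind of multiply-and-add kernel being described: a naive 1D convolution, one output sample per thread, with all names invented for illustration (a production version would use shared memory or a library such as cuFFT).

    // Naive 1D convolution: each thread accumulates a pairwise
    // multiply-and-add over the filter taps for one output sample.
    __global__ void convolve1d(const float *signal, const float *filter,
                               float *out, int n, int taps)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float acc = 0.0f;
        for (int t = 0; t < taps; ++t) {
            int j = i - t;
            if (j >= 0)                       // treat samples before the start as zero
                acc += signal[j] * filter[t];
        }
        out[i] = acc;
    }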

Re:GPUs are good if (1)

Odinlake (1057938) | more than 4 years ago | (#30201434)

matrix multiplication in what amounts to (in practice) sub-linear time.

What? The GPU matrix multiplications are generally done in the straightforward O(n^3) fashion - you may divide this by something proportional to the number of cores available, but what you mean by "sub-linear" I can't imagine.

twice the size but what cost? (0, Interesting)

Anonymous Coward | more than 4 years ago | (#30200338)

The article didn't seem to mention cost, power usage, heat, or anything remotely relevant. Just a nice happy fluff piece for NVIDIA, whom I do adore, but really these articles on Slashdot don't have as much technical substance as they used to.

Re:twice the size but what cost? (1)

Sockatume (732728) | more than 4 years ago | (#30200388)

The system is as fast as setups twice the size, i.e. it is half the size.

Re:twice the size but what cost? (1, Informative)

Spazed (1013981) | more than 4 years ago | (#30200476)

It is probably considerably cheaper as well. GPU-based 'supercomputers' (can we please stop calling them that? Can we just say "computers we'll all carry around in ten years"?) aren't a whole lot more than the SLI setups gamers have been using for years; the parts are pretty cheap. The Tesla website claims 1/100th the cost of a traditional supercomputer, which might only really become true over the lifetime of the machine because of the lower power requirements. Of course, that also assumes your problem is a good fit for a Tesla system.

Re:twice the size but what cost? (1)

prefect42 (141309) | more than 4 years ago | (#30200538)

This is also true of any typical x86_64 node in an HPC cluster. It's just a regular server board, often optimised for high density (so half-width 1U isn't uncommon) and with a better interconnect than gigabit (like InfiniBand). Rack-mount chassis that fit double-width PCIe cards used to be tricky to find; now even Dell produces one (the rack-mount Precision).

Having code that is a good fit for Tesla is a big problem. A lot of HPC work is code that's been tweaked since the 70s and is a mangle of Fortran IV/77/90/95 hacked on by every PhD along the way. Rewriting it would often reap huge rewards (never mind looking at GPGPU), but getting the funding and time to do it is another matter.

Re:twice the size but what cost? (2, Insightful)

Anonymous Coward | more than 4 years ago | (#30200710)

Coding will never get you publications, so the necessary rewrites are never done. We have code that refuses to compile on compilers newer than ~2003, and even needs specific library versions. It will never be fixed.
To all fellow PhDs out there hacking away at programs: the best thing you can do is rely heavily on standard libraries (BLAS and LAPACK). The CS guys do get (some) publications out of optimizing those, so there are some impressive speedups to be had there, like the recent FLAME project for LAPACK or GotoBLAS for the BLAS routines.
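
In the GPU setting of this story the same advice carries over: call the vendor's BLAS instead of hand-rolling kernels. A hedged sketch of a single-precision matrix multiply through cuBLAS (names are from NVIDIA's cublas_v2 header; error checking omitted, column-major layout and device-resident matrices assumed):

    #include <cublas_v2.h>

    // C = A * B, with A (m x k), B (k x n), C (m x n), all already in GPU memory.
    void gemmOnGpu(const float *dA, const float *dB, float *dC,
                   int m, int n, int k)
    {
        cublasHandle_t handle;
        cublasCreate(&handle);

        const float alpha = 1.0f;
        const float beta  = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    m, n, k,
                    &alpha,
                    dA, m,    // leading dimension of A
                    dB, k,    // leading dimension of B
                    &beta,
                    dC, m);   // leading dimension of C

        cublasDestroy(handle);
    }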

Cool but... (1, Funny)

Anonymous Coward | more than 4 years ago | (#30200342)

..can it Run CRySiS?

Seems logical to me. (1)

JorDan Clock (664877) | more than 4 years ago | (#30200434)

A supercomputing cluster is already used for highly parallelized problems. Using hardware that handles those kinds of problems at far greater speed than a typical CPU is a no-brainer. The part of the story that would be really interesting to the /. crowd is exactly what kinds of problems they're using this cluster to speed up. GPUs aren't too keen on problems involving data that is hard to cache, and as far as I know the instruction set is geared toward doing lots of little parallel calculations; they have a hard time with large, monolithic problems.

I am very interested in seeing what kinds of research this will help the most with and what areas will still be more efficient to run on Xeons/Opterons.

Oh look.... (0, Redundant)

Ozlanthos (1172125) | more than 4 years ago | (#30200436)

My next year's desktop specs!

-Oz

The World of Tomorrow (1, Funny)

muphin (842524) | more than 4 years ago | (#30200478)

Wow, the world of technology is spiking. I remember only a few years ago there was only one massive supercomputer; now every university will have one. What next, link every supercomputer and have a supercomputer cloud, or should I say nebula now? :p The rise of the machines. Let me take this time to welcome our new overlords.

Re:The World of Tomorrow (1)

u38cg (607297) | more than 4 years ago | (#30204576)

They should link them all together to form one supercomputer, it would need some kind of hardcore name, though, like something out of an Anglo-Saxon epic perhaps.

In related news... (2, Interesting)

sonamchauhan (587356) | more than 4 years ago | (#30200552)

Hmmm.... is this setup a realisation of this release from Nvidia in March

Nvidia Touts New GPU Supercomputer
http://gigaom.com/2009/05/04/nvidia-touts-new-gpu-supercomputer/ [gigaom.com]

Another 'standalone' GPGPU supercomputer, without the Infiniband switch
University of Antwerp makes 4000EUR NVIDIA supercomputer
http://www.dvhardware.net/article27538.html [dvhardware.net]

FINALY! (2, Funny)

TheDarkMaster (1292526) | more than 4 years ago | (#30200612)

Finally, a machine good enough to run Crysis at full specs at 1680x1050 (well, I hope so).

Re:FINALY! (1)

ozbird (127571) | more than 4 years ago | (#30200982)

Yes, but can it run Duke Nukem Forever?

Re:FINALY! (0)

Anonymous Coward | more than 4 years ago | (#30201930)

Eventually.

Re:FINALY! (1)

TheKidWho (705796) | more than 4 years ago | (#30202766)

My current computer already runs Crysis at full specs at 1680x1050 you insensitive clod!

Re:FINALY! (1)

BikeHelmet (1437881) | more than 4 years ago | (#30206966)

Mine runs it at 2048x1152.

I just have trouble controlling my character at 7fps.

Re:FINALY! (1)

Barny (103770) | more than 4 years ago | (#30208896)

Time to upgrade, my 12mth old hardware runs it at 1920x1200 with all detail on max at 60fps.

This meme is getting old very fast.

Re:FINALY! (1)

BikeHelmet (1437881) | more than 4 years ago | (#30209188)

I did just upgrade... my monitor! :P

Are you referring to the insensitive clod meme? Yeah, it's not funny anymore.

Re:FINALY! (1)

TheKidWho (705796) | more than 4 years ago | (#30209840)

Your 12 month hardware is probably a GTX285/275 SLI setup or a similar ATI one. Most users don't have such luxury :P

Re:FINALY! (1)

Barny (103770) | more than 4 years ago | (#30212022)

Very very close, GTX280 SLI :)

Most users also don't need to run at 1920x1200, and most users can now afford a GTX275 for their basic gaming machine without too much of a stretch :)

Re:FINALY! (1)

TheDarkMaster (1292526) | more than 4 years ago | (#30214040)

This config in Brazil is a little... difficult. You might pay $400 for one GTX280; here you are forced to pay around $755 for the exact same card. For an SLI setup the cost goes to $1510... and that's only the cards.

Re:FINALY! (1)

Barny (103770) | more than 4 years ago | (#30222900)

Likely because 280s were phased out a while back in favour of 285s, which should be cheaper to get hold of.

Re:FINALY! (1)

TheDarkMaster (1292526) | more than 4 years ago | (#30213934)

But... but I do not have the necessary North Korean nuclear reactor to power this... thing :)

NVIDIA, huh? (1, Funny)

Anonymous Coward | more than 4 years ago | (#30200640)

Does it use wood screws?

not first, just big (1)

mattdm (1931) | more than 4 years ago | (#30200796)

We [harvard.edu] already have one of those; I imagine a lot of schools do. Ours is only an 18-node cluster, so the numbers are much smaller, but the story here is that this one is relatively big, not that it's some new thing.

Re:not first, just big (1)

Odinlake (1057938) | more than 4 years ago | (#30201484)

Tsubame at Tokyo Tech has also had GPUs for well over a year now, and though I'm not sure about the numbers, we're talking large scale (high on the top 500 list).

Re:not first, just big (1)

dlapine (131282) | more than 4 years ago | (#30202722)

We've had the Lincoln cluster [illinois.edu] online and offering processing time since February 2009: 196 compute nodes (dual quad-cores) and 96 Tesla units. That being said, congrats to the Aussies for bringing a powerful new system online.

Someone later in the thread asked if these GPU units would actually be useful for scientific computing. We think so. Our users and researchers here have developed implementations of both NAMD, a parallel molecular dynamics simulator [uiuc.edu], and the MIMD Lattice Computation (MILC) Collaboration code [indiana.edu] that use the power of the GPUs. Both of these codes are freely available and widely used in the HPC community. We've had no lack of requests for time on the Lincoln cluster.

Are these GPUs for everyone? Nope. To disappoint all you gamers out there, the Tesla units have no graphics-out ports; all the communication is done over the PCIe bus. But for all you budding scientists out there, these cards use the same freely available CUDA language that runs on all modern (8xxx and above) Nvidia hardware, so you may already have a compatible GPU in your desktop now, even if it's just a single, slower unit.

One last note: while these units run really fast with single precision, they are capable of running in double precision, albeit much more slowly. For some problems, multiple initial runs can be done at the lower precision to localize the solution set before doing a slower high-precision run to find the final solution. This is similar to what Hollywood does when rendering animated movies: they first render a quick low-res version to see if the timing and characters are correct, then they run a hi-res version, which takes longer, to get the finished product. (Yes, I know, there are a lot more steps to it, but hey, this is just an analogy.)

Re:not first, just big (1)

Barny (103770) | more than 4 years ago | (#30208932)

Mod parent up, although limit the mod to around 4, there were no cars in his analogy!

I can see the CSIRO getting more cool toys in the not-too-distant future what with their payout from their 802.11n patent win. Great to see them putting the funds to use (although I am betting this baby was on order long before that money will flow into their coffers).

mistake for open source (1)

GNUPublicLicense (1242094) | more than 4 years ago | (#30200806)

From an open source point of view this is a mistake, since we (as open source people) should favour AMD GPUs. Moreover, for the past two years AMD GPUs have seemed faster than NVIDIA ones. So despite this bad news, open source people must hold their bearing: favour AMD GPUs regardless.


CUDA, GPGPU, OpenCL etc. (1)

EdgeyEdgey (1172665) | more than 4 years ago | (#30201028)

What API would be the best approach for writing future-proof GPU code?
I'm willing to sacrifice some bleeding-edge performance now for ease of maintainability.

Other GPU possibilities
* OpenCL
* GPGPU
* CUDA
* DirectCompute
* FireStream
* Larrabee
* Close to Metal
* BrookGPU
* Lib Sh

Cheers

Re:CUDA, GPGPU, OpenCL etc. (0)

Anonymous Coward | more than 4 years ago | (#30202126)

OpenCL looks the best on paper, IMHO. It is vendor-neutral, works on Mac OS X, Windows and Linux (nVidia), and it is supported by both nVidia and AMD/ATI.

Imagine... (1)

cyborch (524661) | more than 4 years ago | (#30201098)

... a beowulf cluster of those! ;)

(Sorry, it had to be said)

Floating point operations (0)

Anonymous Coward | more than 4 years ago | (#30201422)

The biggest benefit of GPU processing is that GPUs are much more adept at floating-point math: that is, 2.5436*23.561234 instead of 1829*2304. The distributed computing efforts (Folding, BOINC projects) have started writing clients for users' GPUs as well, and have seen great success so far.

Floating-point operations are tedious on normal processors, but the shader units on GPUs, designed to handle complex calculations for graphical effects, process non-integers much faster.

The #5 Supercomputer is already GPU based (1)

thatguymike (598339) | more than 4 years ago | (#30202252)

http://www.top500.org/system/10186 [top500.org] The machine in TFA is quoting single-precision figures. Currently the ATI boards trounce the Nvidia boards in double precision. The next GPU cluster down the list is Nvidia-based, at #56: http://www.top500.org/site/690 [top500.org]


article is incorrect (0)

Anonymous Coward | more than 4 years ago | (#30203354)

The cluster actually has 50 Tesla S1070 units, each of which contains 4 GPUs, so it's 200 GPUs. And that is just the initial rollout, with additional nodes to be delivered pretty quickly (perhaps waiting for Fermi).

I seem to recall ... (1)

PPH (736903) | more than 4 years ago | (#30205240)

... reading a story some time ago about the use of GPU clusters by organizations on national security watch lists to circumvent ITAR controls.