Grand Unified Theory of SIMD

Hemos posted more than 9 years ago | from the the-string-theory-of-SIMD dept.

Software 223

Glen Low writes "All of a sudden, there's going to be an Altivec unit in every pot: the Mac Mini, the Cell processor, the Xbox2. Yet programming for the PowerPC Altivec and Intel MMX/SSE SIMD (single instruction multiple data) units remains the black art of assembly language magicians. The macstl project tries to unify the architectures in a simple C++ template library. It just reached its 0.2 milestone and claims a 3.6x to 16.2x speed-up over hand-coded scalar loops. And of course it's all OSI-approved RPL goodness."

Altivec (5, Informative)

BWJones (18351) | more than 9 years ago | (#11597314)


For those who want a little background on Altivec, of course Wiki has a description here [wikipedia.org] . Apple, who now ships Altivec in every system they make has a pretty good page here [apple.com] and Motorola nee Freescale has one here [freescale.com] .

The benefits of Altivec can be truly astounding for those processes that can be "vectorized". After all putting these kinds of calculations in hardware has got it all over software computation. It kind of reminds me of when I got one of those Photoshop accelerator hardware cards (Radius Photoengine with 4 DSPs on a daughter card linked to the Thunder series video card) for my IIci. Photoshop filter functions ran faster on that IIci than they did on much later PowerPC systems simply because you now had four hardware DSPs running your image math.

Re:Altivec (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11597354)

fear

Re:Altivec (-1, Troll)

Anonymous Coward | more than 9 years ago | (#11597565)

i dont get it.

how does this affect me as a proffesional visual basic computer security program develloper?

i work in delhi for an american bank to improve their account-management with newest technology.

Re:Altivec (-1, Troll)

rebeka thomas (673264) | more than 9 years ago | (#11597388)

> Apple, who now ships Altivec in every system they make

The Mac Mini has no altivec unit.

Re:Altivec (0)

Anonymous Coward | more than 9 years ago | (#11597414)

The altivec isn't simply a unit on the G4?

Re:Altivec (1)

wulfhound (614369) | more than 9 years ago | (#11597419)

Yes it does.. it's a G4, all G4s have Altivec.

Re:Altivec (3, Informative)

mod_critical (699118) | more than 9 years ago | (#11597442)

Altivec == Velocity Engine

And is part of every G4

Re:Altivec (-1)

Anonymous Coward | more than 9 years ago | (#11597638)

What Mr. Summers said about women and science (and technology) is obviously true in your case.

Re:Altivec (0)

Anonymous Coward | more than 9 years ago | (#11597921)

Wrong. Go back to the kitchen where you belong. This 'computer' thing isn't for you.

Re:Altivec (4, Informative)

shawnce (146129) | more than 9 years ago | (#11597413)

Just pick a few items out ...

Apple provides source code for some of their vector libraries [apple.com]

Re:Altivec (2, Interesting)

baryon351 (626717) | more than 9 years ago | (#11597463)

It kind of reminds me of when I got one of those Photoshop accelerator hardware cards (Radius Photoengine with 4 DSPs on a daughter card linked to the Thunder series video card) for my IIci. Photoshop filter functions ran faster on that IIci than they did on much later PowerPC systems simply because you now had four hardware DSPs running your image math.


I managed to pick up a ThunderIV last year with the DSP card, and had a run around with photoshop on it. It's impressive stuff. I have an iMac 350 here I also ran photoshop on, and while the 350 kicked the Thunder in a Quadra for many unaccelerated things, on those operations where the DSPs kicked in (and the card has those cool little LEDs to show just when it's happening) it could keep up with the iMac nearly neck & neck.

That's a 25MHz 68040 from 1992 and Thunder IVGX vs a 350MHz G3 from 2000. Very cool.

Other way around (1)

Kiryat Malachi (177258) | more than 9 years ago | (#11597653)

Freescale, nee Motorola. (Nee roughly translates to "formerly known as").

Re:Other way around (1, Informative)

Anonymous Coward | more than 9 years ago | (#11597710)

Or born, like the French word it is: née.

No need for anyone to whip out the online dictionary and tell me "formerly known as" is an acceptable alternative.

Re:Other way around (0)

Anonymous Coward | more than 9 years ago | (#11597756)

... or 'maiden name' as it applies to married women a lot

More AltiVec Goodness (4, Informative)

LordRPI (583454) | more than 9 years ago | (#11597342)

Apple has had AltiVec optimized libraries for DSP and such since the early releases of OS X.

Re:More AltiVec Goodness (1)

goMac2500 (741295) | more than 9 years ago | (#11597548)

How is parent flamebait? It's a fact, and it's not flamebait considering Apple is one of the only companies currently shipping Altivec systems.

Reads the article again... (0, Offtopic)

Wandering Wombat (531833) | more than 9 years ago | (#11597353)

*nods* Yes.

Umm (2, Informative)

TheKidWho (705796) | more than 9 years ago | (#11597358)

Doesn't XCode have a feature that lets you "vectorize" certain parts of your code already?

Re:Umm (2, Informative)

Richard_at_work (517087) | more than 9 years ago | (#11597846)

The next version of Xcode will support autovectorisation, but I don't think it does it atm.

A little background (4, Informative)

xXunderdogXx (315464) | more than 9 years ago | (#11597359)

From the Wikipedia article on SIMD:
An example of an application that can take advantage of SIMD is one where the same value is being added to a large number of data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. Each pixel of an image consists of three 8-bit values for the brightness of the red, green and blue portions of the color. To change the brightness, the R G and B values are read from memory, a value is added (or subtracted) from it, and the resulting value is written back out to memory.


With a SIMD processor there are two improvements to this process. For one the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "get this pixel, now get this pixel", a SIMD processor will have a single instruction that effectively says "get all of these pixels" ("all" is a number that varies from design to design). For a variety of reasons, this can take much less time than it would to load each one by one as in a traditional CPU design.
But of course I'm sure everyone here knew that..
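The brightness example quoted above can be sketched in portable C++. This is only an illustration under stated assumptions: the function names, the block size of 16 bytes, and the clamping to 0..255 are all hypothetical choices, and no actual SIMD instructions are used. The second function just arranges the work in fixed-size blocks, which is the shape a SIMD unit handles as one load, one add and one store.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Clamp an intermediate result into the 0..255 range of an 8-bit channel.
inline std::uint8_t clamp_to_byte(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Scalar version: "get this value, now get this value", one per iteration.
void brighten_scalar(std::uint8_t* channels, std::size_t n, int delta) {
    for (std::size_t i = 0; i < n; ++i) {
        channels[i] = clamp_to_byte(channels[i] + delta);
    }
}

// Hypothetical block size: 16 bytes is what one 128-bit register would hold.
constexpr std::size_t kBlock = 16;

// Blocked version: same result, but the work is grouped into 16-value blocks.
// The inner loop has a fixed trip count and no dependencies between lanes,
// which is what a SIMD unit (or an auto-vectorizing compiler) can exploit.
void brighten_blocked(std::uint8_t* channels, std::size_t n, int delta) {
    std::size_t i = 0;
    for (; i + kBlock <= n; i += kBlock) {
        for (std::size_t lane = 0; lane < kBlock; ++lane) {
            channels[i + lane] = clamp_to_byte(channels[i + lane] + delta);
        }
    }
    // Leftover values that do not fill a whole block are handled one by one.
    for (; i < n; ++i) {
        channels[i] = clamp_to_byte(channels[i] + delta);
    }
}
```

Both functions take a flat array of R, G and B values; the pixel structure does not matter here because every channel gets the same delta.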

Re:A little background (1)

Bisqwit (180954) | more than 9 years ago | (#11597696)

An example of an application that can take advantage of SIMD is one where the same value is being added to a large number of data points, a common operation in many multimedia applications.

How is this different for MMX?
Because I thought MMX does exactly what you described.

Re:A little background (1)

xXunderdogXx (315464) | more than 9 years ago | (#11597765)

If I'm not mistaken, wouldn't MMX be an implementation of SIMD?

Re:A little background (0)

Anonymous Coward | more than 9 years ago | (#11597937)

Yeah, you're right. MMX is a SIMD extension.

Re:A little background (3, Informative)

DLWormwood (154934) | more than 9 years ago | (#11597896)

How is this different for MMX?

Based on personal recollections reinforced by a quick Wiki'ing, MMX's problem wasn't the concept itself, but Intel's braindead constraints placed on x86 support for vectors. MMX recycled the same registers as used for floating point math, causing expensive context switches between each mode and only allowing integer math to be vectorized. Intel eventually developed SSE to work around some of the bottlenecks, but the eventual dominance of GPUs on the PC platform reduced the development priority for vector math in the CPU.

Re:A little background (1)

Gr8Apes (679165) | more than 9 years ago | (#11597739)

Evidently not [slashdot.org] ;)

16X increase? (1)

Sensible Clod (771142) | more than 9 years ago | (#11597366)

Okay, I'm willing to believe it, but only if someone shows how that's possible.

Re:16X increase? (2, Interesting)

mirko (198274) | more than 9 years ago | (#11597412)

When using Reason 3 [propelerheads.se] , some virtual synths have the option to produce an enhanced sound.
What is curious is that if you are using a pre-Altivec proc (G3), it'll burn more CPU time while the same enhancement will be totally and natively supported by Altivec-enabled units: a 400MHz G4 Powerbook is enhancing these synths more efficiently than an 800MHz G3.
I guess this was like the simultaneous operations that the ARM assembly language supports (e.g. both storing and rotating values in an operation)...

oops (1, Informative)

Anonymous Coward | more than 9 years ago | (#11597446)

Typo...
Propellerheads.SE [propellerheads.se]

Re:16X increase? (0)

Anonymous Coward | more than 9 years ago | (#11597444)

The SSE registers in x86 processors are 128 bits long.
You can load two registers with 16 8-bit values each, and add the two registers in one operation.
In theory this gives a 16x increase, but there is additional overhead to be considered.
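A rough sketch of that 16-at-a-time add, for the curious. It uses SSE2 intrinsics when the compiler defines __SSE2__ and falls back to a plain loop otherwise. The function name add_bytes and the choice of unaligned loads are assumptions made for illustration; also note that byte-wise adds on the 128-bit registers come from SSE2 rather than the original SSE.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 integer intrinsics
#endif

// out[i] = a[i] + b[i] for every byte, wrapping around modulo 256.
void add_bytes(const std::uint8_t* a, const std::uint8_t* b,
               std::uint8_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    // Each iteration loads 16 bytes from each input, adds all 16 lanes
    // with a single instruction, and stores 16 results.
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i sum = _mm_add_epi8(va, vb);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), sum);
    }
#endif
    // Scalar tail (or the whole array when SSE2 is not available).
    for (; i < n; ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] + b[i]);
    }
}
```

The loads, the store and the scalar tail are part of the overhead mentioned above: the add itself is one instruction, but the surrounding work is not free.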

Re:16X increase? (5, Informative)

LordRPI (583454) | more than 9 years ago | (#11597457)

The principle behind SIMD, or, rather, Single Instruction Multiple Data, is that you can process wide arrays of values in a single instruction. With the PowerPC version of SIMD, also known as AltiVec, you can issue an instruction and have it work with a 128-bit wide register. These registers may contain up to 4 32-bit numbers, 8 16-bit numbers or 16 8-bit numbers. For example, I can load two AltiVec registers with 16 unsigned chars, add them together using vec_add() and have it return its results to an AltiVec register. So this in essence is adding 16 values at once and in theory it's good enough for marketing to claim a 16X speedup, but this is rarely the case.
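The lane counts above are just 128 divided by the element width, which can be written down as compile-time arithmetic. The second function is an assumption layered on top of the comment: it uses Amdahl's law as one simple way to see why the full 16X is rarely reached, namely that only part of a program's run time is spent in code that can be vectorized. The names lanes and ideal_speedup are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Width of one AltiVec or SSE register, in bits.
constexpr std::size_t kRegisterBits = 128;

// Number of elements of type T that fit into one 128-bit register.
template <typename T>
constexpr std::size_t lanes() {
    return kRegisterBits / (8 * sizeof(T));
}

static_assert(lanes<std::uint32_t>() == 4, "4 x 32-bit");
static_assert(lanes<std::uint16_t>() == 8, "8 x 16-bit");
static_assert(lanes<std::uint8_t>() == 16, "16 x 8-bit");

// Amdahl's law (an assumption brought in for illustration): if a fraction p
// of the run time is sped up by a factor equal to the lane count, the
// overall speedup is 1 / ((1 - p) + p / lanes).
inline double ideal_speedup(double vectorizable_fraction, std::size_t lane_count) {
    return 1.0 / ((1.0 - vectorizable_fraction) +
                  vectorizable_fraction / static_cast<double>(lane_count));
}
```

With these definitions, ideal_speedup(1.0, 16) gives 16, while a program that spends only half its time in vectorizable code gets less than 2X from 16 lanes.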

Re:16X increase? (0)

Anonymous Coward | more than 9 years ago | (#11597855)

Yea, but how does this compiler/library/whatever add a 3.6X-16X speedup over handcoded simd code?

fear (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#11597376)

hawfear

Long thread about using Altivec (4, Informative)

ThousandStars (556222) | more than 9 years ago | (#11597380)

The Mac forum at Ars Technica has a long, continuing post [arstechnica.com] about Altivec optimizations and how they should be used. The thread started more than two years ago and still gets relevant points and questions added to it. It's an amazing resource if you're interested in starting.

Read the Altivec mailing list (4, Informative)

kuwan (443684) | more than 9 years ago | (#11597703)

A better resource for Altivec and SIMD in general is the SIMDtech.org [simdtech.org] website and Altivec [simdtech.org] mailing list. There are tutorials and technical manuals available and the email list is indispensable. While the mailing list is mostly geared towards Altivec optimizations and discussions all SIMD discussion is welcome, including MMX/SSE. There are Apple engineers that read and contribute to the list as well as Motorola/Freescale engineers. It's probably the single best resource available to Altivec programmers and you get to talk directly to the Wizards that created it.

I'm a relative newcomer to the list and it's been an invaluable resource as I've optimized with Altivec.

--
Join the Pyramid - Free Mini Mac [freeminimacs.com]

Re:Long thread about using Altivec (0)

Anonymous Coward | more than 9 years ago | (#11597752)

As useful as Altivec can be for some people, it doesn't support "double"s, as in double precision floating point values.

So what? (0, Funny)

Anonymous Coward | more than 9 years ago | (#11597389)

Big freaking deal. Hardware and software are irrelevant. It's all about content now.

mod parent up (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11597437)

Content is king. For example, the contents of your mother's anus include my dick.

Re:mod parent up (-1, Offtopic)

Gilmoure (18428) | more than 9 years ago | (#11597685)

Cavett, Sargent, or York?

Re:So what? (0)

Anonymous Coward | more than 9 years ago | (#11597541)

Let's see how you deliver that content without hardware or software.

Moore's Law has eroded the need for assembly (1, Interesting)

betelgeuse68 (230611) | more than 9 years ago | (#11597396)

Moore's Law has eroded the need for such knowledge. It would be like concerning myself on how to design circuits to convert a DC current to AC current because I happen to use devices that use electricity, e.g., my toaster (as in bread).

I learned assembly long ago, still retaining a fair amount of it (80x86). There have been a few occasions where I've called upon its use, yeah twice in the last eight years... and that's about it.

Yes some people who write games are still concerned with assembly as are people in embedded markets. But those jobs, situations and skills are niche, much like the Win32 programming I used to do in the early 90's.

90% of IT jobs are with non-tech companies. Those situations are about the last place you will find anyone caring about something called "assembly language."

-M

Moore's Law has nothing to do with assembly (2, Insightful)

Anonymous Coward | more than 9 years ago | (#11597448)

Moore's Law has eroded the need for assembly

Moore's Law has nothing to do with assembly language and optimizations. From Wikipedia [wikipedia.org] :

Moore's law is an empirical observation stating, in effect, that at our rate of technological development and advances in the semiconductor industry, the complexity of integrated circuits doubles every 18 months.

I wish people would stop saying "But Moore's Law..." for every hardware-related story on Slashdot. Do a bit of reading, please.

Re:Moore's Law has eroded the need for assembly (2, Funny)

geoffspear (692508) | more than 9 years ago | (#11597513)

99% of all jobs in the world require no programming at all. Therefore, there is no need for anyone anywhere to learn C.

90% of the world's people do not own cars. Therefore, there is no need for gas stations. If you pick a living human completely at random from the earth, chances are they don't drive one of these "car" things.

Re:Moore's Law has eroded the need for assembly (1)

bonch (38532) | more than 9 years ago | (#11597602)

Yes some people who write games are still concerned with assembly as are people in embedded markets. But those jobs, situations and skills are niche, much like the Win32 programming I used to do in the early 90's.

I don't consider Doom 3 to be a niche.

Re:Moore's Law has eroded the need for assembly (1)

betelgeuse68 (230611) | more than 9 years ago | (#11597807)

Sure, and you and everyone you know is working on Doom3 or a competitor?

Just because you use it, doesn't mean you engineer it.

You use a TV... when was the last time you even thought of any of the electronics inside of it?

-M

Assembly (2, Insightful)

bsd4me (759597) | more than 9 years ago | (#11597605)

Even in embedded systems, assembly isn't used as much as it used to. It still gets used in bootloaders, and sometimes in device drivers. However, most devices are memory mapped, and most of the driver is written in C, and asm() calls are made when appropriate (e.g., asm("eieio");), especially when you get to use gcc and asm() syntax for accessing variables.

Assembly-DSPs (0)

Anonymous Coward | more than 9 years ago | (#11597720)

"Even in embedded systems, assembly isn't used as much as it used to. "

It is when programming DSPs (and related devices). Don't forget that microcontrollers outnumber microprocessors by a large margin. And performance is important there (especially automotive and aeronautics.)

Re:Moore's Law has eroded the need for assembly (0)

Anonymous Coward | more than 9 years ago | (#11597659)

People who write compilers, JIT interpreters and emulators still need assembly. Very popular open source projects whose writers needed knowledge of assembly include GCC, Mono and Bochs.

Re:Moore's Law has eroded the need for assembly (2, Insightful)

lowe0 (136140) | more than 9 years ago | (#11597661)

Which is exactly why this sort of thing is so important.

Sure, you could probably get it to work even faster with hand-tuned assembly than simply using this library. But programmer time is expensive, and customizing code adds complexity. By reusing optimized code, you can enjoy some of the benefits of SIMD without having to devote the same amount of resources.

Let's be honest, this isn't a silver bullet - this isn't going to speed up code that doesn't use lots of floating-point vectors anyway. But if it does... (nearly) free performance is always a good thing.

Depends on what you are doing (5, Insightful)

dsci (658278) | more than 9 years ago | (#11597755)

We write code for hardcore chemical simulations. The limits on what can be studied, i.e. the number of atoms/molecules or the timescales of the simulations, depend on one thing: speed.

Faster computers mean better simulations. BUT, if the code is not as fast as it can be on a particular architecture, your simulations are not going to be as complete as they can be. At least within a given time allotment.

I've recently applied some code optimizations to a Monte Carlo simulation and saw speed ups of over 1000x. That's significant.

It's naive to think that faster computers mean we should live with sloppy or unoptimized code. SIMD is a useful technique, and if it means the difference between me getting work done in one week rather than two or three, I think I'll take the one-week sim.

License issues (5, Informative)

IO ERROR (128968) | more than 9 years ago | (#11597404)

Be careful; the "open source" license [pixelglow.com] (PDF) is not GPL-compatible. I don't even think it's BSD-compatible on first reading.

The Reciprocal Public License requires you to release all of your source code if you link to this library, even if your project is personal or used in-house only.

Re:License issues (2, Interesting)

voxlator (531625) | more than 9 years ago | (#11597555)

True, but only if you don't purchase a license.

Simple to understand; if you use it for free, you're expected to release your source code (i.e. the 'reciprocal' part of RPL). If you pay to use it, you don't have to release your source code.

--#voxlator

Re:License issues (3, Informative)

IO ERROR (128968) | more than 9 years ago | (#11597702)

Simple to understand; if you use it for free, you're expected to release your source code (i.e. the 'reciprocal' part of RPL). If you pay to use it, you don't have to release your source code.

True enough, but using the proprietary license makes it impossible to use this in existing projects without changing the license. Suddenly your open source project is either no longer open source, or doesn't look so attractive.

One of the nicest features of the GPL (and, to be fair, of the BSD license) is that you do not have to release source code if you don't distribute your software. This RPL requires you to release your source code even if you don't distribute your software. And the proprietary license simply isn't appropriate for any type of open source project.

The guy wants to get paid, and that's fine, I want to get paid, too. But he's got no business telling me I have to distribute my source code for an internal project that will never be distributed. He could easily have used a method similar to Trolltech's dual-licensing [slashdot.org] , but he chose instead to do something a whole lot more obnoxious.

License issues-Smells funny. (0)

Anonymous Coward | more than 9 years ago | (#11597810)

"The guy wants to get paid, and that's fine, I want to get paid, too. But he's got no business telling me I have to distribute my source code for an internal project that will never be distributed."

He does if you use his license willingly.

"He could easily have used a method similar to Trolltech's dual-licensing, but he chose instead to do something a whole lot more obnoxious."

It would be obnoxious if somehow he took away your free will to choose what license to use. He didn't and you can pick a multitude of OSI licenses.

Re:License issues-Smells funny. (1)

IO ERROR (128968) | more than 9 years ago | (#11597833)

Of course he hasn't taken away my choice, AC. I can't reconcile either of his licenses with my existing projects, so I choose not to use his code. I suspect many existing projects will find themselves in a similar situation when they actually read the licenses, and will also choose not to use his code.

Stupid license (0)

Anonymous Coward | more than 9 years ago | (#11597600)

It's a stupid license because it's impossible to enforce. It's like trying to enforce a peculiar moral code with a EULA.

Note that I'm a fan of the GPL, and I think the aim of it is entirely in concert with the type of rules programmers have followed for over 40 years.

Re:License issues (0)

Anonymous Coward | more than 9 years ago | (#11597726)

requires you to release all of your source code
It sounds like the GPL virus to me.

Re:License issues (2, Informative)

IO ERROR (128968) | more than 9 years ago | (#11597876)

It sounds like the GPL virus to me.

Look, a troll! The GPL doesn't require you to release your code, unless you distribute it. This RPL thing requires you to release your code, even if you don't distribute it. I've discussed the linking issue elsewhere.

Re:License issues (0)

Anonymous Coward | more than 9 years ago | (#11597812)

I would suggest getting an IP lawyer to look at the license in your application. The FSF definition of "derivative work" appears to drive their negative outlook on this license.

If you have an existing work that you can optionally combine with the RPL licensed software, it is unlikely that a court would consider your existing work a derivative of the RPL software.

Of course, the ethical thing is to not use the software because such use would be against the wishes of the author.

Re:License issues (1)

IO ERROR (128968) | more than 9 years ago | (#11597920)

If you have an existing work that you can optionally combine with the RPL licensed software, it is unlikely that a court would consider your existing work a derivative of the RPL software.

With C++ templates this is a very thorny issue. When your code instantiates the template, the library code is very inextricably an integral part of your code, and not easily (if at all) separable. This might be a different issue if it were a C library you could just call through an API.

Currently under the GPL/LGPL this situation requires a special exception in the template library's license.

Re:License issues (1)

RupW (515653) | more than 9 years ago | (#11597902)

The Reciprocal Public License requires you to release all of your source code if you link to this library, even if your project is personal or used in-house only.

IANAL, but I read the intent as "if you improve macstl you have to publish your changes to macstl" not "if you link macstl you have to publish source to the entire project".

Obviously I can't say which one matches the legalese.

Talk about incoherent postings (0, Redundant)

Dikeman (620856) | more than 9 years ago | (#11597456)

I take pride in the fact that i didn't understand a word of this post.

About the RPL (4, Informative)

pavon (30274) | more than 9 years ago | (#11597495)

The RPL (Reciprocal Public License [pixelglow.com]) is an odd choice for this project. It is an even stronger viral copy-left than the GPL, to the point where the FSF takes issue with it. If you create a derivative work you are required to 1) Notify the original author, and 2) Publish your changes even if you only use the program in house. Furthermore, their definition of derivative work is much, much broader than the "linking" definition that the GPL uses.

The fact that it puts these additional requirements / restrictions on the user makes it incompatible with the GPL. In fact, considering the requirements placed on you by the license, I would expect that you will have difficulty incorporating this RPL library into any existing FLOSS project without running into license conflicts. The only thing I can see this being useful for is a new project that you don't mind releasing under the RPL, or with existing BSD style licensed code which you dual license as BSD/RPL (since BSD can be included in anything).

So this library does not appear to be very usable for the FLOSS world, although if you want to license it for proprietary software you may.

Re:About the RPL (2, Informative)

geoffspear (692508) | more than 9 years ago | (#11597584)

Clearly, we need to get everyone in the world to download the source, make one superficial change, and email the entire thing back to the original developer.

And what happens if the original developer dies? Is everyone prohibited from using his code until the copyright runs out in 95 years, as they can't notify him of changes?

Re:About the RPL (1)

MenTaLguY (5483) | more than 9 years ago | (#11597891)

And what happens if the original developer dies? Is everyone prohibited from using his code until the copyright runs out in 95 years, as they can't notify him of changes?

Yes, unless he has an identifiable successor-in-interest.

Re:About the RPL (0)

Anonymous Coward | more than 9 years ago | (#11597898)

> So this library does not appear to very useable for the FLOSS world, although if you want to license it for proprietary software you may.

Yes, clearly the world of dental hygiene is not ready for such a radical license! I wonder if he means Free / Open Source Software, though what the additional "L" stands for is anyone's guess...

Re:About the RPL (1)

Baldrson (78598) | more than 9 years ago | (#11597909)

The fact that it puts these additional requirements / restrictions on the user makes it incompatible with the GPL.

It's no more incompatible than is a class that overrides a method of a superclass "incompatible" with that superclass. In this instance, the release "method" is more strict.

Black Art? Uh... (3, Interesting)

arekusu (159916) | more than 9 years ago | (#11597496)

"...the black art of assembly language magicians."

The nice thing about altivec is that it has a C interface. You don't have to use assembly!

Take a look at this Apple tutorial [apple.com] to see how easy it is.

Re:Black Art? Uh... (3, Funny)

Leo McGarry (843676) | more than 9 years ago | (#11597688)

Yes, I think the person who wrote the summary revealed a little more of his own ignorance than he meant to. I don't consider calling "vec_add" inside a loop to be a black art.
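For anyone who has not seen it, "calling vec_add inside a loop" looks roughly like the sketch below. Treat it as an illustration under assumptions rather than Apple's tutorial code: the function name add_floats is made up, the AltiVec path is only compiled when the compiler defines __ALTIVEC__, and it assumes all three arrays are 16-byte aligned because vec_ld and vec_st ignore the low four bits of the address. On any other machine the plain loop at the bottom does all the work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

// out[i] = a[i] + b[i]; four floats per iteration where AltiVec is available.
void add_floats(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ALTIVEC__)
    // Assumption: the caller passes 16-byte aligned arrays.
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    for (; i + 4 <= n; i += 4) {
        __vector float va = vec_ld(0, a + i);  // load 4 floats
        __vector float vb = vec_ld(0, b + i);  // load 4 floats
        vec_st(vec_add(va, vb), 0, out + i);   // add 4 lanes, store 4 floats
    }
#endif
    // Scalar tail, and the complete fallback on non-AltiVec targets.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

A caller would declare its buffers with something like alignas(16) float a[1024]; to satisfy the alignment assumption.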

Re:Black Art? Uh... (1)

dsci (658278) | more than 9 years ago | (#11597837)

Also, the VectorC compiler by CodePlay [codeplay.com] is useful for using a C compiler that can generate SIMD for MMX, SSE and 3DNow!.

But really, at the end of the day, what's so bad about assembly? I mean, if you inline only those (relatively small parts) you need to optimize, and let the C compiler handle all the symbol table stuff, it's not that bad. We're not talking about developing a full app, including GUI, in straight Assembly from scratch.

More source-distro goodness to follow? (1)

Progman3K (515744) | more than 9 years ago | (#11597516)

Does this mean we can expect source Linux distros to start taking advantage of this?

I know I'll sound like a wannabe leet for saying this, but I already really like my Gentoo workstation because it is a stage1 install (all from source), and I expect this will only make it even faster!

Yay!

Re:More source-distro goodness to follow? (1)

ykardia (645087) | more than 9 years ago | (#11597577)

If you are using Gentoo, there is an "icc" useflag that allows using the Intel Compiler for code that supports this. This compiler already automatically vectorizes your code to work with the Pentium SIMD units (SSE, SSE2 etc).

The speedup is probably not as large as the one you would get from hand-coded libraries, but it can be quite significant (certain things can run up to twice as fast from my experience)

Re:More source-distro goodness to follow? (1)

Lussarn (105276) | more than 9 years ago | (#11597871)

Any particular ebuilds I can test this on?

Re:More source-distro goodness to follow? (0)

Anonymous Coward | more than 9 years ago | (#11597924)

GCC 4.0 (now in beta) can vectorize loops, etc.

Too expensive? (1)

saddino (183491) | more than 9 years ago | (#11597522)

Sounds great, but $2499 for a redistributable binary? Ouch.

Re:Too expensive? (2, Insightful)

voxlator (531625) | more than 9 years ago | (#11597632)

In the corporate world, is it more expensive than paying a developer to design, code, test, and maintain a home-grown version?

Once you've paid a $30/hour developer for 10 days' work, you've forked out ~$2,500...

--#voxlator

Re:Too expensive? (1)

saddino (183491) | more than 9 years ago | (#11597874)

If the question was "Do I hire my own programmer or buy this technology?" then you would be correct.

But, given this is an optimization and replacement for STL then the question is "Do I just live with STL, or buy this technology?"

In other words, it isn't an essential development cost, it's an extra (I imagine most interested parties already have shipping apps that use STL).

And at this price point, IMHO, I think the answer may be "if it ain't broke, don't fix it."

Slides about SIMD (2, Informative)

quigonn (80360) | more than 9 years ago | (#11597529)

A bit OT, but nevertheless quite interesting to read and it contains information about SIMD instruction sets other than just MMX/SSE: http://www.fefe.de/ccccamp2003-simd.pdf [www.fefe.de]

Assembly or C++? (1)

nagora (177841) | more than 9 years ago | (#11597536)

I'll take the Assembly Language, thanks. Especially on such a nice processor.

TWW

Re:Assembly or C++? (1)

nagora (177841) | more than 9 years ago | (#11597619)

Especially on such a nice processor as the PowerPC, that is. Sheesh.

TWW

Autovectorization being added in GCC 4.0 (5, Interesting)

shawnce (146129) | more than 9 years ago | (#11597543)

For those who don't already know, autovectorization is being worked on for GCC by folks from IBM and others.

GCC vectorization project [gnu.org] (site seems offline atm) but the abstract from a recent GCC summit [gccsummit.org] is up.

Autovectorization Talk (google html view of pdf) [216.239.57.104]
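To give an idea of what an autovectorizer looks for, here is the kind of loop that is usually a good candidate: a simple counted loop over contiguous data where no iteration depends on another. The function name saxpy is a conventional label used here for illustration, and whether a particular compiler version actually vectorizes it is not guaranteed; with GCC the vectorizer is controlled by the -ftree-vectorize option.

```cpp
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i] for every element both vectors have in common.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xs = x.data();
    float* ys = y.data();
    // Known trip count, unit-stride access, and each iteration touches only
    // its own element: the properties a vectorizer needs to prove before it
    // can process several elements per instruction.
    for (std::size_t i = 0; i < n; ++i) {
        ys[i] = a * xs[i] + ys[i];
    }
}
```

Loops with early exits, function calls the compiler cannot see into, or a value carried from one iteration to the next are much harder for a compiler to transform on its own.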

Re:Autovectorization being added in GCC 4.0 (0)

Anonymous Coward | more than 9 years ago | (#11597767)

Thanks for the information

A gentoo user that is going to unmask gcc 4.0 and test it :)

Re:Autovectorization being added in GCC 4.0 (1)

TedCheshireAcad (311748) | more than 9 years ago | (#11597900)

If you're serious about performance, use XLC. GCC is great if you're cheap, but it's kind of like putting monster truck tires on a Ferrari.

It's in the compiler (2, Informative)

Mad Hughagi (193374) | more than 9 years ago | (#11597557)

Vectorization (SIMD) is built into the Intel compiler. There is no need to hack in assembly as the compiler will do it for you. This is the case with most vendor supplied compilers, as they want to fully exploit their hardware functionality.

The problem is bringing this functionality to open source compilers, where, as far as I know, there is not even an OpenMP (threading) implementation, let alone internal vectorization.

Re:It's in the compiler (1)

nonmaskable (452595) | more than 9 years ago | (#11597903)

It is built in but you don't automagically get full benefit unless you design your data structures and algorithms appropriately. In my case, I got no measurable benefit until I did a fairly extensive redesign.

Intel has a great book on performance tuning that has been extremely helpful, as has Intel's VTune.
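As a rough illustration of what "designing your data structures appropriately" can mean, here is a sketch contrasting an array-of-structures layout with a structure-of-arrays layout. The types `ParticleAoS` and `ParticlesSoA` and the two functions are hypothetical; this is not the redesign described above, only the general pattern usually recommended for SIMD-friendly code.

```cpp
#include <cstddef>
#include <vector>

// Array of structures: the x values of consecutive particles are
// 16 bytes apart, so loading four of them into one SIMD register
// needs strided or gathered access.
struct ParticleAoS {
    float x, y, z, mass;
};

// Structure of arrays: all x values are contiguous, so four
// consecutive particles fit one plain vector load.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Same operation on both layouts: move every particle by dx along x.
void shift_x_aos(std::vector<ParticleAoS>& p, float dx) {
    for (std::size_t i = 0; i < p.size(); ++i) {
        p[i].x += dx;  // stride of sizeof(ParticleAoS) between elements
    }
}

void shift_x_soa(ParticlesSoA& p, float dx) {
    float* xs = p.x.data();
    const std::size_t n = p.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        xs[i] += dx;   // unit stride, friendly to a vectorizing compiler
    }
}
```

Both versions compute the same result; only the memory layout differs, and that layout is what tends to decide whether the compiler's vectorizer can do anything useful.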

Licensing scheme acceptable? (0)

Anonymous Coward | more than 9 years ago | (#11597568)

I had a look at their website and I am a bit sceptical about the licensing scheme. Why can't they just be upfront and GPL it?

already exists (2, Informative)

jeif1k (809151) | more than 9 years ago | (#11597603)

SIMD support already exists, in the form of C, C++, and Fortran libraries (usually, as a small part of larger numerical libraries), as well as in language constructs in languages like Fortran.

Re:already exists (1)

jkujawa (56195) | more than 9 years ago | (#11597832)

The point of MacSTL is it's portable to both PPC and Intel. You can make a portable SIMD-optimized program.

The future (3, Insightful)

johnhennessy (94737) | more than 9 years ago | (#11597612)

Surely people can now start to see where the future lies - from a performance viewpoint. We've reached the end of the clocking "free lunch" (see http://www.gotw.ca/publications/concurrency-ddj.htm/ [www.gotw.ca] ).

The way forward is turning the CPU (of a traditional architecture) into a nanny for a range of dedicated processing units. IBM saw this years ago, and thus began the whole Cell architecture - but I suspect that their job was much easier. The software that would run on the platform they are designing is fairly specific - games & multimedia, which usually lend themselves well to vectorization.

The real challenge for architects (in my humble opinion) will be applying the same technique to other system bottlenecks.

AMD's (and now Intel's) approach of cramming more and more processing cores onto an IC might pay off in the short term, but like the "free lunch" of clock speed, it will hit a roadblock when things like memory bandwidth and caching schemes just have too much work to do with 4 or 8 processing cores hacking at them all the time.

Isn't it what std::valarray is for? (1)

21mhz (443080) | more than 9 years ago | (#11597623)

Reading this reminded me about that portion of the standard C++ library which is all about operations on vector data. So, my question is: could an std::valarray specialization for processor-supported types serve as a basis for portable SIMD support in C++?

Re:Isn't it what std::valarray is for? (2, Insightful)

kuwan (443684) | more than 9 years ago | (#11597870)

So, my question is: could an std::valarray specialization for processor-supported types serve as a basis for portable SIMD support in C++?

That's exactly what this is. If you read the part on his website about valarray [pixelglow.com] then you'll see that it does extensive SIMD optimizations for valarray for both Altivec and MMX/SSE/SSE2/SSE3 platforms. He's even added "parallelized algorithms such as integer division, trigonometric functions and complex number arithmetic" which you'd have to code yourself in either assembly or using the C-based intrinsics if you wanted to do the SIMD programming by hand.

So basically, this allows you to code using std::valarray using normal C++ and then plug this in under the hood to get a nice speed boost.
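For anyone who hasn't used it, this is roughly what "normal C++" with std::valarray looks like. The sketch uses only the standard library; the helper names `blend` and `dot` are made up for illustration and are not part of macstl. Whether the arithmetic ends up in SIMD instructions depends entirely on the valarray implementation underneath.

```cpp
#include <valarray>

// Hypothetical helper: element-wise linear blend of two arrays,
// written as whole-array arithmetic with no explicit loop.
std::valarray<float> blend(const std::valarray<float>& a,
                           const std::valarray<float>& b,
                           float t) {
    return a * (1.0f - t) + b * t;
}

// Hypothetical helper: dot product as an element-wise multiply
// followed by a horizontal sum.
float dot(const std::valarray<float>& a, const std::valarray<float>& b) {
    std::valarray<float> prod = a * b;
    return prod.sum();
}
```

With a = {0, 2, 4} and b = {4, 6, 8}, `blend(a, b, 0.5f)` gives {2, 4, 6} and `dot(a, b)` gives 44. The source never mentions a vector width or an instruction set, which is what leaves room for a library to substitute a SIMD implementation.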

--
Join the Pyramid - Free Mini Mac [freeminimacs.com]

OS X Tiger will do it for you (2, Interesting)

jilbert (520628) | more than 9 years ago | (#11597722)

Tiger, the next OS release from Apple, will take care of vector optimization automatically [apple.com] in their version of gcc 4.0. I guess this will make it into the public gcc too.

Re:OS X Tiger will do it for you (1)

Junks Jerzey (54586) | more than 9 years ago | (#11597779)

Tiger, the next OS release from Apple, will take care of vector optimization automatically [apple.com] in their version of gcc 4.0. I guess this will make it into the public gcc too.

For the record, this has been in Intel's C compiler for years now. It's also in the current release of the Microsoft Visual C++ compiler, including the free download version.

Q for VMX/3D/OpenGL software developers: (1)

tubbtubb (781286) | more than 9 years ago | (#11597727)

This is public now, so I can talk about it--
I worked on extending the accuracy and continuity of the VMX instruction vexptefp, see the patent application here [uspto.gov].
My understanding is that this instruction is used to compute Phong/specular highlights, and that previous implementations of this instruction were unusable because the lack of accuracy and continuity made the results visually undesirable. We were able to improve the algorithm enough to be visually indistinguishable from a fully accurate non-estimate.
Can any software developers that use this instruction comment on this?
Is Phong highlighting mostly done on GPUs now?

From the limewire... (3, Interesting)

WilyCoder (736280) | more than 9 years ago | (#11597731)

As two of my professors have stated in class, SIMD, and more so parallel processing, will require programmers to think in a fundamentally different way in order for multi-core/multi-processor to really take off.

This project may be a step in the right direction. Benchmarks show that SIMD extensions such as SSE/SSE2/SSE3 provide only a marginal speed increase. Meanwhile, the massively parallel computations done on graphics cards dwarf anything SIMD claims to produce.

Perhaps we will see GFX manufacturers selling their technology to the CPU makers.

I forget the specifics, but a new GFX card can perform somewhere around 35 GFLOPS, while a 3.4 GHz P4 (executing SIMD code) can only produce around 5-6 GFLOPS at best.

With projects like Brook GPU emerging, the division of CPU and GFX processor may be narrowed significantly.

Ignorant submitter, or smart marketing? (2, Interesting)

javaxman (705658) | more than 9 years ago | (#11597746)

Sorry, I can't read a story submitted by someone who doesn't even know about C [apple.com] libraries [intel.com] that have been around for years.

Or is this just another advertisement pretending to be a story, with the submitter trying to play ignorant about alternative Altivec and MMX libraries?

faster? Bogus.... (0, Troll)

Anonymous Coward | more than 9 years ago | (#11597796)

Excuse my ignorance, but how can a C++ template library be faster than hand coded assembler? ever.... no really - with a straight face. Given of course that "hand coded" implies it's hand coded for the task at hand and not something "like" it. If this was an article about a SIMD library why does it go all koolaid? Is this today's "mac-mini" astroturf?

Oh come on (0)

Anonymous Coward | more than 9 years ago | (#11597828)

It's not like there aren't C compiler intrinsics for MMX and SSE/SSE2/SSE3(PNI). Hell, far as I know, they're supported on both Intel's and FSF's compilers.

I have to wonder: who the hell expects a library to turn out decent SIMD code for them? I mean, what the fuck's the matter?
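For reference, hand-written intrinsics code looks something like the sketch below. The intrinsics `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` come from `<xmmintrin.h>`; the function name `add_arrays` and the macro `SKETCH_HAVE_SSE` are hypothetical, and the preprocessor guard is only an assumption about how one might keep the file compiling on non-x86 targets.

```cpp
#include <cstddef>

// Assumed feature test: use SSE intrinsics only where the compiler
// advertises them, otherwise fall back to the scalar loop alone.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if SKETCH_HAVE_SSE
    // Process four floats per iteration; unaligned loads and stores
    // are used so the caller needs no particular alignment.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail (or the whole array when SSE is unavailable).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

An Altivec version would need a separate code path with different types and intrinsics, which is the duplication a portable template library is trying to hide.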

liboil (2, Interesting)

labratuk (204918) | more than 9 years ago | (#11597901)

Another project trying to do something similar is liboil [schleef.org] , the Library of Optimised Inner Loops.

However, in the future I can see things changing for the structure of the standard PC.

At the moment in a high end machine you have the CPU, which is a scalar processor, a GPU, which is in essence a glorified vector processor (not just useful for graphics, as projects like GpGPU are showing us), and SIMD extensions to the CPU to allow it to do small amounts of vector processing.

Scalar processors are good for some things (branchy code) and vector processors are good for other things (very predictable parallel code). Having both is very useful.

I would say in the next 5-10 years we will see the GPU join together with the SIMD extensions to provide a separate general purpose vector processor.

PCs will ship with two processors - one scalar, one vector. And everyone will be happy.

Now, whether this will be transparent to the programmer depends on how automatic code optimisation progresses over the next few years. Is Intel's icc auto vectorisation already good enough? Don't know.

Anyone else misread the title (0)

Anonymous Coward | more than 9 years ago | (#11597908)

Something along the lines of Grand Theft Auto: SIMS?