
Ask Slashdot: What Is the Most Painless Intro To GPU Programming?

Soulskill posted about a year ago | from the large-reference-books-and-opiates dept.

Programming 198

dryriver writes "I am an intermediate-level programmer who works mostly in C#/.NET. I have a couple of image/video processing algorithms that are highly parallelizable; running them on a GPU instead of a CPU should result in a considerable speedup (anywhere from 10x to perhaps 30x or 40x, depending on the quality of the implementation). Now here is my question: What, currently, is the most painless way to start playing with GPU programming? Do I have to learn CUDA/OpenCL, which seems a daunting task to me, or is there a simpler way? Perhaps a Visual Programming Language or 'VPL' that lets you connect boxes/nodes and access the GPU very simply? I should mention that I am on Windows, and that the GPU computing prototypes I want to build should be able to run on Windows. Surely there must be a 'relatively painless' way out there with which one can begin to learn how to harness the GPU?"


XNA or Unity (0)

MemoryParts (2989973) | about a year ago | (#44331893)

Since you already know C#, the most obvious answer is either XNA or Unity. With XNA you can target both Windows and the Xbox 360, while Unity is a ready-made game engine. Both let you write C# and are powerful. With XNA you obviously get a bit more freedom, but Unity is great too. And both let you code in Visual Studio, woohoo!

Re:XNA or Unity (2)

Tr3vin (1220548) | about a year ago | (#44331945)

Those are game engines. They will do nothing to help him use the GPGPU capabilities of his graphics card.

Re:XNA or Unity (0)

i kan reed (749298) | about a year ago | (#44332113)

XNA has easy, painless shader compilation. You can plug a C# image class into an XNA texture, pipe it through a vertex/pixel shader that you write by hand, and dump the output to a texture, then back to an image. That process is highly interoperable with existing C# applications.

But that ignores the fact that Microsoft abandoned XNA like an unwanted child.

Re:XNA or Unity (1)

gl4ss (559668) | about a year ago | (#44333169)

MS has a habit of abandoning one product while other guys in the same fucking company force you to use xna.* libs on their brand spanking new hardware.

But actually that sounds like a possible solution for the guy, the pain being writing the shader.

Silverlight abandoned? Then what the fuck are you doing shipping an SDK with Silverlight libs on almost the same fucking day?! I see where Elop learnt his trade.

Re:XNA or Unity (2, Informative)

Anonymous Coward | about a year ago | (#44332885)

Incorrect. That is certainly a valid approach and the GP should be modded up.

Using textures and shaders you can very easily do massively parallel floating point operations in XNA on the GPU, and it's a language the asker is familiar with.

Think outside the box a little bit.

Re:XNA or Unity (4, Informative)

stewsters (1406737) | about a year ago | (#44331983)

I don't think he is looking at making a game; I think he is looking for some cheap parallel processing. I have done some CUDA, and it was a pain to set up a few years back. There are probably better tutorials now.

Re:XNA or Unity (1)

vlueboy (1799360) | about a year ago | (#44333217)

Yeah, I know the feeling.
It would be one more tool under my belt. For instance, most non-financial people hear of unemployment numbers, and a few know where to view the official data. For some bizarre reason the government offers no graphs alongside their statistics, even though they let you download years' worth of raw data. Enter us geeks, who easily put together a spreadsheet to make sense of official unemployment trends, zoom into the data all we want, and run our own analysis.

One day, knowing OpenCL might let me do similar processing that would otherwise be out of my reach. The potential alone has merit. Executing basic parallel programming without fear will be a better accomplishment than the last multi-day experiment I ran on my GPU: mining up to one bitcent.

Re:XNA or Unity (0)

Anonymous Coward | about a year ago | (#44333619)

Even GLSL or HLSL are fine for an introduction to GPU processing. You won't be doing GPU bitcoin mining or any serious data tasks with it in the end, but it's fine for spreading out some of the work from your CPU.

Check out MC# (1)

Anonymous Coward | about a year ago | (#44331925)

I tried it out once a while ago just to see what it does. It looks 'dead' from a support POV, but it is still out there.

Release notes for MC# 3.0:
a) GPU support both for Windows and Linux,
b) integration with Microsoft Visual Studio 2010,
c) bunch of sample programs for running on GPU (including multi-GPU versions),
d) "GPU programming with MC#" tutorial.

GPU programming is pain (5, Funny)

Anonymous Coward | about a year ago | (#44331929)

GPU programming is painful. A painless introduction doesn't capture the flavor of it.

Re:GPU programming is pain (5, Funny)

PolygamousRanchKid (1290638) | about a year ago | (#44332317)

Yeah, it would be like S&M without the pain . . . cute, but something essential is missing from the experience.

Heidi Klum has a TV show called "Germany's Next Top Model". She basically goes all "Ilsa, She-Wolf of the SS" on a bunch of neurotic, anorexic, pubescent girls, teaching them how a top model needs to suffer.

Heidi Klum would make a good GPU programming instructor.

. . . and even non-geeks would watch the show. A win-win for everyone.

Re:GPU programming is pain (4, Funny)

Anonymous Coward | about a year ago | (#44332767)

Yeah, that's what we need! More neurotic, anorexic, pubescent girls who know how to do GPU programming!

Re:GPU programming is pain (1)

Anonymous Coward | about a year ago | (#44332765)

Only if the language you're using is pain. In other words: if you're trying to use C/C++/C#/Java/Pascal/… to write highly parallel code... YOU'RE DOING IT WRONG.

Those languages are not made for that. Don't try to shoehorn parallel programming onto them.

This is a far more elegant task in functional languages like Haskell, which are designed from the ground up for parallel processing.

Then again, many programmers still sit in the tiny mental box of C & co. and think it's "the shit".
Yeah, for low-level code like drivers and memory managers, etc. But stop seeing nails everywhere just because you cling to the hammer as your only tool.

Re:GPU programming is pain (1)

Ken_g6 (775014) | about a year ago | (#44333051)

Only if the language you're using is pain. In other words: if you're trying to use C/C++/C#/Java/Pascal/… to write highly parallel code... YOU'RE DOING IT WRONG.

Those languages are not made for that. Don't try to shoehorn parallel programming onto them.

This is a far more elegant task in functional languages like Haskell, which are designed from the ground up for parallel processing.

But GPU programming isn't just about parallel programming. It's also about low register availability, high memory latency, complicated memory access patterns, and just-plain-strange inter-process communication. The GPU has many more parts than a CPU, and you need to learn to use most or all of them effectively.

Re:GPU programming is pain (0)

Anonymous Coward | about a year ago | (#44333241)

CPU programming is about all of those things, too, though. Not least if you want to try to get any use out of an AVX unit. Try doing divergent branches or memory accesses across an AVX unit and see how you get on!

Re:GPU programming is pain (0)

Anonymous Coward | about a year ago | (#44333781)

Nice! I came here to say the same thing. It's hard to find a painless way to introduce yourself to pain :)

To be fair, the parallel programming part is useful to learn if you haven't done it much, otherwise there are a lot of details that aren't so interesting.

Learn OpenCL (5, Insightful)

Tough Love (215404) | about a year ago | (#44331939)

Since the whole point of GPU programming is efficiency, don't even think about VBing it. Or Pythoning it. Or whatever layer of shiny crap might seem superficially appealing to you.

Learn OpenCL and do the job properly.

Re:Learn OpenCL (4, Interesting)

Tr3vin (1220548) | about a year ago | (#44331987)

Learn OpenCL and do the job properly.

This. OpenCL is C based so it shouldn't be that hard to pick up. The efficient algorithms will be basically the same no matter what language or bindings you use.

Re:Learn OpenCL (2)

Midnight Thunder (17205) | about a year ago | (#44332719)

Learn OpenCL and do the job properly.

This. OpenCL is C based so it shouldn't be that hard to pick up. The efficient algorithms will be basically the same no matter what language or bindings you use.

Well, the first thing is to understand parallel programming and what sorts of things work well on a GPU. With that basic understanding, OpenCL becomes a tool for doing that work. Starting with an OpenCL-based "hello world" type application would then be the next step.

Re:Learn OpenCL (2)

SplashMyBandit (1543257) | about a year ago | (#44332907)

The real trick to efficient GPU programming is keeping as much in video memory as you can, by optimizing the textures you use (I'm a GLSL game developer, so this is *the* critical performance issue). I would also recommend OpenCL over CUDA. OpenGL has shown a longevity that made working with it worthwhile, and with billions of mobile devices using it alongside PCs (Win/Linux) and Macs, it seems OpenCL could very well have the same longevity too. Since your time is a very precious thing, it is worth investing it in something that will be around for a long time and is cross-platform (mobiles and tablets are the current fad; the browser with WebGL creating amazing apps could well be the next one).

As for libraries, I use the JOGL bindings for Java. That allows my application (a jet combat flight simulator in development) to work cross-platform with almost no porting effort. Using Java makes using lots of CPU cores easy, but the performance constraint is never the CPU, it is the GPU. So I use Java to save development time on routine stuff (heap-based resource management under multi-threading) and spend some of the saved time optimizing the GPU code, which is the performance-critical stuff.

Re:Learn OpenCL (0, Troll)

Anonymous Coward | about a year ago | (#44332033)

Surely the whole point of computer programming is efficiency - efficiency over doing a task without a computer.

If you can get the job done quicker in something along the lines of VB or Python, and the speedup compared to using the CPU alone is good enough, I don't see why you shouldn't do it the easy way. Sure, if you're going to be doing this kind of coding a lot then you should invest time in learning the "best" way to do it, but if it's something you'll seldom be doing then it may be more efficient for you just to take the easy option.

Re:Learn OpenCL (1)

Anonymous Coward | about a year ago | (#44332231)

Considering that GPU programming is intrinsically parallel in nature and pretty much none of those "easier" means really have the concept in question in their worldview, I call BULLSHIT on your line of reasoning.

Re:Learn OpenCL (2, Informative)

Anonymous Coward | about a year ago | (#44332273)

If you can get the job done quicker in something along the lines of VB or Python and the speed up compared to using the CPU alone is good enough, I don't see why you shouldn't do it the easy way. Sure, if you're going to be doing this kind of coding a lot then you should invest time in learning the "best" way to do it, but if its something you'll seldom be doing then it may be more efficient for you just to take the easy option.

Ordinarily I'd agree with you (a programmer's time is worth more than anyone else's), but that means stopping now and not even bothering with the GPU, since he already has code that works on the CPU. He's done. The project is complete. Next work order.

As soon as we start saying he's not already done, we've violated the principle and should stop trying to use it. His target is clearly end-user-enjoyed performance, and he's willing to put in more programmer time. So it's time to hang up the rapid prototype hat, and seriously get his hands dirty.

Re:Learn OpenCL (2)

Required Snark (1702878) | about a year ago | (#44332089)

Yep. Some things are intrinsically hard. GPU programming is SIMD programming, so you have to work with data parallelism. It helps a lot if you understand how the hardware works. This is where assembly language experience can be a big plus.

There's no substitute for detailed knowledge. Outside of instruction-level parallelism, there is no "magic bullet" for parallel programming. You have to learn things.

Re:Learn OpenCL (0)

Anonymous Coward | about a year ago | (#44332305)

One of the major differences between hardware engineers and software "engineers" is that hardware people don't mess around with languages/frameworks etc. The only common languages are VHDL/Verilog and maybe SystemC.

The same language is good enough to testbench or synthesize their logic, be it a small CPLD/FPGA or a large GPU/CPU/ASIC. They don't waste time playing with things that add more abstractions and slow things down. That's one of the many reasons why hardware has always been ahead of software development.

Re:Learn OpenCL (4, Insightful)

HaZardman27 (1521119) | about a year ago | (#44332393)

That's because the closest analogy in the hardware world to a software engineer using a more abstracted language is the packaging of common circuitry. Or when hardware engineers design chips, do they actually model out the components of every single transistor?

Re:Learn OpenCL (4, Informative)

AdamHaun (43173) | about a year ago | (#44333133)

Or when hardware engineers design chips, do they actually model out the components of every single transistor?

Chip design is absurdly complicated (even on the digital side), and involves several layers of abstraction. In roughly increasing level of detail:

* Spec level: high-level behavioral description of the functionality of a digital system, something like "8-bit 115.2kbps UART" or "2MHz PWM with 0-100% duty cycle in 0.1% increments".
* HDL/RTL level: software-like description of the complete system design. Can range from higher-level (describing behavior) to lower-level (describing specific logic). When people talk about buying, selling, or creating "IP" in the chip design world, they're usually talking about RTL for a single functional unit.
* Gate level: Logic gates and flip-flops and their connections.
* Transistor level: The transistors that make up the gates, and their connections.
* Device level: The behavior of an individual transistor.
* Physical layout: Just what it sounds like; the actual arrangements of metal and silicon.

There are some more in between, but that should give you an idea. HDLs are not necessarily low-level. For large designs (like modern SoCs), it takes some *very* expensive and complex software to go deeper into the list, and the process is not entirely automated. So I wouldn't say hardware design can't be high-level. The difference is that in hardware, you always have to care about the lowest level when you're doing your high-level design, while in software you can take more things for granted. So even though a board-level design might just be a bunch of off-the-shelf chips hooked together, it still takes a lot of work to make sure everything comes out right.

Re:Learn OpenCL (0)

sl4shd0rk (755837) | about a year ago | (#44332131)

don't even think about VBing it. Or Pythoning it.

Awwwwww yisssssss... mothoafokin Assembly!

Re: Learn OpenCL (0)

Anonymous Coward | about a year ago | (#44332411)

Assembly is for pussies. REAL programmers code in straight binary.

01100101 thatshit

Re: Learn OpenCL (1)

Anonymous Coward | about a year ago | (#44332545)

Re: Learn OpenCL (1)

mwvdlee (775178) | about a year ago | (#44332781)

Patch cables? Are those the playfully colored, safety-blanket covered, plug-and-play things you kids use these days? Real programmers use a soldering iron and bare metal only.

Re: Learn OpenCL (1)

Anonymous Coward | about a year ago | (#44333137)

Enough said.

Re: Learn OpenCL (0)

Anonymous Coward | about a year ago | (#44333731)

Re:Learn OpenCL (4, Informative)

CadentOrange (2429626) | about a year ago | (#44332277)

What's wrong with a higher level language that interfaces with OpenCL? You're still writing OpenCL, you're just using Python for loading/storing datasets and initialisation. If you're starting out, something like PyOpenCL might be better as it'll allow you to focus on writing stuff in OpenCL.

Re:Learn OpenCL (0)

Anonymous Coward | about a year ago | (#44332289)

Rewriting a bunch of boilerplate and high-level code to gain maybe 0.01% more speed? We don't know what specific algorithm he needs to use. If it just depends on a bit of simple arithmetic in a tight loop, then that is the only part that needs to be done in something specific to the GPU. And if the algorithm is simple enough, even a crappy translator or compiler will get it to run quickly on a GPU. Writing the rest of the program in a language you have no interest or taste for will cost you more time than you recover, for a lot of things these days. And to everyone complaining that it is difficult to learn parallel computation and that it is going to be painful no matter what: that is BS for many algorithms. Some are inherently and dead-simple parallel, and if that is all you are going to do, there won't be any great difficulty getting a massive speedup from GPGPU. At worst you might have some cache issues that cost you a little, but you still end up faster than doing it on the CPU.

It is not that OpenCL or learning the architecture and basics of parallel programming are bad advice. But the idea that those are the only paths that will go anywhere for all projects is bull.

Re: Learn OpenCL (-1)

Anonymous Coward | about a year ago | (#44332363)

Anonymous Cowards

Re: Learn OpenCL (-1)

Anonymous Coward | about a year ago | (#44332495)

Yeah. Because someone's real name is LongDongSilver or Obi_Wan or TheFakeSteveJobs.
While some users do have a reputation of note around here most of us have no investment here. If it would make you really feel better I'll start logging in as TrollBridge and you can have a name to associate with a troll. I'm sure it won't make a difference in the content.
People who bitch about using AC to troll probably don't understand the real reason most trolls use AC instead of an 'identity.'

CUDA (1)

Anonymous Coward | about a year ago | (#44331943)

CUDA is extremely easy to learn and use (if you know C and, of course, have an NVIDIA card) and is well worth the effort for some projects. Alternatively, you could skip GPU programming and just use OpenMP, which would still greatly improve performance if you're not already multithreading.

Re:CUDA (2, Insightful)

Anonymous Coward | about a year ago | (#44332501)

Never, under any circumstances, use CUDA. We don't need any more proprietary garbage floating around. Use OpenCL only.

Re:CUDA (0)

Anonymous Coward | about a year ago | (#44332549)

Never, under any circumstances, use CUDA. We don't need any more proprietary garbage floating around. Use OpenCL only.

Maybe when OpenCL is as easy and quick to write as CUDA I'll do that; until then, no thanks.

Re:CUDA (0)

Anonymous Coward | about a year ago | (#44333103)

Combine CLU with the C++ bindings and you get something pretty simple, if you're willing to drop the single-source feature of CUDA (and really, single source is only a driver for template engines; beyond that, programmers use multiple compilation units all over the place anyway). The basic CLU vector-add sample is 20 lines of host code, with simple tools that pull in the device code.

No loss of ease of use, nice gain in portability.

Re:CUDA (4, Informative)

UnknownSoldier (67820) | about a year ago | (#44332635)

Agreed 100% about CUDA and OpenMP! I already invented a new multi-core string-searching algorithm and am having a load of fun playing around with my GTX Titan combining CUDA + OpenMP. You can even do printf() from the GPU. :-)

The most _painless_ way to learn CUDA is to install CUDA on a Linux (Ubuntu) box or a Windows box.

On Linux, fire up 'nsight' at the command line, open the CUDA SDK samples and start exploring! And by exploring I mean single-stepping through the code. The Nsight IDE is pretty darn good considering it is free.

Another really good doc is the CUDA C Programming Guide.

Oh, and don't pay attention to the Intel propaganda; there are numerous inaccuracies:
Debunking the 100X GPU vs CPU Myth: An Evaluation of Throughput Computing on CPU and GPU

Re:CUDA (0)

Anonymous Coward | about a year ago | (#44333209)

I also recommend the Udacity parallel programming class.

It really opened my eyes to parallel programming. You will want to learn some basic CUDA syntax before diving in, but you can complete the course without having your own development environment and do all the programming exercises directly from their web interface.

OpenACC (1)

Anonymous Coward | about a year ago | (#44331975)

Don't know what the status is on Windows, but for high-performance computing, OpenACC is an emerging standard, with support from the Cray and PGI compilers.

It's easier than it sounds (1)

Anonymous Coward | about a year ago | (#44332019)

The heavy lifting has mostly already been done for you. There are CUDA wrappers out there that, with a few changes to your code, run it as close to optimally as possible using the card's cores. We had an Nvidia guy come by and give a talk just to show off how relatively painless it is (similar to OpenMPI, in my opinion). If you've got a couple of extra people around, consider reaching out to Nvidia to have someone show everyone a few of the options.

Obsidian (4, Informative)

jbolden (176878) | about a year ago | (#44332103)

I get the impression that CUDA/OpenCL is still the best option. This thesis on Obsidian presents a set of Haskell bindings which might be easier, and it also covers the basics quite well. Haskell lends itself really well because the language is inherently designed for parallelism, thanks to purity and out-of-order computation. That being said, I think Obsidian is a bit rough around the edges, but if you are looking for a real alternative, this is one.

 (0)

Anonymous Coward | about a year ago | (#44332109)

It's new and might be a little rough around the edges, but everything else is hacks on top of proprietary OEM "solutions" on top of hardware hacks.

Re: (0)

Anonymous Coward | about a year ago | (#44332153)

 has no useful public software yet and only actually defines an intermediate language. What use would it be to direct someone who's asking for an easy route straight to HSAIL?

Jitter (1)

handshake, doctor (1893802) | about a year ago | (#44332119)

Check out Max/MSP/Jitter.

As you describe, the interface is a VPL; connecting boxes/nodes to access the GPU is one of the (many) things the program is capable of. Depending on what you're trying to do, you may also find Gen useful for generating GLSL shaders within the Max environment (although you can use other shaders as well).

I'm currently neck-deep in a few Jitter projects using custom shaders, etc., and while it's great for rapid prototyping, getting good frame-rates and production stable code out is a whole black art unto itself. Fortunately, the support and forum community are very strong.

GPU programming *is* pain, princess. (4, Informative)

Chris Mattern (191822) | about a year ago | (#44332133)

Anyone who tells you differently is selling you something.

Re:GPU programming *is* pain, princess. (1)

Em Adespoton (792954) | about a year ago | (#44332775)

Anyone who tells you differently is selling you something.

Works well for CUDA anyway....

Udacity teaches CUDA (2)

Arakageeta (671142) | about a year ago | (#44332151)

Check out the Udacity class on parallel programming. It's mostly CUDA (I believe it's taught by NVIDIA engineers).

CUDA is generally easier to program than OpenCL. Of course, CUDA only runs on NVIDIA GPUs.

C++ AMP (0)

Anonymous Coward | about a year ago | (#44332159)

It is Microsoft, but have you looked at C++ AMP?

OpenACC (4, Interesting)

SoftwareArtist (1472499) | about a year ago | (#44332161)

OpenACC is what you're looking for. It uses a directive-based programming model similar to OpenMP, so you write ordinary-looking code, then annotate it in ways that tell the compiler how to transform it into GPU code.

You won't get as good performance as well written CUDA or OpenCL code, but it's much easier to learn. And once you get comfortable with it, you may find it easier to make the step from there into lower level programming.

Re:OpenACC (2)

140Mandak262Jamuna (970587) | about a year ago | (#44332939)

It works in theory. In practice, unless you understand your code well, understand how the compiler built the instructions, and understand what these directives do very well, you won't get any speed improvement. There are times when the overheads slow down the code, the simple-minded implementation has brain-dead locks, and you end up with slower code.

We have come a long way since the days of assembly, and assembly by another name, Fortran. But the overheads of the higher-level languages have been masked a lot by ever-increasing speed and memory availability. Whole generations of programmers have come up with higher-level languages, IDEs and CASE tools from day one, and they fundamentally don't understand how the code actually works. They are continually stumped by the fact that the code does what they tell it to do, not what they meant it to do.

Re:OpenACC (2)

SoftwareArtist (1472499) | about a year ago | (#44333085)

True, and this is even more true on GPUs than CPUs. They do a lot less to shield you from the low level details of how your code gets executed, so those details end up having a bigger impact on your performance. And to make it worse, those details change with every new hardware generation!

But for a new user just getting into GPU programming, it's easier to learn those things in the context of a simple programming model like OpenACC than a complicated one like CUDA or OpenCL, which forces them to deal with even more complexity and hardware details right from the very start. OpenACC can produce good results if used well. And once you've learned to do that, you're in a better position to tackle the harder technologies.

Very Similar Story (2)

Chaseshaw (1486811) | about a year ago | (#44332163)

VB.NET background. Wanted to get into GPGPU to accelerate some of my more complicated math calculations. I tried CLOO (open-source .NET GPU wrappers) and couldn't get it to work; tried AMD's OpenCL dev GUI, couldn't get it to work. Eventually found the answer in Python. GPGPU in PyOpenCL is well documented thanks to the bitcoiners, and from .NET you can either run the Python in a shell or write a little Python kernel to listen for and process commands. The only catch is that the OpenCL abilities are limited, and you have to start dabbling in C++ to get it to do any real work (and even then it's a dumbed-down C++, and many existing extensions don't install or work quite right). All in all I found the entire thing very rewarding though. :) Best of luck.

Learn OpenMP (0)

Anonymous Coward | about a year ago | (#44332177)

Learn about parallel programming with OpenMP, which you can run on your normal machine. If you take enough time to do that properly, then the OpenMP standard will also support GPUs by then, and the move to such architectures will be easy.


Coursera has a great course. (0)

Anonymous Coward | about a year ago | (#44332213)

"Heterogeneous Parallel Programming." It does the job. In a few lessons you will know where you are heading.

Proper approach to GPU programming (1, Insightful)

godrik (1287354) | about a year ago | (#44332217)

As with all attempts at getting stuff faster, you should first wonder what kind of performance you are already getting out of your CPU implementation. Given that you seem to believe it is actually possible to get performance out of a VB-like language, I assume that your base implementation heavily sucks.

Putting stuff on a GPU has only one goal, making things faster, but GPU code is mostly difficult to write and non-portable. Having a good CPU implementation might be just what you need. It also might be easier for you to write.

If you really need a GPU, then you need to start learning how the GPU works, because a simple copy-paste is unlikely to give you any significant performance.

I never properly learned OpenCL, but it is essentially similar, except you have access to fewer low-level details of the NVIDIA architecture. Of course, CUDA is pretty much NVIDIA-only.

Re:Proper approach to GPU programming (-1)

Anonymous Coward | about a year ago | (#44332805)

If you think performance isn't possible in C#/.NET, you have no idea what you're talking about, especially with multi-threaded applications. The garbage collector allows techniques such as copy-on-write which are very difficult or impossible to implement in non-garbage-collected languages. You might have had a point in 2003, but today you're just ignorant.

C++ AMP (1)

VertigoAce (257771) | about a year ago | (#44332219)

Take a look at C++ AMP. It is a small language extension that lets you target the GPU using C++. The platform takes care of most of the mechanics of running code on the GPU. Also check out this blog post for links to tutorials and samples.

it ain't (0)

Anonymous Coward | about a year ago | (#44332229)

barraCUDA because that'll eat your motherfucking ass alive man!

Coursera (2)

elashish14 (1302231) | about a year ago | (#44332293)

Coursera has some courses on GPU programming, like this one, and what's nice about them is that they go pretty slow, and I'm assuming that they explain things well. Other online courses probably offer the same, and I think the video lectures would be helpful in understanding the concepts.

Re:Coursera (0)

Anonymous Coward | about a year ago | (#44333671)

CUDA yuck. Last time I checked, it was lying about OpenCL

Nitrous Oxide (0)

Anonymous Coward | about a year ago | (#44332295)

LOTS of it.

OpenCV (1)

SpinyNorman (33776) | about a year ago | (#44332315)

Try Intel's free OpenCV (Computer Vision) library, which includes GPU acceleration.

Re:OpenCV (0)

Anonymous Coward | about a year ago | (#44332559)

Try Intel's free OpenCV (Computer Vision) library, which includes GPU acceleration.

Came here to say this. OpenCV has CUDA support (with the gpu:: module) and OpenCL support (with the ocl:: module). GPU support isn't complete (not everything is implemented) but it's actively being developed (it was only CUDA until recently).

You might think that you are clever rolling your own image processing algorithms, but chances are, the OpenCV people have done a better job. OpenCV has bindings for Python, but I've only ever used it in C++ projects.

Re:OpenCV (0)

Anonymous Coward | about a year ago | (#44333399)

"Quit your research"

Yeah, real handy advice.

Nothing easy but Udacity can help (5, Informative)

Jthon (595383) | about a year ago | (#44332381)

So there's nothing really easy about GPU programming. You can look at C++ AMP from Microsoft, OpenMP, or one of the other abstractions, but you really need to understand how these massively parallel machines work. It's possible to write perfectly valid code in any of these environments that will run SLOWER than on the CPU, because you didn't understand fundamentally how GPUs excel at processing.

Udacity currently has a fairly decent intro course on GPU programming at: []

It's based around NVIDIA and CUDA but most of the concepts in the course can be applied to OpenCL or another GPU programming API with a little syntax translation. Also you can do everything for the course in your web-browser and you don't need an NVIDIA GPU to finish the course exercises.

I'd suggest running through that and then deciding on what API you want to end up using.

Other option (0)

Anonymous Coward | about a year ago | (#44332423)

Consider the Intel image processing libraries. They have a broad range of routines that are highly optimized for their processors.

not TOO hard (0)

Anonymous Coward | about a year ago | (#44332445)

If you know multithreading concepts, OpenCL isn't too hard to get into.
Of course, start small, do tutorials, and do it right.

Much, much easier than trying to do this stuff in a pixel shader, or, even worse, in the assembly-like shading language that came before GLSL.

Understand The Hardware (3, Informative)

Anonymous Coward | about a year ago | (#44332585)

If you are going to program a GPU, and you are looking for performance gains, you MUST understand the hardware. In particular, you must understand the complicated memory architecture, you must understand the mechanisms for moving data from one memory system to another, and you must understand how your application and algorithm can be transformed into that model.

There is no shortcut. There is no magic. There is only hardware.

If you do not believe me, you can hunt up the various NVIDIA papers walking you through (in painful detail; link below) the process of writing a simple matrix transpose operation in CUDA. The difference between a naive and a good implementation, as shown in that paper, is huge.

That said, once you understand the principles, CUDA is relatively easy to learn as an extension of C, and the Nvidia profiler, NVVP, is good at identifying some of the pitfalls for you so that you can fix them.

OpenACC or OpenMP 4.0 are exactly what you want (5, Informative)

John_The_Geek (1159117) | about a year ago | (#44332637)

I teach this stuff daily, and the huge advance over the past year has been the availability of OpenACC, and now OpenMP 4, compilers that allow you to use directives and offload much of the CUDA pain to the compiler.

There is now a substantial base of successful codes that demonstrate that this really works efficiently (both development time and FLOPS). S3D runs at 15 PFLOPS on Titan using this and may well win the Gordon Bell prize this year. Less than 1% of lines of code modified there. NVIDIA has a whole web site devoted to use cases.

I recommend you spend a day to learn it. There are regular online courses offered, and there is a morning session on it this Monday at XSEDE 13 if you are one of those HPC guys. A decent amount is available online as well.

BTW, with AMD moving to Fusion, the last real supporter of OpenCL is gone. NVIDIA prefers OpenACC or CUDA and Intel prefers OpenMP 4 for MIC/Phi. So everyone officially supports it, but no one really puts any resources into it and you need that with how fast this hardware evolves.

CUDAfy.NET (0)

Anonymous Coward | about a year ago | (#44332665)

I've heard decent things about CUDAfy.NET [] .

Learn to Program an Intel Phi instead (1)

quarkie68 (1018634) | about a year ago | (#44332687)

The only painful thing you have to do is to decide how to increase threading in your code.

Re:Learn to Program an Intel Phi instead (1)

TechyImmigrant (175943) | about a year ago | (#44333919)

Yes. This.

60 independent cores with general purpose instruction set on the same die with fast interconnect. If you need to pack some parallel speed on and do real work, using a GPU is pissing in the wind. An Intel Phi lets you get the job done.

GPUs do certain things very well, but the odds of your problem mapping well to a GPU are slim.

Do you need the GPU? (2)

jones_supa (887896) | about a year ago | (#44332693)

You would probably see a multi-fold increase in performance by simply converting your project from C# to C++.

Re:Do you need the GPU? (1)

greg1104 (461138) | about a year ago | (#44333583)

Possibly [], but there are a lot of tasks that only see about a doubling of speed. A C++ port is only likely to speed things up, while a GPU port is certain to (presuming the assumption about parallel execution is correct).

GPU Maven Plugin (1)

Anonymous Coward | about a year ago | (#44332721)

The closest thing to painless that I know of is the GPU Maven Plugin.

The GPU Maven Plugin compiles Java code with hand-selected Java kernels to CUDA that can run on NVIDIA GPUs of compute capability 2.0 or higher. It encapsulates the build process so that GPU code is as easy to build with Maven as ordinary Java code. The plugin relies on the NVIDIA CUDA SDK, which must be installed separately.

Sorry but people here are full of crap (0)

Anonymous Coward | about a year ago | (#44332723)

Use c# and Microsoft Accelerator. []

It's very easy to use, and since the VAST majority of your processing is going to occur on the GPU, the language you use is mostly irrelevant.

The main thing you need to be aware of is that the bus to the video card is very, very, very slow. So in order to get any speedup from the GPU, you'll need to send as much stuff to be processed to the video card as you can. Round-trips hurt you a lot, so minimize them any way you can get away with doing so.

OpenSceneGraph or OGRE (1, Interesting)

bzipitidoo (647217) | about a year ago | (#44332745)

I went with OpenSceneGraph.

Long ago, I tried xlib only, because at that time Motif was the only higher layer available, and it was proprietary. It was horrible. xlib has been superseded by XCB, but I wouldn't use that, not with all the other options out there today. XCB is a very low level graphics library, for drawing lines and letters in 2D. 3D graphics can be done with that, but your code would have to have all the math to transform 3D representations in your data into 2D window coordinates for XCB. LessTif is a free replacement for Motif, but by the time it was complete enough to be usable, the world was already moving on. With Wayland likely pushing X aside in the near future, XCB and xlib may not perform so well. They will continue to be supported for a while through a compatibility layer, but I think they're on the way out. Motif is also not much good these days either. For one, Motif rests on top of xlib, and if xlib goes, so does Motif. Today, we have many better libraries for interfacing with GUIs.

When OpenGL became available, I tried it. OpenGL is great for drawing simple 3D graphics, but it lacks intelligence. The easy part is that you just pass x,y,z coordinates to the library routines, and OpenGL does the rest. The bad part is that if you want to draw a fairly complicated scene, containing many objects that may be partly or completely hidden behind other objects, OpenGL has no intelligence to deal with that. It just dumbly draws everything your code tells it to draw. To speed that up, your code has to have the smarts to figure out what not to draw, so it can skip calling on OpenGL for invisible objects.

That's where a library like OpenSceneGraph comes in. Your code feeds all the info to OSG. OSG figures out visibility, then calls OpenGL accordingly.

You may need still other libraries for window management, something like FLTK. Yes, FLTK and OSG can work together.

You will also most likely be working in C/C++. OpenGL has many language bindings. But OSG is C++ and doesn't have so many. FLTK is also C++, and has even fewer bindings. Trouble with picking a language like Python for this work is that it can be difficult to find bindings for all the libraries. Even when bindings to a particular language exist, they tend to be incomplete, and don't always perfectly work around differences in data representation. Pick libraries first, then see what language bindings they all have in common, then code in one of those common languages. It's possible C/C++ will turn out to be the only language common to all the libraries.

Re:OpenSceneGraph or OGRE (1)

bzipitidoo (647217) | about a year ago | (#44332837)

Gah, should have read the summary more carefully. I was talking about 3D graphics, not general programming on the GPU.

try Theano (1)

Anonymous Coward | about a year ago | (#44332801)

You could give Theano a try. It's a Python-based symbolic expression compiler whose interface is very much like numpy's. I use it on Linux, but I've heard mention of support for Windows.

Rootbeer (0)

Anonymous Coward | about a year ago | (#44332965)

I admit I don't know much about GPU programming.
But if I were you, I'd take a good look at the Rootbeer compiler, which translates Java code into CUDA or OpenCL.

It sure looks simple and Java is just a small step from C#.

Image Processing DSL (1)

Anonymous Coward | about a year ago | (#44333011)

Look at MIT's Halide it's a domain specific language for image processing.

The alternative is OpenCL/CUDA, which require in-depth knowledge of the H/W to get the best from the GPU. It doesn't matter whether you use Python or whatever other bindings you choose for a GPU-native language. The hardest part is mapping the algorithm to the H/W model of a GPU. PyCUDA does NOT solve that issue.

You can get plenty of help from Stackoverflow.

No (-1, Troll)

Coeurderoy (717228) | about a year ago | (#44333041)

No, there is no "easy" way; you have to learn stuff. And since you "need" to be on Windows, you have demonstrated that you are not the kind of person who is interested in understanding how things work, so you'll probably fail.
Meanwhile you should use CUDA: since you are already bowing to one overlord, why not go on and dig yourself deeper into vendor lock-in.

People who actually want to do something are using OpenCL if they need results in a reasonable time and do not plan to be trapped into a specific HW technology.

If I just wanted to try out ideas, I'd probably use Rust or Harlan; Rust is more C-ish, Harlan is Scheme-ish.
And the real issue is to think of the application as a conjunction of the GPU and the CPU, and remember Amdahl's law...

Mary Hall at The University of Utah (1)

TwineLogic (1679802) | about a year ago | (#44333143)

I wouldn't call her advanced coursework easy, but a resource that belongs on this thread: []

Mary Hall is a professor of Computer Science. Her recent work is related to compilers and parallel programming on GPUs. Her professional web page is something like an on-line open course, or the framework of one.

Take it from someone who's done a lot of CUDA (1)

mathimus1863 (1120437) | about a year ago | (#44333145)

There isn't really a painless way. Like a lot of skills in life, the only way to learn is through pain, suffering and frustration. But that makes the prize all the more enjoyable. You need to be experienced at regular, serial programming in C/C++, then mangle all of it to figure out how to program in parallel. I literally read the CUDA programming guide 5 times, and I felt like I gained as much on the fifth time as I did the first time. And don't expect your debugger to save you -- if it's like it was a year ago, you're going to struggle a bit with that.

Luckily, once you do get it, it all seems to make sense in hindsight. And when you do achieve that 10x-300x speedup, you'll feel like a superhero. You just have to be patient and expect some frustration. It's not like learning a new programming language. It's like a whole new programming paradigm.

Ask a neck-beard... (1)

OhSoLaMeow (2536022) | about a year ago | (#44333253)

... to code it in COBOL for you.

I recommend CUDA if... (1)

Anonymous Coward | about a year ago | (#44333391)

I recommend CUDA if you can deploy requiring NVIDIA hardware. CUDA allows for pre-compiled kernels, CUDA has a debugger for your kernels, CUDA has a tool chain. CUDA has far richer options. Indeed, NVIDIA uses LLVM for its CUDA compilers, so in theory different programming languages can be used to write CUDA kernels. Take a gander at:

In contrast, OpenCL is somewhat barbaric. It is an API and there are very few tools for it. Worse, OpenCL implementation can be all over the map.

You do NOT need to learn, or for that matter use, OpenGL to use CUDA or OpenCL. The interop APIs between OpenGL and OpenCL or CUDA exist to make buffer transfers efficient between the two (so that one can compute something with CUDA or OpenCL and have it drawn with OpenGL).

CUDA (0)

Anonymous Coward | about a year ago | (#44333517)

I was going to suggest OpenGL or DirectX, but I think the poster wanted a general programming language.

Not sure if this is helpful, but I found a website about CUDA for video cards at:

I don't think CUDA programs will work on my AMD video card though. lol, it'd be cool if I could create a program that uses the 800MHz GPU and DDR3 VRAM just for fun.

C or C++ with vectors (1)

gnasher719 (869701) | about a year ago | (#44333735)

OpenCL or CUDA is a real pain, and a lot to learn. But any modern Intel quad core processor can deliver 50 billion floating point operations per second if you treat it right.

Use C or C++ with the Clang compiler (gcc will probably do fine as well) and vector extensions. Newer Intel processors have 256-bit vector registers, so you can define vector types with 32 8-bit integers, 16 16-bit integers, 8 32-bit integers, 8 single precision floating point numbers, or 4 double precision floating point numbers. You can do two operations with such vectors per cycle if you pay attention to latency. And on a more expensive processor, you can run 8 threads in parallel.

If 50 billion floating point operations per second is enough, then you're fine. And if you can't manage to produce 50 billion FLOPS in C or C++, then you don't even need to try OpenCL.

C# (0)

TechyImmigrant (175943) | about a year ago | (#44333841)

>I am an intermediate-level programmer who works mostly in C# NET.

I am so very, very sorry. I hope you find a better job soon.

Write some graphics shaders and multithreaded prog (1)

fatgraham (307614) | about a year ago | (#44333855)

I've just started with OpenCL and love it: it's fast, easy, debuggable (CodeXL), and, with stable drivers, not too much of a pain when it goes wrong.

I've been writing HLSL, GLSL and ARB vertex shaders for years, and to me, OpenCL kernels are basically the same thing (language- and limitation-wise). Convert some full-screen graphics effects to OpenCL for a first example, then make it do other stuff (maybe with buffers instead of images).

Once you're used to making/debugging kernels, start splitting code/algorithms into smaller chunks, and start parallelising!

Once it works, start digging into specific opencl/cuda stuff (local vs global memory etc) to start optimising
