
Comments


Ask Slashdot: What Is the Most Painless Intro To GPU Programming?

dsouth NPP (198 comments)

The easiest on-ramp to speeding up image/video processing is probably the NPP library: https://developer.nvidia.com/npp . It has functionality and syntax similar to Intel's IPP library, but uses an NVIDIA CUDA-capable GPU to accelerate the operations.
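To give a feel for the API, here's a minimal sketch of an NPP call (my own example, not from NVIDIA's docs), assuming the NPP headers and libraries that ship with the CUDA toolkit; the image size, constant, and scale factor are made-up values:

    /* brighten an 8-bit grayscale image on the GPU with NPP */
    #include <npp.h>
    #include <stdio.h>

    int main(void)
    {
        NppiSize roi = { 1920, 1080 };   /* hypothetical width x height */
        int srcStep, dstStep;

        /* nppiMalloc_* allocates pitched device memory sized for the image */
        Npp8u *d_src = nppiMalloc_8u_C1(roi.width, roi.height, &srcStep);
        Npp8u *d_dst = nppiMalloc_8u_C1(roi.width, roi.height, &dstStep);

        /* ... fill d_src, e.g. with cudaMemcpy2D from a host image ... */

        /* add 32 to every pixel, saturating; the final 0 is the integer
           scale factor used by the _Sfs variants */
        NppStatus st = nppiAddC_8u_C1RSfs(d_src, srcStep, 32, d_dst,
                                          dstStep, roi, 0);
        if (st != NPP_SUCCESS)
            fprintf(stderr, "NPP error %d\n", (int)st);

        nppiFree(d_src);
        nppiFree(d_dst);
        return 0;
    }

The function names encode the data type and channel count (8u = 8-bit unsigned, C1 = one channel), the same naming convention IPP users will recognize.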

If you want to dig deeper, you could explore OpenACC: http://www.openacc-standard.org/ . OpenACC is a directive-based approach to accelerator programming: you annotate your code with OpenACC directives that give the compiler the additional information it needs to generate parallel code.
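As a sketch of what that looks like in practice (the loop and data clauses here are my own example), a single directive is enough to ask the compiler to offload a loop:

    /* saxpy with OpenACC: the pragma tells an OpenACC-aware compiler
       (e.g. pgcc -acc) to generate parallel accelerator code for the loop */
    void saxpy(int n, float a, const float *x, float *y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

Build without the accelerator flag and the pragma is simply ignored; the same source still compiles and runs serially, which is a big part of the appeal.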

Finally, you can learn CUDA C, OpenCL, CUDA Fortran, NumbaPro, or one of the other programming languages supported on the GPU hardware of your choice. NVIDIA's CUDA C compiler is based on LLVM, and the IR changes have been upstreamed to LLVM.org. Several languages and projects in development are leveraging the LLVM infrastructure to add GPU/parallel support.
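For comparison with the OpenACC version above, here's the same loop written as an explicit CUDA C kernel (a sketch; the block size of 256 is a common choice, not a requirement):

    /* saxpy as a CUDA C kernel: one thread per array element */
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    /* launch on device arrays d_x, d_y:
       saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y); */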

[disclaimer: I work for NVIDIA, but the words above are my own.]

about a year ago

Lustre File System Getting New Community Distro

dsouth Re:Very first thing to do is... (68 comments)

It appears to be based on the linked site:

"In particular, ZFS’s advanced architecture addresses two of our key performance concerns: random I/O, and small I/O. In a large cluster environment a Lustre I/O server (OSS) can be expected to generate a random I/O workload. There will be 100’s of threads concurrently accessing different files in the back-end file system. For writes ZFS’s copy-on-write transaction model converts this random workload in to a streaming workload which is critical when using SATA disks. For small I/O, Lustre can leverage a ZIL placed on separate SSD devices to maximize performance."

The LLNL ZFS study has been pretty widely publicized in the HPC community. Note that Lustre's servers drive the back-end filesystem through its API rather than mounting it directly. Until now, Lustre used ext under the hood for data storage, so the performance improvement from ZFS is relative to ext. ext3/4 may very well outperform ZFS on a workstation or small server, but that's not what Lustre is used for (even their test system is ~900TB).
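On the ZIL point in the quote: placing the log on separate SSDs corresponds to adding a dedicated log vdev to the pool. A hypothetical sketch (device names made up):

    # data on SATA disks, ZIL mirrored on two SSDs
    zpool create ost0 raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
    zpool add ost0 log mirror /dev/ssd0 /dev/ssd1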

Disclaimer: I used to work for LLNL.

more than 3 years ago

Folding@Home Releases GPU Client

dsouth Re:This is impressive, but... (177 comments)

FYI --

  1. SSE vectors are 128 bits -- that's two doubles, not eight. [There may be eight SSE registers, but that doesn't mean you can do eight simultaneous SSE operations.]
  2. It's possible to extend precision using single-single "native pair" arithmetic; there's a paper by Dietz et al. on GPGPU.org that discusses this. (A sketch of the idea follows below.)
This doesn't make GPUs capable of double-precision arithmetic, and doesn't mean they will replace CPUs. But it can be used to expand the number of algorithms where the vast "arithmetic density advantage" of GPUs can be applied. Top-end CPUs can do 20-30 single-precision GFLOPS; GPUs have about 10x more GFLOPS in the fragment shader ALUs. That's a lot of power if you can figure out how to make it work for your problem.
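To make the "native pair" idea concrete, here's a sketch of single-single addition in plain C. The type and function names are mine (not from the Dietz et al. paper), and the trick only works under strict IEEE float semantics -- no fast-math reassociation:

    /* a value is stored as an unevaluated sum hi + lo of two floats,
       roughly doubling the effective mantissa width */
    typedef struct { float hi, lo; } single_single;

    single_single ss_add(single_single a, single_single b)
    {
        /* Knuth's TwoSum: s is the rounded sum of the high parts,
           e is its exact rounding error */
        float s  = a.hi + b.hi;
        float bb = s - a.hi;
        float e  = (a.hi - (s - bb)) + (b.hi - bb);

        /* fold in the low parts, then renormalize (FastTwoSum) */
        e += a.lo + b.lo;
        single_single r;
        r.hi = s + e;
        r.lo = e - (r.hi - s);
        return r;
    }

The cost is several float operations per add, but on hardware with a ~10x single-precision advantage that trade can still come out ahead.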

more than 7 years ago

Submissions

dsouth hasn't submitted any stories.

Journals

dsouth has no journal entries.
