What hands-on workshops would you like to see at future events?

We’re always looking to run training sessions, either independently or at specific events. What topics would you be interested in? What languages would you like to see used? Do you enjoy the hands-on format? Do you prefer “bring your own computer,” or would you rather have systems provided?

Why not just show the time difference between a CPU implementation and a CUDA implementation?

Compare sorting a large array (over 2^23 elements) of floats using std::sort() vs. the same array sorted via thrust::sort(). If done right (using pointers), there can be a 30x difference, and thrust::sort() is easy to implement.
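For the CPU half of that comparison, a minimal timing sketch could look like the following. The helper names make_random_floats and time_cpu_sort_ms are made up for illustration, and the fixed seed is just an assumption to keep runs repeatable. The GPU half would time thrust::sort() on the same data with the same pattern; it is left out here because it needs nvcc and a device.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fill a vector with n pseudo-random floats in [0, 1).
// The fixed seed is an assumption so that repeated runs sort the same data.
std::vector<float> make_random_floats(std::size_t n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> v(n);
    for (float& x : v) x = dist(gen);
    return v;
}

// Hypothetical helper: sort in place on the CPU and return the elapsed
// wall-clock time in milliseconds. Only the std::sort call is timed.
double time_cpu_sort_ms(std::vector<float>& data) {
    auto start = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    auto stop = std::chrono::steady_clock::now();
    // Sanity check outside the timed region.
    assert(std::is_sorted(data.begin(), data.end()));
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling make_random_floats(1u << 23) and passing the result to time_cpu_sort_ms gives the baseline number. When timing the GPU side, decide up front whether the host-to-device copy counts, since that choice changes the reported speedup a lot.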

Then do the same with cuBLAS cublasSgemm() vs. a serial CPU implementation. The difference in running times can be huge in that case (1000x).
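A serial CPU reference for that comparison might be sketched as below. The function name sgemm_reference is made up for illustration. The column-major layout with no padding is an assumption, chosen so the same arrays can be handed to cuBLAS unchanged. It is a plain triple loop, not a tuned CPU BLAS, so it represents the "serial" end of the comparison only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference for C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n. All matrices are column-major with
// leading dimension equal to the row count (assumed, to match cuBLAS storage).
void sgemm_reference(std::size_t m, std::size_t n, std::size_t k,
                     float alpha,
                     const std::vector<float>& A,
                     const std::vector<float>& B,
                     float beta,
                     std::vector<float>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {          // column of C
        for (std::size_t i = 0; i < m; ++i) {      // row of C
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i + p * m] * B[p + j * k];  // A(i,p) * B(p,j)
            }
            C[i + j * m] = alpha * acc + beta * C[i + j * m];
        }
    }
}
```

The output of this routine doubles as the expected result when checking the GPU version.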

Those libraries come with the SDK, so you might as well illustrate the performance benefits and how to call them from C or C++.

CUDA could also be presented as a way to ‘supercharge’ your existing code base by using the GPUs for tasks like sorting, linear algebra operations, and image processing. Many people will not be writing their own kernels, but will instead use the libraries to get an overall speedup from their existing code.

As far as developers go, show them clear source code for CPU implementation and GPU implementation.

Much of the work I have been doing relates to converting MATLAB code into CUDA code, so pay attention to that as well.

Hi - I’m with MathWorks and I’d be interested in hearing more about your workflow of using MATLAB with CUDA. I have heard from several customers who prototype their algorithms in MATLAB, and then incrementally develop CUDA kernels to replicate sections of their MATLAB code. And in the process they use their original MATLAB prototype for testing their kernels.

I discuss this more in a joint webinar that I recently delivered with NVIDIA, called ‘MATLAB for CUDA Programmers’.


I’d be curious to learn how this workflow compares to what you’re doing.

Dan Doherty

That is pretty much what I do as well. Usually it is MATLAB code with a number of linear algebra operations, and they need a faster version that performs the same solver/algorithm.

I always write a serial CPU version of the MATLAB code first so I can benchmark the GPU version and verify the results. It also helps me plan how I am going to adapt the C code to CUDA in terms of the parallel architecture.
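The verification step can be as small as a tolerance check between the serial result and the array copied back from the GPU. Here is a rough sketch; the names max_relative_error and results_match and the default tolerance of 1e-4 are assumptions for illustration, and the right tolerance depends on the algorithm and on single vs. double precision.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: largest error between a CPU reference and a candidate
// (e.g. a GPU result copied back to the host). The error is relative for
// values with magnitude above 1 and absolute below that, so entries near
// zero do not blow up the ratio.
float max_relative_error(const std::vector<float>& reference,
                         const std::vector<float>& candidate) {
    assert(reference.size() == candidate.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        float diff = std::fabs(reference[i] - candidate[i]);
        // A NaN or Inf in either array must count as a failure, not be skipped.
        if (!std::isfinite(diff)) return std::numeric_limits<float>::infinity();
        float scale = std::max(std::fabs(reference[i]), 1.0f);
        worst = std::max(worst, diff / scale);
    }
    return worst;
}

// Hypothetical helper: true when every entry agrees within the tolerance.
bool results_match(const std::vector<float>& reference,
                   const std::vector<float>& candidate,
                   float tol = 1e-4f) {
    return max_relative_error(reference, candidate) <= tol;
}
```

Bitwise equality is usually the wrong test here, because the GPU libraries may sum in a different order than the serial loop.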

I use cuBLAS as much as possible, as I have had good experience with that library, and I have been using cuSPARSE more lately since researchers love that MATLAB sparse format.

Ugghh, writing a version for spdiags() was a pain.

The trickier aspect of such projects is how to write an efficient CUDA version of MATLAB code such as:

[Q,L] = eig(rho*(Z - U) - S);

R = chol(P + rho*eye(n));
x = R \ (R' \ (rho*(z - u) - q));

but I have been able to get it done.
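For anyone following along, a serial reference for the chol / backslash pair might be sketched like this. The function names cholesky_upper and cholesky_solve are made up for illustration, and it is not the code I ended up with on the GPU. It takes a generic symmetric positive definite matrix A (standing in for P + rho*eye(n)) and a right-hand side b (standing in for rho*(z - u) - q). Column-major storage and double precision are assumptions chosen to mirror MATLAB's defaults.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference for R = chol(A): upper-triangular R with R^T R = A.
// A is n x n, symmetric positive definite, column-major.
// Returns false if a non-positive pivot shows up (A is not positive definite).
bool cholesky_upper(std::size_t n, const std::vector<double>& A,
                    std::vector<double>& R) {
    assert(A.size() == n * n);
    R.assign(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i <= j; ++i) {
            double sum = A[i + j * n];
            for (std::size_t p = 0; p < i; ++p)
                sum -= R[p + i * n] * R[p + j * n];
            if (i == j) {
                if (sum <= 0.0) return false;
                R[j + j * n] = std::sqrt(sum);
            } else {
                R[i + j * n] = sum / R[i + i * n];
            }
        }
    }
    return true;
}

// Hypothetical reference for x = R \ (R' \ b): two triangular solves.
std::vector<double> cholesky_solve(std::size_t n, const std::vector<double>& R,
                                   const std::vector<double>& b) {
    assert(R.size() == n * n && b.size() == n);
    std::vector<double> x(b);
    // Forward substitution for R^T y = b; note R^T(i,p) = R(p,i).
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t p = 0; p < i; ++p) x[i] -= R[p + i * n] * x[p];
        x[i] /= R[i + i * n];
    }
    // Back substitution for R x = y, from the last row upwards.
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t p = i + 1; p < n; ++p) x[i] -= R[i + p * n] * x[p];
        x[i] /= R[i + i * n];
    }
    return x;
}
```

The factorization is the sequential part that makes a GPU port awkward; the triangular solves map more naturally onto library routines.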

This is the type of work that really benefits from the fast CUDA matrix-matrix and matrix-vector operations, and I am glad there are such workshops.

How about a training session on how to program an HPC system like Titan?
The MPI/CUDA mix is not that easy, and there is very little documentation about it.
It is an advanced topic, but students are always interested (mainly because they want to do an internship at one of these centers).

Agreed, that’s a great idea for a workshop, mpc. I’ll see about creating the materials and getting them online for instructors to use.