Is CUDA for me? Matlab alone is too slow.

I have two independent tasks I need completed, and I was wondering if either would be a good candidate for a CUDA solution. I have no knowledge of OpenGL or graphics programming, but I have spent the past day looking into how they work, and I have a strong knowledge of programming in general.

As I see it I have 4 options for each of my two tasks:

  1. Leave it in Matlab
  2. Port it to C for some speed gains
  3. Use some combination of C and OpenGL (not sure how this works; it was an option others have suggested to me)
  4. Use something like CUDA and run it on the graphics card

Any advice on what would be best would be greatly appreciated, as would the reasoning behind it and any tips on where to start.

Task 1:
I have a voxel (a pixel in 3 dimensions) space I need to fill with values. Each voxel can be computed independently, so I think this is a good candidate for parallelizing. The total voxel space will no doubt be too large to store at once, so can parts be periodically written out to a file? The equation for each voxel is of the form: output[x][y][z] = log(constant * sum_over_all_i(e^(distance_between_x-y-z_and_each_of_the_i_points))), where the i points are a constant list of 3D points.

Task 2:
Given a voxel space, I need to rotate it by some angle theta around a vector v. This is where someone pointed me towards OpenGL, because I do not even have a good solution for this in Matlab yet. The problem is that when rotating by anything other than 90, 180, etc. degrees, the end locations of each voxel are not integers, so an interpolation needs to be done. This is a common problem with image rotation in 2 dimensions, for example. Are there any built-in functions in the graphics community that can help me with this?

Thanks for any help.
Oh, and for what it's worth, I have an 8800 GTS, but if preliminary results look good I might be able to get some grant money to throw several machines at the problem.

Task 1 should be easy to speed up with CUDA, although you may want to investigate the errors of the log and exp functions available on the graphics card (it also only supports single-precision floats, not doubles). The graphics card has a large amount of memory, so you may be able to run the program over your entire space at once; otherwise you will have to run over one subset at a time and copy each result back to the host (off the graphics card).

Thanks. I’ll look into that. As long as the errors aren’t orders of magnitude it won’t affect my results.

This is a conjecture in need of some real data from someone in the know, but it looks to me like this is just the kind of thing that the texture data/operations could do efficiently. Any NVIDIAns care to weigh in on the subject?

Yes, this sounds like an ideal application for 3D textures, which give you interpolation in three dimensions.

Unfortunately CUDA doesn’t currently support 3D textures - we’re hoping to add support for this in a future release.

In the meantime, you could implement interpolation in the CUDA code, which would be slower than using texture but likely still much faster than CPU code.

I will be looking forward to that. Thank you for confirming what I was looking for.

Note that it’s possible to accelerate Matlab computations with CUDA. We will have some examples of this available in the future. For now, you can check out this presentation:


It is already out…