CUDA-OpenGL interaction and future drivers

Hey guys, great job on CUDA. I might just have to excuse you for making me write an article about it in less than 24 hours to meet the deadline ;)

Anyhow, I was wondering what your feeling is right now on the usability of CUDA for future video games. The first question, obviously, is whether future public drivers for regular consumers will include CUDA functionality. The second is how efficient you think OpenGL-CUDA interaction can be, and whether you plan to expose a few extra features going forward.

My worry with CUDA is that the docs make me think it would reserve the entire GPU for itself, so that I couldn’t have an OpenGL context running at the same time. To me that implies a full pipeline flush and the associated slight loss of time. Obviously, I’m not a fan of pipeline flushes!

Secondly, I was wondering whether there is any plan to support anisotropic filtering in CUDA. I fully understand that even supporting filtering at all in CUDA must be confusing to some people, so I’m not really expecting miracles here. The way this could be exposed is fairly straightforward though, depending on how the hardware implementation works. Ideally, you’d be able to launch a texturing command using the derivatives provided by the first thread in a group of four because, AFAIK, per-thread derivatives would reduce performance by 75%.

Thanks in advance!

Well, it seems to be working okay at the moment. I’m currently running Fedora Core 6 with the Beryl 3D desktop, and CUDA appears to run without a hitch. I even fired up glxgears and ran some of the CUDA sample programs without issue. I haven’t started integrating CUDA into my own code just yet, as I need long FFT support for it to be of any use (a minimum length of 16384, preferably millions of elements for 1-D FFTs). Right now the FFT support seems to be limited to 512 elements.
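To make it concrete, here’s roughly how I’d expect to drive the bundled FFT library (cufft) for a single long 1-D complex-to-complex transform. I haven’t checked exactly where the length limit kicks in, so treat this as a sketch rather than something I’ve run at these sizes:

    #include <cuda_runtime.h>
    #include <cufft.h>
    #include <stdio.h>

    int main(void)
    {
        const int N = 16384;                 /* desired 1-D transform length */
        cufftComplex *d_signal = NULL;

        /* Allocate device memory for N complex samples. */
        cudaMalloc((void **)&d_signal, sizeof(cufftComplex) * N);

        /* ... fill d_signal via cudaMemcpy from host data ... */

        /* Create a 1-D complex-to-complex plan; a length limit would
           presumably show up here as a planning error. */
        cufftHandle plan;
        cufftResult res = cufftPlan1d(&plan, N, CUFFT_C2C, 1);
        if (res != CUFFT_SUCCESS) {
            fprintf(stderr, "cufftPlan1d failed for N=%d (code %d)\n", N, res);
            return 1;
        }

        /* Execute the forward transform in place. */
        cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);

        cufftDestroy(plan);
        cudaFree(d_signal);
        return 0;
    }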

CUDA is a client of the GPU in the same way the OpenGL and Direct3D drivers are: it shares the GPU via time slicing. It is possible to run multiple graphics and CUDA applications at the same time, although currently CUDA only switches at the boundaries between kernel executions, I believe.

The cost of context switching between CUDA and the graphics API is roughly the same as switching graphics contexts. This isn’t something you’d want to do more than a few times a frame, but is certainly fast enough to make it practical for use in games.
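To give a rough idea of how a game could hand data between CUDA and the graphics API without a CPU round trip, here is a minimal sketch using the runtime’s OpenGL interop calls for buffer objects. The kernel and helper function names are made up for illustration, it assumes a current GL context, and the map/unmap is exactly where the context switch discussed above happens:

    #include <GL/glew.h>              /* or your extension loader of choice */
    #include <cuda_runtime.h>
    #include <cuda_gl_interop.h>

    /* A hypothetical kernel that fills a vertex buffer on the GPU. */
    __global__ void fill_vertices(float4 *verts, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            verts[i] = make_float4(i * 0.001f, 0.0f, 0.0f, 1.0f);
    }

    void update_vbo_with_cuda(GLuint vbo, int n_verts)
    {
        /* Register the GL buffer object with CUDA (in practice this can
           stay registered across frames rather than per call). */
        cudaGLRegisterBufferObject(vbo);

        /* Map it into CUDA's address space, run the kernel, unmap.
           Map/unmap is where the CUDA/GL switch occurs, so you would
           not want to do this many times per frame. */
        float4 *d_verts = NULL;
        cudaGLMapBufferObject((void **)&d_verts, vbo);

        fill_vertices<<<(n_verts + 255) / 256, 256>>>(d_verts, n_verts);

        cudaGLUnmapBufferObject(vbo);
        cudaGLUnregisterBufferObject(vbo);

        /* The VBO can now be drawn with ordinary glBindBuffer/glDrawArrays. */
    }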

I don’t think we have any plans to support anisotropic texture filtering or mipmapping. Note that in CUDA mode the hardware doesn’t have access to neighbouring fragments to compute texture coordinate derivatives, so you would have to supply these directly.
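For reference, the filtering CUDA does expose today is point or bilinear sampling through bound texture references, with no mipmapping or anisotropy involved. A minimal sketch (the kernel, array and setup function are hypothetical) would look something like this:

    #include <cuda_runtime.h>

    /* File-scope texture reference; hardware bilinear filtering only. */
    texture<float, 2, cudaReadModeElementType> texRef;

    __global__ void sample_kernel(float *out, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height)
            out[y * width + x] = tex2D(texRef, x + 0.5f, y + 0.5f);
    }

    void setup_texture(cudaArray *array)
    {
        texRef.filterMode     = cudaFilterModeLinear;   /* bilinear */
        texRef.addressMode[0] = cudaAddressModeClamp;
        texRef.addressMode[1] = cudaAddressModeClamp;
        texRef.normalized     = 0;                      /* unnormalised coords */
        cudaBindTextureToArray(texRef, array);
    }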

What is your application for aniso?

Hi there,

There is one thing I found annoying when working on a project that uses CUDA and OpenGL in parallel.
It is the way OpenGL (at least the NVIDIA drivers) handles memory. It looks like OpenGL allocates memory in its own pool and never frees it. For example, when a texture is deleted, OpenGL seems to just flag that memory as free so it can reuse it later for its own purposes, but the memory stays allocated in the OpenGL pool. CUDA therefore considers it occupied and cannot use it.
The only way I found to actually free this memory was to delete the OpenGL context.
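Here is roughly how I was able to see it; the sizes are arbitrary, the behaviour probably depends on the driver version, and depending on the driver the memory may only be committed once the texture is actually used:

    #include <GL/gl.h>
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Assumes a current OpenGL context and an initialised CUDA runtime. */
    void check_memory_after_gl_delete(void)
    {
        /* Create a large texture (~64 MB for a 4096x4096 RGBA8 image)
           and upload data so the driver commits video memory. */
        size_t bytes = 4096u * 4096u * 4u;
        unsigned char *pixels = (unsigned char *)calloc(bytes, 1);

        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 4096, 4096, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels);
        glFinish();
        free(pixels);

        /* Delete it again; GL reports the id as free... */
        glDeleteTextures(1, &tex);
        glFinish();

        /* ...but a large CUDA allocation afterwards may still fail,
           suggesting the memory is kept inside the GL driver's pool. */
        void *d_ptr = NULL;
        cudaError_t err = cudaMalloc(&d_ptr, 64u * 1024u * 1024u);
        if (err != cudaSuccess)
            printf("cudaMalloc still fails: %s\n", cudaGetErrorString(err));
        else
            cudaFree(d_ptr);
    }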

Am I wrong?

Otherwise, no problems.

– pium