Rendering directly from 2D CUDA texture?

I’m trying to find a fast way to render, i.e. one that does not read global memory back to the host. In the SDK, VBOs are used, but they do not seem to fit my case. I’m updating structured data (in linear memory) via a kernel but have not found a nice way to pack it into a texture on the card and render directly from that. Reading back to the host and then packing a texture or calling glDrawPixels is just too slow.

Any solutions?

This reminds me that there is an NVIDIA OpenGL extension that lets you texture directly from buffer object memory:…ffer_object.txt

I haven’t tried this, but in theory it would let you write to linear memory from CUDA and then read from it directly in OpenGL (you’d have to do the 2D to 1D mapping in the shader).
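To make the 2D to 1D mapping concrete, here is a minimal CPU-side sketch of the index arithmetic a shader would need, assuming the CUDA kernel writes the image in row-major order with no padding between rows. The names texelIndex and fetchTexel are hypothetical helpers for illustration, not part of CUDA or OpenGL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major offset of texel (x, y) in a linear buffer
// holding a width-by-height image (assumes no row padding).
std::size_t texelIndex(std::size_t x, std::size_t y, std::size_t width) {
    assert(x < width);  // x must lie inside the row
    return y * width + x;
}

// Hypothetical helper: read one texel from a linear buffer as if it were 2D.
float fetchTexel(const std::vector<float>& buffer,
                 std::size_t x, std::size_t y, std::size_t width) {
    const std::size_t i = texelIndex(x, y, width);
    assert(i < buffer.size());  // the texel must exist in the buffer
    return buffer[i];
}
```

In a shader the same arithmetic would be done on integer texel coordinates before the buffer fetch; if the kernel pads rows, the padded row pitch (in elements) would take the place of width.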

I may try writing a sample for this.

I think you can use GetBackBuffer() and map the buffer directly into the CUDA address space! After that, you can render anything into the buffer! I tested this with Direct3D and got a good result (over 1000 fps with a gradient fill).

But don’t use OpenGL for this! cudaMapBufferObject() is really SLOW!
Using Direct3D is much faster!


Thanks for the input. I thought this would be a well-covered subject, but I realize the purposes of CUDA aren’t quite in the same light. I will look more at the OpenGL extension, and of course a sample would be amazing. I think it would be useful to a non-trivial subset of the community.


My application is stuck with OpenGL. Just how slow is cudaMapBufferObject()?

You can test the speed very easily! With boxFilter (or another sample), disable V-Sync (include “GL/wglew.h” and call wglSwapIntervalEXT(0)), then measure the speed with only the map and unmap calls. The result is really bad!

On my system, the speed drops from 1280 to 115 fps without even calling the kernel (only the map and unmap functions)! But with Direct3D, I still get 990 fps!
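To put those frame rates in per-frame terms, here is a small sketch that converts fps to milliseconds and takes the difference. It assumes the 1280 fps figure is the baseline without any map/unmap calls and, as a further assumption not stated in the post, that the same baseline applies to the Direct3D run. The function names are hypothetical.

```cpp
#include <cassert>

// Convert a frame rate to the time one frame takes, in milliseconds.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per frame when the rate drops from baselineFps to measuredFps.
double addedCostMs(double baselineFps, double measuredFps) {
    return frameTimeMs(measuredFps) - frameTimeMs(baselineFps);
}
```

Under those assumptions, addedCostMs(1280.0, 115.0) comes to roughly 7.9 ms per frame for the OpenGL map/unmap pair, against roughly 0.23 ms for addedCostMs(1280.0, 990.0) with Direct3D.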

I think you can develop your program with OpenGL first. If the speed is not good enough, you can switch to Direct3D very easily, because the two kinds of buffer are not very different!

BTW, we are aware of the poor cudaMapBufferObject performance and this should be fixed in the next release.

Any more input or advice?


I read this post earlier today looking for some help on using texture buffer objects. I finally managed to get my head around the extension and have knocked up a working demo. Just stick it in the SDK projects folder.

This example extends the simpleGL CUDA SDK example to demonstrate how to use Texture Buffer Objects (TBOs) to read texture data (the CUDA output) directly from a buffer object. This avoids copying data from a Pixel Buffer Object to a texture with a call to glTexSubImage2D. The TBO in this example is mapped to CUDA in exactly the same way as the VBO in simpleGL; however, glTexBufferEXT is used to attach the TBO data to a texture. Two Vertex Buffer Objects (VBOs) store the vertex point data and the per-vertex attributes respectively. A vertex shader then fetches the CUDA position data, using the per-vertex attribute as an index into the texture data.
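The indexed fetch at the end of that description can be illustrated on the CPU. This is only a sketch of the lookup logic, not the demo's shader code: Float4 and gatherPositions are hypothetical names, the vector stands in for the TBO contents written by CUDA, and the index list stands in for the per-vertex attribute VBO.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for one RGBA32F texel holding a position.
struct Float4 { float x, y, z, w; };

// CPU stand-in for the vertex shader's fetch: each vertex carries an
// integer index attribute and looks up its position in the linear buffer.
std::vector<Float4> gatherPositions(const std::vector<Float4>& positionBuffer,
                                    const std::vector<std::size_t>& vertexIndex) {
    std::vector<Float4> out;
    out.reserve(vertexIndex.size());
    for (std::size_t idx : vertexIndex) {
        assert(idx < positionBuffer.size());  // index must address a texel
        out.push_back(positionBuffer[idx]);
    }
    return out;
}
```

On the GPU the loop disappears: each vertex shader invocation performs one such lookup with its own index.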

Let me know if you have any comments. It would probably be useful to update the post-processing sample to use a TBO as well, to benchmark the performance. Maybe after the weekend ;-)


FYI, I updated the “bicubicTexture” sample in the SDK with an option to use texture buffer objects for displaying the results from CUDA directly (#define USE_BUFFER_TEX).

It’s a bit faster, but not much (unfortunately the map/unmap still take most of the time).

This should be included in the 2.1 SDK, which will be released soon.

Cheers, Simon. You mentioned that you are aware of the poor cudaMapBufferObject performance. Any idea when a release will be out that speeds this up a bit?


I would try the 2.1 beta, since it is the next release after the point at which Simon said this would be fixed in the next release.