Rendering directly from 2D CUDA texture?

lobsterBA · October 15, 2008, 7:41pm

I’m trying to find a fast way, i.e. not reading global memory back to the host side, to render. In the SDK, VBOs are used but do not seem fitting for my case. I’m updating structured data via a kernel (linear memory) but have not found a nice solution whereby I can pack this into a texture on the card and then directly render. Reading back to host side and packing a texture or doing drawPixels is just too slow.

Any solutions?

Simon_Green · October 16, 2008, 8:31am

This reminds me that there is an NVIDIA OpenGL extension that lets you texture directly from buffer object memory:
http://developer.download.nvidia.com/opengl/specs/GL_EXT_texture_buffer_object.txt

I haven’t tried this, but in theory it would let you write to linear memory from CUDA and then read from it directly in OpenGL (you’d have to do the 2D to 1D mapping in the shader).
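The 2D-to-1D mapping mentioned above is ordinary row-major indexing. Here is a minimal CPU-side sketch of it, assuming the CUDA kernel writes a tightly packed, row-major buffer of `width` by `height` texels. The names `texelIndex` and `fetchTexel` are hypothetical and not part of CUDA or OpenGL; a shader would do the same arithmetic before fetching from the buffer texture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major mapping from 2D texel coordinates
// to an index into linear memory. Assumes rows are tightly packed.
inline std::size_t texelIndex(std::size_t x, std::size_t y, std::size_t width) {
    assert(x < width);  // x must lie inside the row
    return y * width + x;
}

// Hypothetical helper: emulates on the CPU what a shader would do
// when it fetches texel (x, y) from a 1D buffer texture.
inline float fetchTexel(const std::vector<float>& buffer,
                        std::size_t x, std::size_t y, std::size_t width) {
    return buffer[texelIndex(x, y, width)];
}
```

For a buffer 4 texels wide, texel (3, 2) lives at linear index 2 * 4 + 3 = 11.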

I may try writing a sample for this.

RLight · October 16, 2008, 2:17pm

I think you can use GetBackBuffer() and map the buffer directly into the CUDA address space! After that, you can render anything to the buffer! I tested this with Direct3D and got a good result (over 1000 fps with a gradient fill).

But don't use OpenGL for this! cudaGLMapBufferObject() is really SLOW!
Using Direct3D is much faster!

lobsterBA · October 16, 2008, 4:42pm

Simon,

Thanks for the input. I thought this would be a subject greatly covered, but I realize the purposes of CUDA aren’t quite in the same light. I will look more at the OpenGL extension, and of course a sample would be amazing. I think it would be useful to a non-trivial subset of the community.

RLight,

My application is stuck with OpenGL. Just how slow is cudaGLMapBufferObject()?

RLight · October 17, 2008, 8:07am

You can test the speed very easily! With boxFilter (or another sample), disable V-Sync (include “GL/wglew.h” and call wglSwapIntervalEXT(0)), then measure the speed of the map and unmap functions only. The result is really bad!

On my system, the speed drops from 1280 → 115 fps without launching any kernel (only the map and unmap functions)! But with Direct3D, I still get 990 fps!
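To read these frame rates as a per-frame cost, it helps to convert them to frame times. This is a small sketch of that arithmetic, using only the OpenGL figures quoted above. The function names are hypothetical, and the result is only a rough estimate of the map/unmap overhead, since it assumes nothing else changed between the two runs.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0);  // a frame rate must be positive
    return 1000.0 / fps;
}

// Extra time per frame implied by a drop from baselineFps to measuredFps.
inline double addedCostMs(double baselineFps, double measuredFps) {
    return frameTimeMs(measuredFps) - frameTimeMs(baselineFps);
}
```

With the numbers above, `addedCostMs(1280.0, 115.0)` comes to roughly 7.9 ms per frame spent in map and unmap, while 990 fps corresponds to a total frame time of about 1.0 ms.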

I think you can develop your program with OpenGL first. If the speed is not good enough, you can switch to Direct3D quite easily, because the two kinds of buffer are not very different!

Simon_Green · October 17, 2008, 10:57am

BTW, we are aware of the poor cudaGLMapBufferObject performance and this should be fixed in the next release.

lobsterBA · October 22, 2008, 4:02pm

Any more input or advice?

Thanks

paulrichmond · November 21, 2008, 9:20pm

I read this post earlier today looking for some help on using texture buffer objects. I finally managed to get my head around the extension and have knocked up a working demo. Just stick it in the SDK projects folder.

http://www.dcs.shef.ac.uk/~paul/textureBufferObject.html

This example extends the simpleGL CUDA SDK example to demonstrate how to use Texture Buffer Objects (TBOs) to read texture data (the CUDA output) directly from a buffer object. This avoids copying data from a Pixel Buffer Object to a texture with a call to glTexSubImage2D.

The TBO in this example is mapped to CUDA in exactly the same way as the VBO in simpleGL; however, glTexBufferEXT is used to attach the TBO data to a texture. Two Vertex Buffer Objects (VBOs) store the vertex point data and the per-vertex attributes respectively. A vertex shader then fetches the CUDA position data, using the per-vertex attribute as an index into the texture data.

Let me know if you have any comments. It would probably be useful to update the post-processing sample to use a TBO as well, for benchmarking the performance. Maybe after the weekend ;-)

Paul

Simon_Green · November 21, 2008, 9:38pm

FYI, I updated the “bicubicTexture” sample in the SDK with an option to use texture buffer objects for displaying the results from CUDA directly (#define USE_BUFFER_TEX).

It’s a bit faster, but not much (unfortunately the map/unmap still take most of the time).

This should be included in the 2.1 SDK, which will be released soon.

paulrichmond · November 21, 2008, 11:36pm

Cheers Simon. You mentioned that you are aware of the poor cudaGLMapBufferObject performance. Any idea when a release will be out that speeds this up a bit?

Paul

E.D_Riedijk · November 22, 2008, 3:09pm

I would try the 2.1 beta, since that was the next release at the time Simon said this would be fixed.
