Reading OpenGL texture data from CUDA

I’m building a CUDA-rendered UI. One feature of the UI library will be rendering an arbitrary number of images. All such images are managed by the UI code as OpenGL textures.

Since the number of images is not known at compile time, I have to use CUDA texture objects rather than the older compile-time texture references. My question is: what is the best way of mapping bound OpenGL textures to CUDA texture objects for reading? I can’t find any good examples of this. Preferably I would like to read the texture data without doing any copying.

This presentation walks through all the steps.

It shows how to draw from CUDA into a buffer that is mapped to/from an OpenGL texture.

It’s a bit dated, so it uses some deprecated APIs, but the concepts are all there.
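The flow from the presentation can be sketched roughly like this (a sketch only: `glTexId` and `stream` are placeholder names, and error checking is omitted). The important point is that OpenGL must not touch the texture while the resource is mapped, so map/unmap typically brackets the CUDA work each frame:

```cuda
// One-time setup: register the GL texture with CUDA
// (the texture must already have storage, e.g. via glTexImage2D).
cudaGraphicsResource *resource = nullptr;
cudaGraphicsGLRegisterImage(&resource, glTexId, GL_TEXTURE_2D,
                            cudaGraphicsRegisterFlagsReadOnly);

// Per frame: map while CUDA uses the texture, unmap before GL touches it.
cudaGraphicsMapResources(1, &resource, stream);
cudaArray_t array = nullptr;
cudaGraphicsSubResourceGetMappedArray(&array, resource, 0, 0);
// ... build a cudaTextureObject_t over 'array' and run kernels ...
cudaGraphicsUnmapResources(1, &resource, stream);

// Teardown.
cudaGraphicsUnregisterResource(resource);
```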

I’ve inserted some sample code below that shows some of the steps:

/** GL handle on texture**/
GLuint m_texId;
/** Interop resource handle**/
cudaGraphicsResource *m_cudaGraphicsResource;
/** CUDA array that the texture is mapped to **/
cudaArray            *m_cudaArray;
/** reference to texture to read data through */
cudaTextureObject_t m_texture;
/** reference to surface to write data to*/
cudaSurfaceObject_t m_surface;
glGenTextures(1, &m_texId);
glBindTexture(GL_TEXTURE_2D, m_texId);
// register (texture) to CUDA Resource
// (the texture must already have storage, e.g. from glTexImage2D)
cudaGraphicsGLRegisterImage(&m_cudaGraphicsResource,
	m_texId, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsSurfaceLoadStore);
// MAP
cudaGraphicsMapResources(1, &m_cudaGraphicsResource, m_stream.get());
cudaGraphicsSubResourceGetMappedArray(&m_cudaArray, m_cudaGraphicsResource, 0, 0);

// Do something with cuda texture object and surface object...

You can also have a look at the Mandelbrot sample that comes with the CUDA SDK. It uses the GPU to calculate a fractal image that is rendered to an OpenGL texture.

Thanks for the example code! It is not obvious to me how the m_texture fits into all this though. Just to be clear, I’m only trying to read data from an existing GL texture, not write to it.

As you can see from the comments, a texture can only be read from; you need a surface to write to a GL texture. So if you only need read access, ignore the code dealing with m_surface.

So the m_texture will be used to fetch elements from the texture.
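To illustrate the difference between the two access paths (a sketch; kernel names, `w`/`h`, and the uchar4 element type are placeholders): tex2D reads through a texture object, surf2Dwrite writes through a surface object, and surf2Dwrite addresses x in bytes rather than elements:

```cuda
__global__ void read_kernel(cudaTextureObject_t tex, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    uchar4 texel = tex2D<uchar4>(tex, x, y);   // read-only access
    // ... use texel ...
}

__global__ void write_kernel(cudaSurfaceObject_t surf, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    uchar4 texel = make_uchar4(255, 0, 0, 255);
    surf2Dwrite(texel, surf, x * sizeof(uchar4), y);  // x is in bytes
}
```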

Some additional stub code:

// Creating the texture object
/** Descriptor */			
cudaResourceDesc m_resDesc;

memset(&m_resDesc, 0, sizeof(m_resDesc));
m_resDesc.resType = cudaResourceTypeArray;
// Looks like it's that cuda array thingy again
m_resDesc.res.array.array = m_cudaArray;
cudaTextureDesc texDesc;
memset(&texDesc, 0, sizeof(texDesc));

// These need to be set according to your needs:
texDesc.addressMode[0]   = cudaAddressModeClamp;
texDesc.addressMode[1]   = cudaAddressModeClamp;
texDesc.filterMode       = cudaFilterModePoint;
texDesc.readMode         = cudaReadModeElementType;
texDesc.normalizedCoords = false;
// Create texture object
cudaCreateTextureObject(&m_texture, &m_resDesc, &texDesc, NULL);

So after setting up the texture object like above, you should be able to use it (after the mapping stage!):

template<class T> // some data type....
__global__ void kernel(cudaTextureObject_t m_texture)
{
	int x = threadIdx.x + blockIdx.x * blockDim.x;
	int y = threadIdx.y + blockIdx.y * blockDim.y;
	T val = tex2D<T>(m_texture, x, y);
	printf("\n image value = %d", (int)val);
}
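Launching the templated kernel above might look like this on the host (a sketch; `width`/`height` are placeholders, and I’m assuming a single-channel texture so that instantiating with `unsigned char` makes the `(int)` cast in the printf valid):

```cuda
dim3 block(16, 16);
dim3 grid((width  + block.x - 1) / block.x,
          (height + block.y - 1) / block.y);
// grid may overshoot the texture; add a bounds check in the kernel
// if width/height are not multiples of the block size
kernel<unsigned char><<<grid, block>>>(m_texture);
cudaDeviceSynchronize();  // make sure the printf output is flushed
```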

DISCLAIMER: These are all tiny code snippets from my un-released open source project, I make no guarantees whatsoever :)

Thanks Jimmy, got it to work now.

I don’t get why a cudaArray needs to be involved just to read a GL texture, but then again, I have long since given up on trying to form a reasonable mental model of how the CUDA/OpenGL interop types relate to each other.

Posting the working code below, for posterity.

// static scope (host)
cudaGraphicsResource *m_cudaGraphicsResource;
cudaArray *m_cudaArray;
cudaTextureObject_t m_texture;

// run once (host)

// register first (m_texId is the GL texture handle, as in the
// earlier snippet; read-only access is enough here)
cudaGraphicsGLRegisterImage(&m_cudaGraphicsResource, m_texId,
	GL_TEXTURE_2D, cudaGraphicsRegisterFlagsReadOnly);
cudaGraphicsMapResources(1, &m_cudaGraphicsResource, 0);
cudaGraphicsSubResourceGetMappedArray(&m_cudaArray, m_cudaGraphicsResource, 0, 0);

cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
resDesc.resType = cudaResourceTypeArray;
resDesc.res.array.array = m_cudaArray;

cudaTextureDesc texDesc;
memset(&texDesc, 0, sizeof(texDesc));
texDesc.addressMode[0]   = cudaAddressModeClamp;
texDesc.addressMode[1]   = cudaAddressModeClamp;
texDesc.filterMode       = cudaFilterModePoint;
texDesc.readMode         = cudaReadModeElementType;
texDesc.normalizedCoords = false;

cudaCreateTextureObject(&m_texture, &resDesc, &texDesc, nullptr);

// kernel (device)
uint render_at(
	// .. some args omitted
	const cudaTextureObject_t texture
) {
	// ... some code omitted ...
	const uchar4 c = tex2D<uchar4>(texture, x, y);
	return 0xFF000000u | (uint(c.x) << 16) | (uint(c.y) << 8) | uint(c.z);
}

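For completeness, the matching teardown (a sketch): the texture object is only valid while the resource is mapped, so destroy it and unmap before OpenGL uses the texture again, and unregister at shutdown:

```cuda
// when CUDA is done reading (before GL touches the texture again):
cudaDestroyTextureObject(m_texture);
cudaGraphicsUnmapResources(1, &m_cudaGraphicsResource, 0);

// at shutdown:
cudaGraphicsUnregisterResource(m_cudaGraphicsResource);
```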
It’s been a long time, but as I recall, the way a cudaArray is arranged in physical memory (optimized for 2D spatial locality) corresponds exactly to how OpenGL storage is arranged.
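That layout is also why, if you ever just need the raw bytes rather than filtered fetches, you can copy straight out of the mapped cudaArray without a texture object at all. A sketch, assuming a GL_RGBA8 texture of `width` x `height` (placeholder names) and `#include <vector>`:

```cuda
// copy the mapped array into ordinary linear host memory
std::vector<uchar4> pixels(width * height);
cudaMemcpy2DFromArray(pixels.data(),            // dst
                      width * sizeof(uchar4),   // dst pitch in bytes
                      m_cudaArray,              // src array
                      0, 0,                     // src x/y offset
                      width * sizeof(uchar4),   // width in bytes
                      height,
                      cudaMemcpyDeviceToHost);
```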

Alright, makes sense