Argmax with CUDA using OpenGL textures


I am doing a 3D convolution using OpenGL, and I get my result back in 32 textures. Each texture holds 4 consecutive z values in its (r, g, b, a) components, so the third dimension is 32*4 = 128. I would like to take this volume and do an argmax over the third dimension in CUDA: compare all the textures and compute a 2D matrix holding, for each pixel, the max value over all the textures along z.


What is the most efficient way to do that, given that I have a limited number of buffer objects in OpenGL (8 or 16 depending on the driver)?

My idea was the following:

  • copy all the textures into cudaArrays

  • bind them to CUDA textures

  • fetch the textures inside the kernel

But the problem is that you can’t have a dynamic number of textures.

I need to hardcode each texture reference as a global, like this:

texture<float4, 2, cudaReadModeElementType> tex;

so I’m stuck if I have 64 textures instead of 32…

I don’t know if that’s clear.


Does anybody have an idea how to create a dynamic array of textures?

Couldn’t you just declare a large array of texture references and then only use the ones you need? Presumably there is some maximum depth to your volume?

The maximum number of texture samplers is 16 on DirectX 10 class hardware, and CUDA doesn’t get around this. You can, however, dynamically re-bind texture references (samplers) to arrays (i.e. texture images) using cudaBindTextureToArray() (cudaBindTexture() is the variant for linear memory).

There are several other ways you could do this:

One method would be to create a single OpenGL buffer object, and then read all the textures (the whole volume) into it using glGetTexImage. Then you could map this buffer object in CUDA and calculate the maximums for each of the 128 values in parallel, doing the correct addressing in the CUDA kernel.
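A minimal sketch of the kernel side of this single-buffer approach, under stated assumptions. The layout is assumed, not taken from the thread: the textures are packed back to back as RGBA float images, so texel (x, y) of texture t sits at index t * width * height + y * width + x. The names argmaxOverZ and argmaxOverZHost are hypothetical. Each thread handles one (x, y) pixel and loops over all textures, so the number of textures is a runtime parameter and 64 textures work the same way as 32.

```cuda
#include <cuda_runtime.h>

// Assumed layout: volume[t * width * height + y * width + x] is the RGBA texel
// of texture t; its r, g, b, a hold z = 4t, 4t+1, 4t+2, 4t+3.
__global__ void argmaxOverZ(const float4* volume, int width, int height,
                            int numTextures, float* maxVal, int* maxIdx)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int pixel = y * width + x;
    int sliceSize = width * height;

    // Start from z = 0 so the result is valid for any input range.
    float best = volume[pixel].x;
    int bestZ = 0;

    // Neighbouring threads read neighbouring float4 values in each slice.
    for (int t = 0; t < numTextures; ++t) {
        float4 v = volume[t * sliceSize + pixel];
        if (v.x > best) { best = v.x; bestZ = 4 * t; }
        if (v.y > best) { best = v.y; bestZ = 4 * t + 1; }
        if (v.z > best) { best = v.z; bestZ = 4 * t + 2; }
        if (v.w > best) { best = v.w; bestZ = 4 * t + 3; }
    }
    maxVal[pixel] = best;   // max over z
    maxIdx[pixel] = bestZ;  // argmax over z (lowest z wins on ties)
}

// Hypothetical host helper for trying the kernel without OpenGL: copies a host
// volume to the device, runs the kernel, and copies both results back.
// Error checking is omitted for brevity.
void argmaxOverZHost(const float* hostVolume, int width, int height,
                     int numTextures, float* hostMax, int* hostIdx)
{
    size_t pixels = (size_t)width * height;
    size_t volumeBytes = pixels * numTextures * sizeof(float4);

    float4* dVolume = NULL;
    float* dMax = NULL;
    int* dIdx = NULL;
    cudaMalloc(&dVolume, volumeBytes);
    cudaMalloc(&dMax, pixels * sizeof(float));
    cudaMalloc(&dIdx, pixels * sizeof(int));
    cudaMemcpy(dVolume, hostVolume, volumeBytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    argmaxOverZ<<<grid, block>>>(dVolume, width, height, numTextures, dMax, dIdx);

    cudaMemcpy(hostMax, dMax, pixels * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hostIdx, dIdx, pixels * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dVolume);
    cudaFree(dMax);
    cudaFree(dIdx);
}
```

In the interop case the volume pointer would be the mapped buffer object instead of memory from cudaMalloc; the kernel itself would not change.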

Another method would be to not use CUDA at all, and just render all of the slices to the framebuffer using a max blend function.

Does CUDA allow using a framebuffer object?

Is there a way to map it in CUDA afterward?

Something like:

cudaGLMapBufferObject( (void**)&in_data, myFrameBuffer)

and then use in_data like any linear memory?

No, you can’t map framebuffer objects directly (in CUDA or OpenGL). The names are confusing, but FBOs are not buffer objects in the same way vertex buffer objects and pixel buffer objects are.

The only way to do this is to read from the FBO to a PBO using glReadPixels, and then map the PBO in CUDA.

How can I do that?
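A rough sketch of the readback path described above, not a definitive implementation. It assumes a current OpenGL context, an FBO whose first color attachment is an RGBA float image, and GLEW as the extension loader. The helper names readFboIntoCuda and releaseCudaMapping are hypothetical. It uses the graphics interop calls from later CUDA releases (cudaGraphicsGLRegisterBuffer and friends); cudaGLMapBufferObject, mentioned earlier in the thread, is the older equivalent. Error checking is omitted.

```cuda
#include <GL/glew.h>          // assumption: GLEW provides the OpenGL entry points
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>  // must be included after the OpenGL headers

// Hypothetical helper: copies the FBO's first color attachment into a new PBO
// and returns a device pointer to the PBO's contents.
float4* readFboIntoCuda(GLuint fbo, int width, int height,
                        GLuint* pboOut, cudaGraphicsResource_t* resourceOut)
{
    size_t bytes = (size_t)width * height * 4 * sizeof(float);

    // 1. Create a pixel buffer object large enough for one RGBA float image.
    glGenBuffers(1, pboOut);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, *pboOut);
    glBufferData(GL_PIXEL_PACK_BUFFER, bytes, NULL, GL_STREAM_COPY);

    // 2. With a PBO bound as the pack buffer, glReadPixels writes into the
    //    buffer; the last argument is a byte offset, not a client pointer.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
    glReadBuffer(GL_COLOR_ATTACHMENT0);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, (void*)0);
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

    // 3. Register the PBO with CUDA and map it to get a device pointer.
    cudaGraphicsGLRegisterBuffer(resourceOut, *pboOut,
                                 cudaGraphicsRegisterFlagsReadOnly);
    cudaGraphicsMapResources(1, resourceOut, 0);

    void* devPtr = NULL;
    size_t mappedBytes = 0;
    cudaGraphicsResourceGetMappedPointer(&devPtr, &mappedBytes, *resourceOut);
    return (float4*)devPtr;  // usable as linear memory while mapped
}

// Unmap before OpenGL touches the buffer again, then release everything.
void releaseCudaMapping(GLuint pbo, cudaGraphicsResource_t resource)
{
    cudaGraphicsUnmapResources(1, &resource, 0);
    cudaGraphicsUnregisterResource(resource);
    glDeleteBuffers(1, &pbo);
}
```

Registration is comparatively expensive, so if this runs every frame it is probably better to create and register the PBO once and only repeat the glReadPixels, map, and unmap steps.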