I am doing a 3D convolution using OpenGL and get my result back in 32
textures. Each texture holds 4 z-slices in its components (r, g, b, a),
so the third dimension is 32*4 = 128. I would like to take
this volume and do an argmax over the third dimension in CUDA:
compare all the textures and compute a 2D matrix
that holds the max value over z across all the textures.
What is the most efficient way to do that, given that I have a limited number of buffer objects in OpenGL (8 or 16 depending on the driver)?
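To make the reduction concrete, here is a minimal sketch of the argmax over z. It assumes the 32 textures have already been copied into one contiguous device buffer of float4 layers (layer l, component k corresponds to z = 4*l + k); that layout, and the names argmaxOverZ and runArgmaxOverZ, are hypothetical and only for illustration. It does not use texture fetches, so it is not the texture-based approach described in this post.

```cuda
#include <cuda_runtime.h>

// One thread per (x, y) pixel. 'layers' is assumed to hold numLayers images
// of width*height float4 values stored back to back (hypothetical layout).
__global__ void argmaxOverZ(const float4* layers, int numLayers,
                            int width, int height,
                            float* maxVal, int* maxIdx)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    size_t pixel = (size_t)y * width + x;
    size_t layerStride = (size_t)width * height;

    float best = layers[pixel].x;  // start from z = 0
    int bestZ = 0;
    for (int l = 0; l < numLayers; ++l) {
        float4 v = layers[l * layerStride + pixel];
        float c[4] = { v.x, v.y, v.z, v.w };
        for (int k = 0; k < 4; ++k) {
            // Strict '>' keeps the lowest z on ties.
            if (c[k] > best) { best = c[k]; bestZ = 4 * l + k; }
        }
    }
    maxVal[pixel] = best;
    maxIdx[pixel] = bestZ;
}

// Host helper (hypothetical): uploads the layers, runs the kernel,
// and downloads the per-pixel max and argmax.
cudaError_t runArgmaxOverZ(const float4* hostLayers, int numLayers,
                           int width, int height,
                           float* hostMax, int* hostIdx)
{
    size_t pixels = (size_t)width * height;
    float4* dLayers = nullptr;
    float* dMax = nullptr;
    int* dIdx = nullptr;

    cudaMalloc((void**)&dLayers, pixels * numLayers * sizeof(float4));
    cudaMalloc((void**)&dMax, pixels * sizeof(float));
    cudaMalloc((void**)&dIdx, pixels * sizeof(int));
    cudaMemcpy(dLayers, hostLayers, pixels * numLayers * sizeof(float4),
               cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    argmaxOverZ<<<grid, block>>>(dLayers, numLayers, width, height, dMax, dIdx);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hostMax, dMax, pixels * sizeof(float),
                         cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(hostIdx, dIdx, pixels * sizeof(int),
                         cudaMemcpyDeviceToHost);

    cudaFree(dLayers);
    cudaFree(dMax);
    cudaFree(dIdx);
    return err;
}
```

Because the number of layers is a plain kernel argument here, the same kernel would handle 32 or 64 layers; whether a linear buffer is practical depends on how the OpenGL results are transferred to CUDA.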
My idea was the following:
copy all the textures into cudaArrays,
bind them to CUDA textures,
fetch the textures inside the kernel.
But the problem is that you can’t have a dynamic number of textures.
I need to hardcode each texture reference as a global, like this:
texture<float4, 2, cudaReadModeElementType> tex;
So I’m stuck if I have 64 textures instead of 32…
I don’t know if that’s clear.