Couldn’t you just declare a large array of texture references and then only use the ones you need? Presumably there is some maximum depth to your volume?
The maximum number of texture samplers is 16 on DirectX 10 class hardware, and CUDA doesn’t get around this. You can, however, dynamically re-bind texture references (samplers) to CUDA arrays (i.e. texture images) using cudaBindTextureToArray().
There are several other ways you could do this:
One method would be to create a single OpenGL buffer object, bind it to GL_PIXEL_PACK_BUFFER, and read all the textures (the whole volume) into it using glGetTexImage, writing each slice at its own offset in the buffer. You could then map this buffer object in CUDA and calculate the maximum for each of the 128 values in parallel, doing the correct addressing in the CUDA kernel.
Another method would be to not use CUDA at all, and just render all of the slices to the framebuffer with blending enabled and the blend equation set to max, i.e. glBlendEquation(GL_MAX).
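For the buffer-object method, the addressing in the kernel might look roughly like the sketch below. It is only an illustration under assumptions that are not from the posts above: the volume is stored slice after slice with tightly packed rows and one unsigned byte per voxel, and the names maxProjectKernel and maxProjectHost are made up for this example. The host helper uses plain cudaMalloc/cudaMemcpy so the kernel can be tried without any OpenGL setup; in the real case the volume pointer would come from the mapped buffer object instead.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One thread per output pixel: walk through the slices and keep the largest value.
// Assumed layout (hypothetical): slice z starts at z * width * height,
// rows are tightly packed, one unsigned byte per voxel.
__global__ void maxProjectKernel(const unsigned char* volume,
                                 unsigned char* out,
                                 int width, int height, int depth)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;   // threads outside the image do nothing

    size_t sliceSize = (size_t)width * height;
    size_t pixel = (size_t)y * width + x;

    unsigned char best = 0;
    for (int z = 0; z < depth; ++z) {
        unsigned char v = volume[z * sliceSize + pixel];
        best = v > best ? v : best;
    }
    out[pixel] = best;
}

// Hypothetical host helper: copies a volume to the device, runs the kernel,
// and returns the width * height maximum image. Error checking is omitted.
std::vector<unsigned char> maxProjectHost(const std::vector<unsigned char>& volume,
                                          int width, int height, int depth)
{
    size_t sliceSize = (size_t)width * height;
    assert(volume.size() == sliceSize * depth);

    unsigned char* dVolume = nullptr;
    unsigned char* dOut = nullptr;
    cudaMalloc(&dVolume, sliceSize * depth);
    cudaMalloc(&dOut, sliceSize);
    cudaMemcpy(dVolume, volume.data(), sliceSize * depth, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    maxProjectKernel<<<grid, block>>>(dVolume, dOut, width, height, depth);

    // This blocking copy also waits for the kernel to finish.
    std::vector<unsigned char> result(sliceSize);
    cudaMemcpy(result.data(), dOut, sliceSize, cudaMemcpyDeviceToHost);

    cudaFree(dVolume);
    cudaFree(dOut);
    return result;
}
```

Each thread reads the same pixel position from consecutive slices, so neighbouring threads touch neighbouring addresses within a slice, which should keep the global memory reads reasonably coalesced. A multi-channel or padded layout would need the index arithmetic adjusted accordingly.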
No, you can’t map framebuffer objects directly (in CUDA or OpenGL). The names are confusing, but FBOs are not buffer objects in the same way vertex buffer objects and pixel buffer objects are.
The only way to do this is to read from the FBO to a PBO using glReadPixels, and then map the PBO in CUDA.