Why is there no cudaBindTexture3D? It would be nice to have this ...

I’m working with voxel volumes and it would be nice to also have a cudaBindTexture3D.

There is a cudaBindTexture2D call, but the cudaBindTexture3D call is missing.

An alternative would be CUDA arrays, but you can’t write to them.

Would it be possible to add cudaBindTexture3D to CUDA or make CUDA arrays also writable?

I have already considered writing a routine myself to write to CUDA arrays, but for that I need
to know the indexing scheme of CUDA arrays, and it might be a lot of work.

If you are asking why I need such stuff … I have a segmentation algorithm for voxel
volumes that accesses groups of elements (19-voxel 3D neighborhood) without a predictable access pattern,
so shared memory is not a good solution for this, but caching would improve the performance.
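To make the access pattern concrete, here is a minimal sketch of such a neighborhood. It assumes the usual definition of a 19-voxel neighborhood (the center, its 6 face neighbors and its 12 edge neighbors, i.e. the 3x3x3 block without its 8 corners); the post does not spell this out, and `Offset` and `neighborhood19` are hypothetical names. It is plain host-side C++, but the same loop could run inside a kernel.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>

// Hypothetical type: relative offset of one neighbor from the center voxel.
struct Offset { int dx, dy, dz; };

// Assumption: "19-voxel neighborhood" means the 3x3x3 block minus its
// 8 corners, i.e. all offsets with |dx| + |dy| + |dz| <= 2.
inline std::array<Offset, 19> neighborhood19() {
    std::array<Offset, 19> out{};
    std::size_t n = 0;
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx)
                if (std::abs(dx) + std::abs(dy) + std::abs(dz) <= 2)
                    out[n++] = Offset{dx, dy, dz};
    return out;
}
```

Each thread then reads 19 values spread over three z-planes, which is why a cache helps more than a fixed shared-memory tile.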

The new Fermi architecture would be well suited for this algorithm, but we have to wait some time … and
it would be nice to have a solution for the current architecture.

At the moment I’m using normal memory accesses, which aren’t cached at all.

One possibility would be to use a 1D texture and calculate the indexes myself. In that case I couldn’t use the texture unit’s automatic boundary handling (clamping mode), but this would probably still be better than no caching at all.

I have to make sure that every row of memory starts at a 64-byte-aligned address. cudaMalloc3D would do that.
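A minimal sketch of that index calculation, with the clamping done by hand. It assumes a densely packed volume stored x-fastest, then y, then z; `clampCoord` and `voxelIndex` are hypothetical helper names. It is written as host-side C++, and in a kernel the resulting index would be what gets passed to the 1D texture fetch.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: clamp a coordinate into [0, size - 1], replacing the
// clamp addressing mode that a 2D/3D texture would otherwise provide.
inline int clampCoord(int v, int size) {
    return std::min(std::max(v, 0), size - 1);
}

// Flat index of voxel (x, y, z) in a densely packed volume (assumption:
// x varies fastest, then y, then z). Out-of-range coordinates are clamped.
inline std::size_t voxelIndex(int x, int y, int z,
                              int xSize, int ySize, int zSize) {
    const std::size_t cx = static_cast<std::size_t>(clampCoord(x, xSize));
    const std::size_t cy = static_cast<std::size_t>(clampCoord(y, ySize));
    const std::size_t cz = static_cast<std::size_t>(clampCoord(z, zSize));
    return cx + static_cast<std::size_t>(xSize) *
                    (cy + static_cast<std::size_t>(ySize) * cz);
}
```

For a 4x5x6 volume, `voxelIndex(-1, 0, 0, 4, 5, 6)` clamps to the first voxel and `voxelIndex(10, 10, 10, 4, 5, 6)` clamps to the last one.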
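For illustration, the row padding and the resulting addressing could look like the sketch below. The 64-byte figure is simply the one from the post; the pitch that cudaMalloc3D actually returns is chosen by the runtime and should be read from the returned pitched pointer rather than computed. `paddedPitch` and `pitchedOffset` are hypothetical helper names.

```cpp
#include <cstddef>

// Round a row length in bytes up to the next multiple of `alignment`.
// Assumption: alignment is 64 here only because the post says so.
inline std::size_t paddedPitch(std::size_t rowBytes, std::size_t alignment) {
    return (rowBytes + alignment - 1) / alignment * alignment;
}

// Byte offset of voxel (x, y, z) in a pitched volume: rows are `pitch`
// bytes apart and each z-slice holds `ySize` rows.
inline std::size_t pitchedOffset(std::size_t x, std::size_t y, std::size_t z,
                                 std::size_t pitch, std::size_t ySize,
                                 std::size_t elemSize) {
    return (z * ySize + y) * pitch + x * elemSize;
}
```

When the padded buffer is read through a 1D texture, the index is counted in elements, so the row stride becomes `pitch / elemSize` instead of the logical row width.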

Current hardware can’t bind linear memory as a 3D texture, hence no cudaBindTexture3D. As you mention, the only real solution possible today is to bind the memory as a 1D texture and do the addressing and interpolation yourself.

I have implemented the 1D tex access and achieved a performance gain of about 25 % without any other modification.


I have just started with CUDA, and I am also trying to use 3D textures. I am trying to use them for volume rendering.

There is an example from NVIDIA that uses 3D textures (volumeRender uses tex3D). What they do is allocate 3D memory and then bind it with cudaBindTextureToArray(…).

It would be interesting to know why that works, and whether they get proper interpolation in 3D.

Yes, to be clear, CUDA does have full support for 3D textures with interpolation, it’s just that they must be stored in memory in the special CUDA 3D array format.

We don’t support 3D textures bound to regular linear memory.

Thank you for the quick reply!

As I am opposed to the ‘hey, let’s try if it works’ approach, I am here asking again; please forgive me. I have not found an answer in the programming manual (or maybe you can direct me to the correct manual, the sections to read again, or an existing thread?) on how textures can be modified.

What I am trying to do is build a 3D volume and render it (so 3D arrays are out of the question, since those cannot be modified, right?). Then modify the volume and re-render. To avoid transferring lots of data from host to device, I had in mind to handle the modification on the device.

How about the following solution:

  1. Allocate linear memory
    1.1) Copy memory from host to allocated memory
  2. Bind that memory to 2D texture
  3. Do tex2D(…)
  4. Unbind
  5. Modify linear memory
  6. Goto 2.

Is unbinding needed? Does the texture cache need to be flushed for correct results? Or is the texture cache flushed when a different thread starts to execute?

Another option, using 3D textures and a 3D array, would be to copy from linear memory to the 3D array (device to device). That would require double the space, but it should be possible?
So it would be something like this:
0. Allocate and initialise etc.

  1. Bind 3D-Array to 3D texture and render
  2. Modify linear memory
  3. Update 3D-Array with memcpy device to device from linear memory
  4. Unbind (?!)
  5. Goto 1.


Why do you want to use tex2D when your actual data is 3D? I would recommend the same solution as I used: tex1Dfetch on linear memory and calculating the index yourself.

The second solution will also work, and you have the advantage of better cache locality for your data.

Which of these solutions is faster depends on your algorithm, that is, on how often you need to modify your data and how much time you save with 3D textures. The second solution will only be faster if the time saved compensates for the time spent copying between linear memory and the CUDA array.

Probably there is only one way to find out: you have to implement both solutions and try them.

My bad for the inaccurate writing. I want to do real 3D interpolation at point (x,y,z); that is, the value at point (x,y,z) depends on the eight surrounding voxels (the corners of the enclosing cell).

What I meant by writing 2D tex is that I would store the data as XY planes, and then when calculating the value at point (x,y,z) I would do (get the level INT_FLOOR(z) texture at point x,y):

v1 = tex2D( texture_reference[ INT_FLOOR(z)], x, y ) and

v2 = tex2D( texture_reference[ INT_FLOOR(z) + 1], x, y ).

then the final value would be:

v = (1 - f) * v1 + f * v2, with f = z - INT_FLOOR(z).

I am not sure if I understand your proposal of using 1D textures correctly. What came to my mind is that you suggest doing something like index = INT_FLOOR(x) + INT_FLOOR(y)*Xsize + INT_FLOOR(z)*Xsize*Ysize and then taking the corresponding value between index and index + 1.

Then, to get real 3D interpolation, one still needs to do the same for y_index + 1 and z_index + 1 (a total of 4x tex1D calls) and then do the ‘manual’ interpolation three times. This might certainly still be the faster way.

I am afraid you are correct that the only way to figure out what is fastest is to write a test program for all the cases…

Still, I would like to know if I need to unbind the texture before modifications. An ‘it seems to work’ solution is not what I would like to have…
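For reference, a minimal sketch of the fully manual variant. It assumes the fetch itself does no filtering (as far as I know, that is the case for tex1Dfetch on linear memory), so all eight corners are read individually and combined with seven linear interpolations; if the fetch did interpolate along x, the four calls described above would suffice. `fetchClamped`, `mix` and `sampleTrilinear` are hypothetical names, and the code is host-side C++ standing in for what a kernel would do.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an unfiltered texture fetch: reads one voxel from
// a densely packed volume (x fastest), clamping out-of-range coordinates.
inline float fetchClamped(const std::vector<float>& vol, int x, int y, int z,
                          int xSize, int ySize, int zSize) {
    x = std::min(std::max(x, 0), xSize - 1);
    y = std::min(std::max(y, 0), ySize - 1);
    z = std::min(std::max(z, 0), zSize - 1);
    const std::size_t idx =
        static_cast<std::size_t>(x) +
        static_cast<std::size_t>(xSize) *
            (static_cast<std::size_t>(y) +
             static_cast<std::size_t>(ySize) * static_cast<std::size_t>(z));
    return vol[idx];
}

// Linear interpolation: returns a at t = 0 and b at t = 1.
inline float mix(float a, float b, float t) { return a + t * (b - a); }

// Trilinear sample at (x, y, z) in voxel coordinates: 8 fetches, 7 mixes.
inline float sampleTrilinear(const std::vector<float>& vol,
                             float x, float y, float z,
                             int xSize, int ySize, int zSize) {
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const int z0 = static_cast<int>(std::floor(z));
    const float fx = x - static_cast<float>(x0);
    const float fy = y - static_cast<float>(y0);
    const float fz = z - static_cast<float>(z0);
    auto at = [&](int xi, int yi, int zi) {
        return fetchClamped(vol, xi, yi, zi, xSize, ySize, zSize);
    };
    // Four interpolations along x, one per cell edge parallel to the x axis.
    const float c00 = mix(at(x0, y0, z0), at(x0 + 1, y0, z0), fx);
    const float c10 = mix(at(x0, y0 + 1, z0), at(x0 + 1, y0 + 1, z0), fx);
    const float c01 = mix(at(x0, y0, z0 + 1), at(x0 + 1, y0, z0 + 1), fx);
    const float c11 = mix(at(x0, y0 + 1, z0 + 1), at(x0 + 1, y0 + 1, z0 + 1), fx);
    // Two along y, then one along z.
    const float c0 = mix(c00, c10, fy);
    const float c1 = mix(c01, c11, fy);
    return mix(c0, c1, fz);
}
```

On a 2x2x2 volume whose voxel values equal their flat index (0 to 7), sampling the cell center (0.5, 0.5, 0.5) gives 3.5, the average of all eight corners.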

Cheers and thanks!

At the moment I would remark only one thing … the number of textures is limited to 128 … so you can only have 128 XY-Planes … if it’s enough it might be a good solution.

So I have to go … to be in time.