Is texture<float3,3> possible? cudaErrorInvalidChannelDescriptor

Hi guys,

My kernel function needs to read a 3D vector field, i.e. a 3D image of vectors where each vector is a float3. What I have so far is:

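cudaArray* arrGv = NULL;
cudaChannelFormatDesc channelDesc3 = {32, 32, 32, 0, cudaChannelFormatKindFloat}; // three 32-bit float channels
cudaExtent size3 = make_cudaExtent(256, 256, 8); // width, height, depth in elements
cudaError_t err = cudaMalloc3DArray(&arrGv, &channelDesc3, size3); // returns cudaErrorInvalidChannelDescriptor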

cudaCreateChannelDesc() does not help either: the resulting channel descriptor is filled with zeros.

Is there a way to do what I want, or do I need to use three float images (or perhaps one float4 image)?

I am thinking about using a float4 instead: the first three floats hold the vector components and the fourth holds their precomputed magnitude. Is there a way to skip copying into certain components of each array element? If there is, I could avoid allocating another float4 buffer in host memory just to interleave the gradient vector (float3) and gradient magnitude (float) images before copying the result to device memory.
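One way to skip the host staging buffer, sketched under the assumption that the float4 data is first assembled in linear device memory: cudaMemcpy2D copies `height` rows of `width` bytes and lets the source and destination use different pitches, so each element can be treated as a one-row "image" and written into only part of each float4. The helper name packGradientToDevice is hypothetical. Copies with 12- or 4-byte rows are probably much slower than a single contiguous copy, so this is worth timing against the staging-buffer approach.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a CUDA API): interleave a host float3 image and a
// host float image into a device float4 buffer of n elements, with no host-side
// float4 staging buffer. Each "row" of cudaMemcpy2D is a single element: it
// copies `width` bytes per row and steps source/destination by their pitches.
cudaError_t packGradientToDevice(float4* dDst, const float3* hGrad,
                                 const float* hMag, size_t n)
{
    assert(dDst != nullptr && hGrad != nullptr && hMag != nullptr);

    // x, y, z: 12 bytes from each 12-byte float3 into the first 12 bytes of each
    // 16-byte float4. The .w bytes of the destination are not touched here.
    cudaError_t err = cudaMemcpy2D(dDst, sizeof(float4),
                                   hGrad, sizeof(float3),
                                   sizeof(float3), n,
                                   cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return err;

    // Magnitude: 4 bytes from each float into the .w component (byte offset 12).
    return cudaMemcpy2D(reinterpret_cast<char*>(dDst) + offsetof(float4, w), sizeof(float4),
                        hMag, sizeof(float),
                        sizeof(float), n,
                        cudaMemcpyHostToDevice);
}
```

The packed device buffer can then be copied into a float4 3D cudaArray with cudaMemcpy3D using cudaMemcpyDeviceToDevice.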



Section 3.2.4 of the programming guide covers the do’s and don’ts of textures, but the short answer is no: 3-component vector types are not supported.
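A quick way to check this on a given setup, as a sketch (tryMalloc3DArray and reportFloatChannelSupport are hypothetical names): CUDA arrays support elements with 1, 2 or 4 components, so a three-channel float descriptor should be rejected by cudaMalloc3DArray while the float4 descriptor is accepted.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>

// Hypothetical helper: try to allocate a small 16 x 16 x 4 3D CUDA array with
// the given channel descriptor and return the resulting error code.
cudaError_t tryMalloc3DArray(const cudaChannelFormatDesc& desc)
{
    cudaArray_t arr = nullptr;
    cudaError_t err = cudaMalloc3DArray(&arr, &desc, make_cudaExtent(16, 16, 4));
    if (err == cudaSuccess) {
        assert(arr != nullptr);
        cudaFreeArray(arr);
    } else {
        cudaGetLastError(); // reset the runtime's last-error state
    }
    return err;
}

// Print the outcome for a three-channel and a four-channel float descriptor.
void reportFloatChannelSupport()
{
    cudaChannelFormatDesc f3 = cudaCreateChannelDesc(32, 32, 32, 0, cudaChannelFormatKindFloat);
    cudaChannelFormatDesc f4 = cudaCreateChannelDesc<float4>(); // {32, 32, 32, 32, Float}
    printf("3 x float: %s\n", cudaGetErrorString(tryMalloc3DArray(f3)));
    printf("4 x float: %s\n", cudaGetErrorString(tryMalloc3DArray(f4)));
}
```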


It should work with float4.
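A minimal sketch of the float4 route using texture objects (the texture<...> references from the time of this thread are deprecated and were removed in CUDA 12): upload a dense float4 volume into a 3D CUDA array, create a texture object over it, and fetch with tex3D<float4>. makeFloat4Volume and readVoxel are hypothetical names, and the data layout (x fastest, then y, then z) is an assumption.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Fetch voxel (x, y, z). With unnormalized coordinates and point filtering,
// voxel i spans [i, i + 1), so i + 0.5f addresses its center.
__global__ void readVoxel(cudaTextureObject_t tex, int x, int y, int z, float4* out)
{
    *out = tex3D<float4>(tex, x + 0.5f, y + 0.5f, z + 0.5f);
}

// Hypothetical helper: copy a dense host float4 volume (x fastest, then y, then z)
// into a 3D CUDA array and create a texture object over it.
// On success the caller owns *arr and *tex (cudaFreeArray, cudaDestroyTextureObject).
cudaError_t makeFloat4Volume(const float4* hData, int w, int h, int d,
                             cudaArray_t* arr, cudaTextureObject_t* tex)
{
    assert(hData != nullptr && w > 0 && h > 0 && d > 0);
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float4>(); // 4 x 32-bit float
    cudaExtent extent = make_cudaExtent(w, h, d);                  // elements, not bytes
    cudaError_t err = cudaMalloc3DArray(arr, &desc, extent);
    if (err != cudaSuccess) return err;

    cudaMemcpy3DParms p = {};
    p.srcPtr = make_cudaPitchedPtr(const_cast<float4*>(hData), w * sizeof(float4), w, h);
    p.dstArray = *arr;
    p.extent = extent;
    p.kind = cudaMemcpyHostToDevice;
    err = cudaMemcpy3D(&p);

    if (err == cudaSuccess) {
        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeArray;
        res.res.array.array = *arr;

        cudaTextureDesc td = {};
        td.addressMode[0] = td.addressMode[1] = td.addressMode[2] = cudaAddressModeClamp;
        td.filterMode = cudaFilterModePoint;   // float4 also supports cudaFilterModeLinear
        td.readMode = cudaReadModeElementType; // return the stored floats unchanged
        td.normalizedCoords = 0;               // coordinates in voxels
        err = cudaCreateTextureObject(tex, &res, &td, nullptr);
    }
    if (err != cudaSuccess) cudaFreeArray(*arr); // don't leak the array on failure
    return err;
}
```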

To save memory, and depending on the precision you need, you can encode your (normalized?) float3 into an int (for an example, see the function rgbaFloatToInt in many of the examples).
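A sketch of the encoding idea, assuming the vector has been normalized so each component lies in [-1, 1]. The helper names are hypothetical; they follow the spirit of rgbaFloatToInt (8 bits per component packed into one 32-bit word), but the [-1, 1] to [0, 1] remap and the round-to-nearest are choices made here, not taken from the samples. Each component then has a step of 2/255 (about 0.008), and the packed volume needs 4 bytes per voxel instead of 12 or 16.

```cuda
#include <cassert>
#include <math.h>

// Compile the helpers for both host and device when built with nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Map one component from [-1, 1] to an 8-bit code in [0, 255], rounding to nearest.
HD inline unsigned int quantize8(float v)
{
    float t = fminf(fmaxf((v + 1.0f) * 0.5f, 0.0f), 1.0f); // [-1, 1] -> [0, 1], clamped
    return (unsigned int)(t * 255.0f + 0.5f);
}

// Pack x, y, z into bits 0-7, 8-15 and 16-23 of one 32-bit word; bits 24-31 stay 0.
HD inline unsigned int packUnitVector(float x, float y, float z)
{
    return quantize8(x) | (quantize8(y) << 8) | (quantize8(z) << 16);
}

// Inverse mapping for the low byte of q: [0, 255] -> [-1, 1].
HD inline float dequantize8(unsigned int q)
{
    return (float)(q & 0xFFu) / 255.0f * 2.0f - 1.0f;
}

HD inline void unpackUnitVector(unsigned int p, float* x, float* y, float* z)
{
    assert(x != nullptr && y != nullptr && z != nullptr);
    *x = dequantize8(p);
    *y = dequantize8(p >> 8);
    *z = dequantize8(p >> 16);
}
```

In a kernel, the packed words could be stored in a single-channel unsigned int texture and decoded with unpackUnitVector after the fetch. If 8 bits per component are too coarse, splitting the same 32 bits as 10-10-10 is a common alternative.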

Otherwise you need three float textures and three times as many memory accesses.
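For comparison, a sketch of the three-texture alternative, assuming the x, y and z components are stored as three separate dense float volumes (x fastest). makeFloatVolume, fetchVector and readVector are hypothetical names. Each vector costs three fetch instructions instead of one float4 fetch.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical helper: one single-channel float 3D texture per vector component.
// hData is a dense w x h x d float volume, x fastest. Caller owns *arr and *tex.
cudaError_t makeFloatVolume(const float* hData, int w, int h, int d,
                            cudaArray_t* arr, cudaTextureObject_t* tex)
{
    assert(hData != nullptr && w > 0 && h > 0 && d > 0);
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(w, h, d);
    cudaError_t err = cudaMalloc3DArray(arr, &desc, extent);
    if (err != cudaSuccess) return err;

    cudaMemcpy3DParms p = {};
    p.srcPtr = make_cudaPitchedPtr(const_cast<float*>(hData), w * sizeof(float), w, h);
    p.dstArray = *arr;
    p.extent = extent;
    p.kind = cudaMemcpyHostToDevice;
    err = cudaMemcpy3D(&p);

    if (err == cudaSuccess) {
        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeArray;
        res.res.array.array = *arr;
        cudaTextureDesc td = {};
        td.addressMode[0] = td.addressMode[1] = td.addressMode[2] = cudaAddressModeClamp;
        td.filterMode = cudaFilterModePoint;
        td.readMode = cudaReadModeElementType;
        td.normalizedCoords = 0;
        err = cudaCreateTextureObject(tex, &res, &td, nullptr);
    }
    if (err != cudaSuccess) cudaFreeArray(*arr);
    return err;
}

// Rebuild a float3 from three component textures: three fetches per vector.
__device__ float3 fetchVector(cudaTextureObject_t tx, cudaTextureObject_t ty,
                              cudaTextureObject_t tz, int x, int y, int z)
{
    const float u = x + 0.5f, v = y + 0.5f, s = z + 0.5f; // voxel centers
    return make_float3(tex3D<float>(tx, u, v, s),
                       tex3D<float>(ty, u, v, s),
                       tex3D<float>(tz, u, v, s));
}

__global__ void readVector(cudaTextureObject_t tx, cudaTextureObject_t ty,
                           cudaTextureObject_t tz, int x, int y, int z, float3* out)
{
    *out = fetchVector(tx, ty, tz, x, y, z);
}
```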

– pium

Thank you for the suggestions. I have chosen the packed float4 path.

Reviving this really old thread (see also and opencv - How to access each channel of a pixel using cuda tex2D - Stack Overflow) because I am currently facing exactly this problem: accessing a 3-channel interleaved image via texture objects.

A workaround seems to be possible: treat the image as a one-channel image that is three times as wide, and bind a texture object to this one-channel image.

The texture access has to be modified slightly: instead of passing tex2D the x-coordinate ‘x’, one passes ‘3.0f * x + ch’, where ch is the desired channel (0, 1 or 2). The cost (one additional multiplication and one addition per access) seems reasonably small for memory-bandwidth-bound kernels.
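A sketch of that workaround with a texture object, under these assumptions: the interleaved image holds floats (an 8-bit image such as OpenCV's CV_8UC3 would use unsigned char instead), lives in pitched linear memory, and is sampled with point filtering and unnormalized coordinates. Point filtering matters here: with linear filtering, a fetch between two texels would blend neighbouring channels. The column is computed as channels * x + ch in integer arithmetic and shifted by 0.5f to the texel center, a small variation on the 3.0f * x + ch expression above. makeInterleavedTexture, fetchChannel and readPixel are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical helper: upload an interleaved image with `channels` floats per
// pixel (channels adjacent in memory) and view it as a one-channel float image
// that is channels * width texels wide. Caller owns *dImg and *tex.
cudaError_t makeInterleavedTexture(const float* hImg, int width, int height, int channels,
                                   float** dImg, size_t* pitchBytes, cudaTextureObject_t* tex)
{
    assert(hImg != nullptr && width > 0 && height > 0 && channels > 0);
    const size_t rowBytes = size_t(width) * channels * sizeof(float);
    cudaError_t err = cudaMallocPitch(reinterpret_cast<void**>(dImg), pitchBytes, rowBytes, height);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy2D(*dImg, *pitchBytes, hImg, rowBytes, rowBytes, height,
                       cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        cudaResourceDesc res = {};
        res.resType = cudaResourceTypePitch2D;
        res.res.pitch2D.devPtr = *dImg;
        res.res.pitch2D.desc = cudaCreateChannelDesc<float>(); // one channel
        res.res.pitch2D.width = size_t(width) * channels;       // texels per row
        res.res.pitch2D.height = height;
        res.res.pitch2D.pitchInBytes = *pitchBytes;

        cudaTextureDesc td = {};
        td.addressMode[0] = td.addressMode[1] = cudaAddressModeClamp;
        td.filterMode = cudaFilterModePoint;   // linear would blend adjacent channels
        td.readMode = cudaReadModeElementType;
        td.normalizedCoords = 0;               // coordinates in texels
        err = cudaCreateTextureObject(tex, &res, &td, nullptr);
    }
    if (err != cudaSuccess) cudaFree(*dImg);
    return err;
}

// Channel ch of pixel (x, y) is texel column channels * x + ch; the + 0.5f
// puts the coordinate at the texel center.
__device__ float fetchChannel(cudaTextureObject_t tex, int x, int y, int ch, int channels)
{
    return tex2D<float>(tex, float(channels * x + ch) + 0.5f, float(y) + 0.5f);
}

__global__ void readPixel(cudaTextureObject_t tex, int x, int y, int channels, float* out)
{
    for (int ch = 0; ch < channels; ++ch)
        out[ch] = fetchChannel(tex, x, y, ch, channels);
}
```

Because the texture is `channels` times wider than the image, the widest image that fits is the device's pitched 2D texture width limit (cudaDevAttrMaxTexture2DLinearWidth) divided by the channel count.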

Of course, this should also work for channel counts other than 3. I am currently implementing this and checking whether it works.