cudaCreateChannelDesc() does not work for float3 (the resulting channel descriptor is filled with zeros).
Is there a way to do what I want, or do I need to use three float images (or maybe a float4 image)?
I am thinking about using a float4, with the first three floats holding the vector components and the fourth holding their precomputed magnitude. Is there some way to skip certain elements of an array when copying? If there is, I could avoid allocating another float4 buffer in host memory just to copy the contents of the gradient vector (float3) and gradient magnitude (float) images into it, and then copy the resulting buffer to device memory.
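For reference, the staging step described above could look roughly like the following host-side sketch. It is written in plain C++ so it compiles without the CUDA headers; `Vec3`, `Vec4` and `packGradient` are hypothetical names chosen for illustration, with `Vec3`/`Vec4` standing in for CUDA's float3/float4.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for CUDA's float3 / float4 (assumption: same
// member order and 12- / 16-byte layout).
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Interleave gradient vectors and their precomputed magnitudes into one
// float4-style buffer: xyz = gradient, w = magnitude.
std::vector<Vec4> packGradient(const std::vector<Vec3>& grad,
                               const std::vector<float>& mag)
{
    // Use the shorter length so mismatched inputs cannot read out of bounds.
    const std::size_t n = grad.size() < mag.size() ? grad.size() : mag.size();
    std::vector<Vec4> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = Vec4{grad[i].x, grad[i].y, grad[i].z, mag[i]};
    }
    return out;
}
```

The resulting buffer can then be uploaded in one copy of `out.size() * sizeof(Vec4)` bytes. A pitched copy such as cudaMemcpy2D (12-byte rows written with a 16-byte destination pitch) might make the staging buffer unnecessary, but that variant is not tested here.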
To save memory, and depending on the precision you need, you can encode your (normalized?) float3 into an int (for example, see the function rgbaFloatToInt in many of the examples).
Otherwise you need three float textures and three times as many memory accesses.
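As a rough illustration of the packing option, here is a sketch modeled loosely on the rgbaFloatToInt idea (8 bits per component). The remapping from [-1, 1] and all function names are assumptions for this example, not taken from the samples; in a kernel the functions would additionally be marked `__device__`.

```cpp
#include <cstdint>

// Map one component from [-1, 1] to an 8-bit code (clamped, rounded to nearest).
inline std::uint32_t quantizeSigned(float v)
{
    if (v < -1.0f) v = -1.0f;
    if (v >  1.0f) v =  1.0f;
    return static_cast<std::uint32_t>((v * 0.5f + 0.5f) * 255.0f + 0.5f);
}

// Inverse mapping: the low 8 bits of `code` back to a value in [-1, 1].
inline float dequantizeSigned(std::uint32_t code)
{
    return (static_cast<float>(code & 0xFFu) / 255.0f) * 2.0f - 1.0f;
}

// Pack x, y, z into the low three bytes; the top byte stays free
// (it could hold, for example, a quantized magnitude).
inline std::uint32_t packUnitVector(float x, float y, float z)
{
    return quantizeSigned(x) | (quantizeSigned(y) << 8) | (quantizeSigned(z) << 16);
}

// Recover the three components from a packed value.
inline void unpackUnitVector(std::uint32_t packed, float& x, float& y, float& z)
{
    x = dequantizeSigned(packed);
    y = dequantizeSigned(packed >> 8);
    z = dequantizeSigned(packed >> 16);
}
```

With 8 bits per component the round-trip error is at most about 1/255 per component, so this only makes sense if that precision is acceptable for the gradient directions.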
A workaround seems to be possible, by treating the image as a one-channel image with three times the width and binding a texture (object) to this one-channel image.
The texture access has to be modified slightly: instead of passing tex2D the x-coordinate ‘x’, one passes ‘3.0f * x + ch’, where ch is the desired channel (0, 1 or 2). The cost (one additional multiplication and one addition per pixel access) seems reasonably small for memory-bandwidth-bound kernels.
Of course this should also work for channel counts other than 3. I am currently implementing this and checking whether it works.
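The index mapping can be checked on the host before touching any texture code. The sketch below is plain C++ with hypothetical helper names (`channelColumn`, `fetchChannel`); `fetchChannel` is only a stand-in for the texture fetch and assumes an interleaved, row-major image with integer pixel coordinates.

```cpp
#include <cstddef>
#include <vector>

// Column in the widened one-channel image that holds channel `ch` of pixel `x`.
inline int channelColumn(int x, int ch, int channels)
{
    return channels * x + ch;
}

// Host-side stand-in for the texture fetch: reads channel `ch` of pixel (x, y)
// from an interleaved row-major image that is `width` pixels wide.
inline float fetchChannel(const std::vector<float>& img, int width, int channels,
                          int x, int y, int ch)
{
    const std::size_t wideWidth = static_cast<std::size_t>(width) * channels;
    const std::size_t column = static_cast<std::size_t>(channelColumn(x, ch, channels));
    return img[static_cast<std::size_t>(y) * wideWidth + column];
}

// In a kernel the equivalent fetch would presumably be
//     tex2D<float>(tex, channelColumn(x, ch, 3), y)
// assuming unnormalized coordinates and point filtering.
```

Two caveats under these assumptions: linear filtering would interpolate between neighbouring channels rather than neighbouring pixels, and if a 0.5 texel-centre offset is used it has to be added after the multiplication, not before. The widened image also has to stay within the device's maximum texture width.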