Firstly, let me say that I’m having a lot of success with image processing in CUDA. I have a framework which transparently marshals data between frame grabbers, device memory and OpenGL displays in real time. It's all working well and proving very powerful; so far there is little I haven't been able to achieve! Great work, NVIDIA.
But a question comes up here quite often:
Is there a really efficient way to access 24-bit RGB colour images in CUDA? At the moment, I am forced to convert from 24-bit RGB to 32-bit RGBX on the host. NVIDIA neatly sidesteps the problem in all of their image processing demos by pre-loading images in 32-bit RGBX format, so the demos never have to deal with either the 24-bit access problem or the conversion overhead.
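Short of a native 24-bit fetch, one way to at least take the conversion off the host is to copy the packed RGB buffer to the device (25% fewer bytes over the bus than RGBX) and expand it there. Below is a minimal sketch of that idea; `rgbToRgbx` and `convertRgbToRgbx` are hypothetical names, and the padding byte is arbitrarily set to 255. Note that the byte reads in this kernel have a stride of 3 between neighbouring threads, so on older hardware they will not coalesce well: this moves the conversion to the GPU but does not by itself solve the access problem.

```cuda
#include <cuda_runtime.h>

// Expand packed 24-bit RGB (3 bytes per pixel) into 32-bit RGBX (uchar4) on the device.
// One thread per pixel; the 4th byte is set to 255 (an arbitrary choice for this sketch).
__global__ void rgbToRgbx(const unsigned char* __restrict__ rgb,
                          uchar4* __restrict__ rgbx, int numPixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numPixels) {
        const unsigned char* p = rgb + 3 * (size_t)i;
        rgbx[i] = make_uchar4(p[0], p[1], p[2], 255);
    }
}

// Hypothetical host helper: upload packed RGB, expand it on the GPU, download the uchar4 result.
// Returns the first CUDA error encountered (cudaSuccess on success).
cudaError_t convertRgbToRgbx(const unsigned char* hostRgb, uchar4* hostRgbx, int numPixels)
{
    if (numPixels <= 0) return cudaSuccess;           // nothing to do; avoids a zero-sized grid
    size_t rgbBytes = 3 * (size_t)numPixels;
    unsigned char* dRgb = nullptr;
    uchar4* dRgbx = nullptr;

    cudaError_t err = cudaMalloc(&dRgb, rgbBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dRgbx, sizeof(uchar4) * (size_t)numPixels);
    if (err == cudaSuccess) err = cudaMemcpy(dRgb, hostRgb, rgbBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int block = 256;
        int grid = (numPixels + block - 1) / block;
        rgbToRgbx<<<grid, block>>>(dRgb, dRgbx, numPixels);
        err = cudaGetLastError();                     // catches launch-configuration errors
    }
    if (err == cudaSuccess)                           // cudaMemcpy also waits for the kernel
        err = cudaMemcpy(hostRgbx, dRgbx, sizeof(uchar4) * (size_t)numPixels, cudaMemcpyDeviceToHost);

    cudaFree(dRgb);                                   // cudaFree(nullptr) is a no-op
    cudaFree(dRgbx);
    return err;
}
```

In a real pipeline the RGBX result would normally stay on the device for the next kernel rather than being copied back; the copy-back here just keeps the sketch self-contained.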
32-bit RGBA can be optimised quite nicely, since its 4-byte alignment permits coalesced memory access. Unfortunately, reading RGB at 3 bytes per pixel is slower, because those accesses do not coalesce well. To get around this I tried texture access, but that is not good either. It would seem that textures HAVE to have 1, 2 or 4 channels per pixel (so 1, 2 or 4 bytes per pixel for 8-bit data), i.e. you cannot use cuArrayCreate with NumChannels=3.
Also, whilst a 4-byte texture reference such as
texture<uchar4, 2, cudaReadModeNormalizedFloat> texrgba;
will work, a 3-byte texture reference such as
texture<uchar3, 2, cudaReadModeNormalizedFloat> texrgb;
will not. There is no tex2D function for uchar3, and the compiler rejects it with an error.
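One workaround that stays on the texture path is to describe the packed buffer as a one-channel 8-bit texture and do three single-byte fetches per pixel. The sketch below assumes the newer texture-object API (`cudaCreateTextureObject`, which needs compute capability 3.0 or later) rather than the texture references used above; `rgbFetchNormalized` and `fetchRgbNormalized` are made-up names. It reads raw bytes with `cudaReadModeElementType` and divides by 255 by hand, giving the equivalent of the float4 a uchar4 texture with `cudaReadModeNormalizedFloat` would return. Whether three cached byte fetches are actually faster than plain global loads will depend on the GPU, so it is worth benchmarking.

```cuda
#include <cuda_runtime.h>

// Fetch packed 24-bit RGB through a 1-channel, 8-bit linear texture object.
// Pixel i occupies texels 3i, 3i+1 and 3i+2.
__global__ void rgbFetchNormalized(cudaTextureObject_t rgbTex, float4* out, int numPixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;
    // Element-type read mode returns the raw byte; normalise to [0,1] manually.
    float r = tex1Dfetch<unsigned char>(rgbTex, 3 * i + 0) / 255.0f;
    float g = tex1Dfetch<unsigned char>(rgbTex, 3 * i + 1) / 255.0f;
    float b = tex1Dfetch<unsigned char>(rgbTex, 3 * i + 2) / 255.0f;
    out[i] = make_float4(r, g, b, 1.0f);
}

// Hypothetical host helper: bind the packed RGB buffer as a uchar linear texture and run the kernel.
cudaError_t fetchRgbNormalized(const unsigned char* hostRgb, float4* hostOut, int numPixels)
{
    if (numPixels <= 0) return cudaSuccess;
    size_t rgbBytes = 3 * (size_t)numPixels;
    unsigned char* dRgb = nullptr;
    float4* dOut = nullptr;
    cudaTextureObject_t tex = 0;

    cudaError_t err = cudaMalloc(&dRgb, rgbBytes);   // cudaMalloc memory meets texture alignment
    if (err == cudaSuccess) err = cudaMalloc(&dOut, sizeof(float4) * (size_t)numPixels);
    if (err == cudaSuccess) err = cudaMemcpy(dRgb, hostRgb, rgbBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeLinear;
        res.res.linear.devPtr = dRgb;
        res.res.linear.desc = cudaCreateChannelDesc<unsigned char>();  // 1 channel, 8 bits
        res.res.linear.sizeInBytes = rgbBytes;
        cudaTextureDesc td = {};
        td.readMode = cudaReadModeElementType;
        err = cudaCreateTextureObject(&tex, &res, &td, nullptr);
    }
    if (err == cudaSuccess) {
        int block = 256;
        int grid = (numPixels + block - 1) / block;
        rgbFetchNormalized<<<grid, block>>>(tex, dOut, numPixels);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dOut, sizeof(float4) * (size_t)numPixels, cudaMemcpyDeviceToHost);

    if (tex) cudaDestroyTextureObject(tex);
    cudaFree(dRgb);
    cudaFree(dOut);
    return err;
}
```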
Currently, I am forced to treat RGB images as single-channel images (1 byte per pixel) and do the RGB addressing in the kernel, which results in a lot of misaligned memory accesses.
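A commonly used alternative to per-thread byte addressing is to stage each block's slice of the packed image through shared memory: the block loads the slice as aligned 32-bit words (consecutive threads reading consecutive words, which coalesces), then each thread picks its three bytes out of shared memory, where unaligned byte reads are comparatively cheap. A sketch under these assumptions: the device buffer comes from `cudaMalloc` (so it is at least 4-byte aligned), the kernel is launched with a block size equal to the hypothetical `TILE_PIXELS` constant, and `rgbToRgbxStaged` / `convertRgbToRgbxStaged` are made-up names. It writes uchar4 output, but the staged tile could feed any per-pixel computation instead.

```cuda
#include <cuda_runtime.h>

#define TILE_PIXELS 256                      // pixels per block; launch with blockDim.x == TILE_PIXELS
#define TILE_WORDS  (TILE_PIXELS * 3 / 4)    // 192 32-bit words hold one full tile of RGB bytes

__global__ void rgbToRgbxStaged(const unsigned char* __restrict__ rgb,
                                uchar4* __restrict__ rgbx, int numPixels)
{
    __shared__ unsigned int tile[TILE_WORDS];
    const unsigned char* tileBytes = reinterpret_cast<const unsigned char*>(tile);

    int firstPixel = blockIdx.x * TILE_PIXELS;
    int pixelsHere = min(TILE_PIXELS, numPixels - firstPixel);  // last block may be partial
    int bytesHere = 3 * pixelsHere;
    // Offset is 768 * blockIdx.x bytes, so src stays 4-byte aligned if rgb is.
    const unsigned char* src = rgb + 3 * (size_t)firstPixel;

    // Phase 1: cooperative, coalesced load of the tile as 32-bit words.
    for (int w = threadIdx.x; w < TILE_WORDS; w += blockDim.x) {
        int b = 4 * w;
        if (b + 4 <= bytesHere) {
            tile[w] = reinterpret_cast<const unsigned int*>(src)[w];
        } else if (b < bytesHere) {
            // Partial word at the end of the image: assemble it byte by byte
            // (GPUs are little-endian, so byte k goes to bits 8k..8k+7).
            unsigned int v = 0;
            for (int k = 0; k < 4 && b + k < bytesHere; ++k)
                v |= (unsigned int)src[b + k] << (8 * k);
            tile[w] = v;
        }
    }
    __syncthreads();

    // Phase 2: each thread reads its 3 bytes from shared memory and writes one aligned uchar4.
    int t = threadIdx.x;
    if (t < pixelsHere)
        rgbx[firstPixel + t] = make_uchar4(tileBytes[3 * t], tileBytes[3 * t + 1],
                                           tileBytes[3 * t + 2], 255);
}

// Hypothetical host helper mirroring a plain conversion, but using the staged kernel.
cudaError_t convertRgbToRgbxStaged(const unsigned char* hostRgb, uchar4* hostRgbx, int numPixels)
{
    if (numPixels <= 0) return cudaSuccess;
    size_t rgbBytes = 3 * (size_t)numPixels;
    unsigned char* dRgb = nullptr;
    uchar4* dRgbx = nullptr;

    cudaError_t err = cudaMalloc(&dRgb, rgbBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dRgbx, sizeof(uchar4) * (size_t)numPixels);
    if (err == cudaSuccess) err = cudaMemcpy(dRgb, hostRgb, rgbBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int grid = (numPixels + TILE_PIXELS - 1) / TILE_PIXELS;
        rgbToRgbxStaged<<<grid, TILE_PIXELS>>>(dRgb, dRgbx, numPixels);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostRgbx, dRgbx, sizeof(uchar4) * (size_t)numPixels, cudaMemcpyDeviceToHost);

    cudaFree(dRgb);
    cudaFree(dRgbx);
    return err;
}
```

The design choice is to pay for one `__syncthreads()` per tile in exchange for global loads that are all word-sized and contiguous; the tail handling only matters for the last block when the image size is not a multiple of the tile.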
If anybody has got a really neat way to deal with RGB images then please share. Many thanks!
Oh, and one final thing: in the Runtime API it is possible to make a cudaChannelFormatDesc which describes a 24-bit image, but you cannot write a kernel that uses a 3-channel tex2D fetch. What's going on there?