CUDA and 24-bit BGR images =(

Hi everyone! =) (sorry for my English)
I have a question:
I want to use the video card to speed up my image-processing code.
The problem is that both CUDA and AMD Stream seem to handle nothing smaller than the int type (32 bits), but my images are 24-bit BGR =(
How can I quickly map (convert) a 24-bit BGR image so that it can be processed on the video card? Or how else should I handle it?

Right now I use the following sample code:

int* gpu_data …
byte* image = …
for (int i = 0; i < imageSize; i++)
    gpu_data[i] = image[i];  // widens every 8-bit channel value to a 32-bit int

and then I map gpu_data onto an int3 array, so that each element of the matrix is one color pixel … but the conversion alone already takes a lot of time =(((
What can I do?

The CUDA optimization section of the Supercomputing 2007 tutorial has an example of how to read 3-element vectors efficiently. You can follow the same methodology, but change from int3 to uchar3. Also, if you have a GT200 card, you can measure how much performance you lose when reading uchar3s directly. It won't be as bad as on previous generations; for details see the Optimization section in the '08 tutorial. Lastly, you could consider padding each pixel to 4 bytes (an rgba8-like type). That way all architectures would give you high memory performance.
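
To make the padding option concrete, here is a rough sketch, not taken from either tutorial. It assumes the image sits in host memory as packed B, G, R bytes; the names padBgr, averageChannels and averageOnGpu are made up for illustration, the kernel body (a plain channel average) is only a placeholder for real processing, and error checking is left out.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: expand packed 24-bit BGR into 32-bit BGRX pixels.
// The fourth byte is padding and is never read by the kernel.
std::vector<uchar4> padBgr(const std::vector<unsigned char>& bgr)
{
    std::vector<uchar4> out(bgr.size() / 3);
    for (size_t i = 0; i < out.size(); ++i)
        out[i] = make_uchar4(bgr[3 * i], bgr[3 * i + 1], bgr[3 * i + 2], 0);
    return out;
}

// One thread per pixel. Each thread issues a single aligned 32-bit load,
// so neighbouring threads read neighbouring words.
__global__ void averageChannels(const uchar4* in, unsigned char* gray, int numPixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;
    uchar4 p = in[i];                                  // x = B, y = G, z = R
    gray[i] = (unsigned char)((p.x + p.y + p.z) / 3);  // placeholder processing
}

// Hypothetical host wrapper: pad, upload, run the kernel, download.
std::vector<unsigned char> averageOnGpu(const std::vector<unsigned char>& bgr)
{
    std::vector<uchar4> padded = padBgr(bgr);
    int numPixels = (int)padded.size();
    std::vector<unsigned char> gray(numPixels);
    if (numPixels == 0) return gray;

    uchar4* dIn = 0;
    unsigned char* dGray = 0;
    cudaMalloc(&dIn, numPixels * sizeof(uchar4));
    cudaMalloc(&dGray, numPixels);
    cudaMemcpy(dIn, padded.data(), numPixels * sizeof(uchar4), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (numPixels + threads - 1) / threads;
    averageChannels<<<blocks, threads>>>(dIn, dGray, numPixels);

    cudaMemcpy(gray.data(), dGray, numPixels, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dGray);
    return gray;
}
```

The padding is still one CPU pass over the image, but it produces 4 bytes per pixel instead of the 12 bytes per pixel you get when every channel is widened to an int, so the upload is a third of the size.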


Load in a stripe of data, likely 3*32 words… three words per thread. Then have each thread write the appropriate subchannel to shared memory, converting/expanding it to a float if you like.

The conversion cost will be absolutely negligible. The important part is to load the data in coalesced chunks with no wasted bandwidth.
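
A minimal sketch of that striped load, under a few assumptions of mine: one warp (32 threads) per block, the packed BGR buffer padded on the device to a multiple of 384 bytes, and planar float output (all B values, then all G, then all R). The names unpackBgr and unpackOnGpu are hypothetical, and this variant keeps the raw words in shared memory and expands to float when reading them back, rather than expanding on the way in. Error checking is left out.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define THREADS 32                      // one warp per block
#define WORDS_PER_BLOCK (3 * THREADS)   // 96 words = 384 bytes per stripe
#define PIXELS_PER_BLOCK (4 * THREADS)  // 384 bytes / 3 bytes = 128 pixels

// words: packed BGR bytes viewed as 32-bit words (padded to whole stripes).
// planes: 3 * numPixels floats, laid out as B plane, G plane, R plane.
__global__ void unpackBgr(const unsigned int* words, float* planes, int numPixels)
{
    __shared__ unsigned int stripe[WORDS_PER_BLOCK];

    // Coalesced load: consecutive threads read consecutive 32-bit words,
    // three words per thread.
    const unsigned int* src = words + blockIdx.x * WORDS_PER_BLOCK;
    for (int k = 0; k < 3; ++k)
        stripe[threadIdx.x + k * THREADS] = src[threadIdx.x + k * THREADS];
    __syncthreads();

    // View the stripe as bytes; each thread expands four pixels to float.
    const unsigned char* bytes = (const unsigned char*)stripe;
    for (int k = 0; k < 4; ++k) {
        int local = threadIdx.x + k * THREADS;
        int pixel = blockIdx.x * PIXELS_PER_BLOCK + local;
        if (pixel < numPixels) {
            planes[0 * numPixels + pixel] = bytes[3 * local + 0];  // B
            planes[1 * numPixels + pixel] = bytes[3 * local + 1];  // G
            planes[2 * numPixels + pixel] = bytes[3 * local + 2];  // R
        }
    }
}

// Hypothetical host wrapper: upload the packed bytes unchanged, unpack on the GPU.
std::vector<float> unpackOnGpu(const std::vector<unsigned char>& bgr)
{
    int numPixels = (int)(bgr.size() / 3);
    std::vector<float> planes(3 * (size_t)numPixels);
    if (numPixels == 0) return planes;

    int blocks = (numPixels + PIXELS_PER_BLOCK - 1) / PIXELS_PER_BLOCK;
    size_t paddedBytes = (size_t)blocks * WORDS_PER_BLOCK * sizeof(unsigned int);

    unsigned int* dWords = 0;
    float* dPlanes = 0;
    cudaMalloc(&dWords, paddedBytes);
    cudaMalloc(&dPlanes, planes.size() * sizeof(float));
    cudaMemset(dWords, 0, paddedBytes);  // zero the tail of the last stripe
    cudaMemcpy(dWords, bgr.data(), (size_t)numPixels * 3, cudaMemcpyHostToDevice);

    unpackBgr<<<blocks, THREADS>>>(dWords, dPlanes, numPixels);

    cudaMemcpy(planes.data(), dPlanes, planes.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dWords);
    cudaFree(dPlanes);
    return planes;
}
```

With this layout no conversion loop runs on the CPU at all: the 24-bit data is uploaded as it is and every global read is a full 32-bit word.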

Some similar code: