How to vectorize code for char data type?

Image processing applications do a lot of multiplication and addition on the char data type.

How can this be done efficiently using the vector components CUDA supports, such as the built-in vector types?

Any suggestions, related materials, or examples are highly appreciated.



Current-generation GPUs use scalar processors. That’s why there are no built-in vector operators, though you can certainly overload them yourself.
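To illustrate the overloading idea, here is a minimal sketch of component-wise `+` and `*` for `uchar4`. The `HD` macro and the stand-in `uchar4` / `make_uchar4` definitions in the non-CUDA branch are my own assumptions so that the same file also builds as plain C++; under nvcc the built-in type is used. Note that each component is still a separate scalar operation, so this is a notational convenience rather than a speedup.

```cpp
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
// Stand-in for CUDA's built-in uchar4 so the sketch compiles as plain C++.
struct uchar4 { unsigned char x, y, z, w; };
inline uchar4 make_uchar4(unsigned char x, unsigned char y,
                          unsigned char z, unsigned char w) {
    uchar4 v = {x, y, z, w};
    return v;
}
#endif

// Component-wise addition; each result wraps modulo 256.
HD inline uchar4 operator+(uchar4 a, uchar4 b) {
    return make_uchar4((unsigned char)(a.x + b.x), (unsigned char)(a.y + b.y),
                       (unsigned char)(a.z + b.z), (unsigned char)(a.w + b.w));
}

// Component-wise multiplication; each result wraps modulo 256.
HD inline uchar4 operator*(uchar4 a, uchar4 b) {
    return make_uchar4((unsigned char)(a.x * b.x), (unsigned char)(a.y * b.y),
                       (unsigned char)(a.z * b.z), (unsigned char)(a.w * b.w));
}
```

With these in scope, a kernel body can write something like `out[i] = a[i] + b[i];` on `uchar4` arrays, which also gives you 32-bit loads and stores per element.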


You could pack chars into ints, though, and get more bang for the buck: add 3 chars at a time if the calculations might overflow (the spare bits in the 32-bit word absorb the carries), and 4 if you are guaranteed they won’t. Multiplication is trickier. Since an 8-bit by 8-bit product needs 16 bits, you could only multiply 2 at a time, and 32-bit integer multiplication is slow on current-generation cards, so it might not be a gain, but maybe that will change.
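Here is a rough sketch of the packing trick described above. The function names and the exact bit layouts (10-bit fields for the 3-at-a-time case, 16-bit fields for multiplication by a shared 8-bit factor) are my assumptions about one way to do it, not the only way. The `HD` macro is there only so the same code builds both as plain C++ and under nvcc.

```cpp
#include <cstdint>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Four 8-bit values in one 32-bit word, 'a' in the lowest byte.
HD inline uint32_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return uint32_t(a) | (uint32_t(b) << 8) |
           (uint32_t(c) << 16) | (uint32_t(d) << 24);
}

// Four byte additions with one 32-bit add. Only valid when every
// per-byte sum is <= 255; otherwise a carry leaks into the next byte.
HD inline uint32_t add4_no_overflow(uint32_t x, uint32_t y) { return x + y; }

// Three 8-bit values in 10-bit fields: 8 data bits plus 2 guard bits,
// so up to four packed values can be summed before a field overflows.
HD inline uint32_t pack3(uint8_t a, uint8_t b, uint8_t c) {
    return uint32_t(a) | (uint32_t(b) << 10) | (uint32_t(c) << 20);
}

HD inline uint32_t add3_guarded(uint32_t x, uint32_t y) { return x + y; }

// Extract 10-bit field i (0, 1 or 2) from a pack3-style word.
HD inline uint32_t field3(uint32_t v, int i) { return (v >> (10 * i)) & 0x3FFu; }

// Two 8-bit values in 16-bit fields, leaving room for a 16-bit product.
HD inline uint32_t pack2(uint8_t a, uint8_t b) {
    return uint32_t(a) | (uint32_t(b) << 16);
}

// Multiply both packed values by the same 8-bit factor with one 32-bit
// multiply. Each product is at most 255 * 255 = 65025, so it fits in 16 bits.
HD inline uint32_t mul2_by_scalar(uint32_t packed, uint8_t k) {
    return packed * uint32_t(k);
}
```

For example, `add3_guarded(pack3(200, 255, 0), pack3(100, 255, 7))` holds the sums 300, 510 and 7 in its three fields, which you read back with `field3`. Whether any of this beats plain scalar code depends on the unpacking cost and on how fast 32-bit multiplies are on your card, so measure before committing to it.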