Image processing applications perform a lot of multiplications and additions on char data.
How can this be done efficiently using the vector components CUDA supports, such as the built-in vector types?
Any suggestions, related materials, or examples would be highly appreciated.
Thanks!
-Y