I want to compute a histogram using the 256-bin version of the histogram sample from CUDA 9.
My images are stored as "unsigned char" arrays.
I modified the histogram sample to accept unsigned char instead of uint. It worked on a GeForce 740M (Kepler, cc 3.0) but not on a P4000 (Pascal, cc 6.1). On the latter I had to store the images as uint and run the original histogram sample.
The histogram256kernel has a single instruction that reads the image data. Can that instruction be changed so that it reads unsigned char on the P4000?
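To make the question concrete, here is a minimal sketch of the kind of byte access I mean. It is not the SDK sample's kernel: the names `histogramBytesKernel` and `histogramBytes`, the launch configuration (64 blocks of 256 threads), and the use of plain shared-memory atomics instead of the sample's per-warp sub-histograms are all my own assumptions for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

#define HIST_BINS 256

// Hypothetical kernel (not the SDK's): each block accumulates a private
// 256-bin histogram in shared memory, then merges it into the global one.
__global__ void histogramBytesKernel(const unsigned char *d_data,
                                     size_t byteCount,
                                     unsigned int *d_hist)
{
    __shared__ unsigned int s_hist[HIST_BINS];

    // Clear the block-private histogram.
    for (int i = threadIdx.x; i < HIST_BINS; i += blockDim.x)
        s_hist[i] = 0;
    __syncthreads();

    // Grid-stride loop: each thread reads one unsigned char per iteration
    // and uses its value directly as the bin index.
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t pos = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         pos < byteCount; pos += stride)
        atomicAdd(&s_hist[d_data[pos]], 1u);
    __syncthreads();

    // Merge the block-private histogram into the global histogram.
    for (int i = threadIdx.x; i < HIST_BINS; i += blockDim.x)
        atomicAdd(&d_hist[i], s_hist[i]);
}

// Hypothetical host wrapper: copies the bytes to the device, runs the kernel,
// and copies the 256 bin counts back into h_hist. Returns false on CUDA errors.
bool histogramBytes(const unsigned char *h_data, size_t byteCount,
                    unsigned int *h_hist)
{
    unsigned char *d_data = nullptr;
    unsigned int *d_hist = nullptr;
    const size_t histBytes = HIST_BINS * sizeof(unsigned int);

    if (cudaMalloc(&d_data, byteCount) != cudaSuccess)
        return false;
    if (cudaMalloc(&d_hist, histBytes) != cudaSuccess) {
        cudaFree(d_data);
        return false;
    }

    cudaMemcpy(d_data, h_data, byteCount, cudaMemcpyHostToDevice);
    cudaMemset(d_hist, 0, histBytes);

    // Launch configuration is an arbitrary choice for this sketch.
    histogramBytesKernel<<<64, 256>>>(d_data, byteCount, d_hist);

    // The blocking copy waits for the kernel and reports any execution error.
    cudaError_t err = cudaMemcpy(h_hist, d_hist, histBytes,
                                 cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_hist);
    return err == cudaSuccess;
}
```

This version reads the image one byte at a time through an `unsigned char` pointer, so it has no alignment requirement beyond what `cudaMalloc` already provides. I do not expect it to match the speed of the sample's kernel.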
While searching on Google I found that there is another histogram algorithm that does not use atomic instructions and is a lot faster. Can someone give me a link to the source code of such an algorithm?