Histogram Sample

I want to compute a histogram using the histogram sample (the 256-bin version) from CUDA 9.

My images are stored as unsigned char.

I converted the histogram sample to accept unsigned char instead of uint, and it worked on a GeForce 740M (Kepler, cc 3.0) but not on a P4000 (Pascal, cc 6.1). On the latter I had to store the images as uint and run the original histogram sample.

The histogram256kernel has a single instruction that accesses the image data. Can I change that instruction so that it reads unsigned char on the P4000?

In Google searches I found that there is another histogram algorithm that does not use atomic instructions and is a lot faster. Can someone give a link to the source code of such an algorithm?
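For illustration, one well-known way to build a histogram without atomic instructions is to sort the pixels and then count how many fall at or below each bin value. The sketch below uses Thrust, which ships with the CUDA toolkit. The function name histogram256_sorted is made up for this example, and this is not necessarily the algorithm the search results referred to. Whether it is faster than an atomic-based kernel depends on the data and the GPU, so it would need to be measured.

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <thrust/binary_search.h>
#include <thrust/adjacent_difference.h>
#include <thrust/iterator/counting_iterator.h>

// Hypothetical helper (not part of the CUDA sample): 256-bin histogram of
// 8-bit pixels computed with sort + binary search, no atomic instructions.
std::vector<unsigned int> histogram256_sorted(const std::vector<unsigned char>& pixels)
{
    // Copy the pixels to the GPU and sort them by value.
    thrust::device_vector<unsigned char> d_pixels(pixels.begin(), pixels.end());
    thrust::sort(d_pixels.begin(), d_pixels.end());

    // cumulative[b] = number of pixels whose value is <= b.
    thrust::device_vector<unsigned int> cumulative(256);
    thrust::counting_iterator<unsigned int> bins(0);
    thrust::upper_bound(d_pixels.begin(), d_pixels.end(),
                        bins, bins + 256, cumulative.begin());

    // hist[b] = cumulative[b] - cumulative[b - 1]; hist[0] = cumulative[0].
    thrust::device_vector<unsigned int> hist(256);
    thrust::adjacent_difference(cumulative.begin(), cumulative.end(), hist.begin());

    // Bring the 256 counts back to the host.
    std::vector<unsigned int> result(256);
    thrust::copy(hist.begin(), hist.end(), result.begin());
    return result;
}
```

The sort dominates the cost here, so this approach tends to be most attractive when the data has to be sorted anyway or when many threads would otherwise collide on a few bins.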


You should be able to create such an algorithm that runs correctly on P4000. Use standard methods for error isolation and debugging.
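As a rough sketch of what that could look like, here is a small kernel that reads unsigned char pixels directly, wrapped in the usual runtime error checks. The names CUDA_CHECK, histogramBytes and histogramBytesHost are invented for this example; this is not the sample's histogram256 kernel, and the launch configuration (64 blocks of 256 threads) is an arbitrary assumption.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with file/line if a runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical kernel: one byte per read, per-block histogram in shared
// memory, merged into the global histogram at the end.
__global__ void histogramBytes(const unsigned char* data, size_t byteCount,
                               unsigned int* hist)
{
    __shared__ unsigned int local[256];
    for (int i = threadIdx.x; i < 256; i += blockDim.x) local[i] = 0;
    __syncthreads();

    // Grid-stride loop; the bound is the number of bytes actually allocated.
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < byteCount; i += stride)
        atomicAdd(&local[data[i]], 1u);
    __syncthreads();

    for (int i = threadIdx.x; i < 256; i += blockDim.x)
        atomicAdd(&hist[i], local[i]);
}

std::vector<unsigned int> histogramBytesHost(const std::vector<unsigned char>& pixels)
{
    std::vector<unsigned int> hist(256, 0);
    const size_t byteCount = pixels.size();
    if (byteCount == 0) return hist;

    unsigned char* d_data = nullptr;
    unsigned int* d_hist = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, byteCount));
    CUDA_CHECK(cudaMalloc(&d_hist, 256 * sizeof(unsigned int)));
    CUDA_CHECK(cudaMemcpy(d_data, pixels.data(), byteCount, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemset(d_hist, 0, 256 * sizeof(unsigned int)));

    histogramBytes<<<64, 256>>>(d_data, byteCount, d_hist);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(hist.data(), d_hist, 256 * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));
    CUDA_CHECK(cudaFree(d_hist));
    return hist;
}
```

Running the failing build under cuda-memcheck should also point at the exact kernel and address of an invalid access, which helps narrow down whether the size passed to the kernel matches the allocation.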

I used the same code on both GPUs. When I commented out the data access instruction in histogram256kernel, I got no (memory access) error on the P4000. So this is a memory access problem.

I only changed “uint” to “unsigned char” in the call, and Bytecount = numberofbytes*size(uint) (this remained the same as when the data was uint). This worked on the GeForce 740M.
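One thing that may be worth checking, going only by the description above: if the buffer holds numberofbytes unsigned char values but Bytecount is still multiplied by the size of uint, the kernel is told to read four times more bytes than were allocated. An out-of-bounds read like that can go unnoticed on one GPU and fault on another. This is a guess, not a confirmed diagnosis. The sketch below shows the size bookkeeping when a byte buffer is read as 32-bit words holding four pixels each; histogramWords, histogramPacked and checkCuda are hypothetical names, and the kernel is a simplified stand-in rather than the sample's code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
static void checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: every 32-bit word carries four 8-bit pixels.
__global__ void histogramWords(const unsigned int* words, size_t wordCount,
                               unsigned int* hist)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < wordCount; i += stride) {
        unsigned int w = words[i];
        atomicAdd(&hist[(w >>  0) & 0xFFu], 1u);
        atomicAdd(&hist[(w >>  8) & 0xFFu], 1u);
        atomicAdd(&hist[(w >> 16) & 0xFFu], 1u);
        atomicAdd(&hist[(w >> 24) & 0xFFu], 1u);
    }
}

std::vector<unsigned int> histogramPacked(const std::vector<unsigned char>& pixels)
{
    const size_t byteCount = pixels.size();                     // bytes that really exist
    const size_t wordCount = byteCount / sizeof(unsigned int);  // whole words only
    const size_t wordBytes = wordCount * sizeof(unsigned int);  // bytes the kernel may touch

    std::vector<unsigned int> hist(256, 0);
    if (wordCount > 0) {
        unsigned char* d_data = nullptr;
        unsigned int* d_hist = nullptr;
        checkCuda(cudaMalloc(&d_data, wordBytes), "cudaMalloc data");
        checkCuda(cudaMalloc(&d_hist, 256 * sizeof(unsigned int)), "cudaMalloc hist");
        checkCuda(cudaMemcpy(d_data, pixels.data(), wordBytes,
                             cudaMemcpyHostToDevice), "copy data");
        checkCuda(cudaMemset(d_hist, 0, 256 * sizeof(unsigned int)), "clear hist");

        // cudaMalloc returns suitably aligned memory, so viewing it as uint is safe.
        histogramWords<<<64, 256>>>(reinterpret_cast<const unsigned int*>(d_data),
                                    wordCount, d_hist);
        checkCuda(cudaGetLastError(), "kernel launch");
        checkCuda(cudaDeviceSynchronize(), "kernel execution");

        checkCuda(cudaMemcpy(hist.data(), d_hist, 256 * sizeof(unsigned int),
                             cudaMemcpyDeviceToHost), "copy hist");
        checkCuda(cudaFree(d_data), "free data");
        checkCuda(cudaFree(d_hist), "free hist");
    }
    // Leftover bytes that do not fill a whole word are counted on the host.
    for (size_t i = wordBytes; i < byteCount; ++i) ++hist[pixels[i]];
    return hist;
}
```

The point of the sketch is that the word count is derived by dividing the real byte count by the size of uint, never by multiplying it. Global atomics are used only to keep the example short.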