I’ve found a performance issue with integer arithmetic in CUDA 6.0. I have a few kernels that calculate SHA-family hashes (SHA-1, SHA-256, etc.), and all of them run 50-80% slower on Fermi and Kepler than with the CUDA 5.5 release.
I’ve put together a small example: http://www.crark.net/download/cuda55_vs_6.zip. It calculates the SHA-256 hash of a very long string. To reproduce:
- compile test.cu to PTX and then to cubin using the batch file iclb-cubin-v6.bat
- open the solution in VS 2012 and build the executable
- run test.exe and check the speed
- if you don’t have both CUDA versions installed, precompiled files are already in the PTX_CUBIN folder
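When reproducing these timings, it may be worth confirming that the CUDA 5.5 and CUDA 6.0 builds still produce the same SHA-256 digest, so the speed comparison is between equivalent computations. Below is a minimal CPU reference sketch in C++ following FIPS 180-4; it is not part of the archive above, and the names `sha256_hex`, `sha256_block` and `sha256_self_test` are hypothetical. It also shows the 32-bit rotates, shifts, XORs and additions that make up each SHA-256 round, i.e. the integer arithmetic such kernels spend their time on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// SHA-256 round constants (FIPS 180-4, section 4.2.2).
static const uint32_t K256[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

// 32-bit rotate right (n is always in 1..31 here).
static inline uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

// Hypothetical helper: compress one 64-byte block into the running state h[0..7].
static void sha256_block(uint32_t h[8], const uint8_t *p) {
    uint32_t w[64];
    for (int i = 0; i < 16; ++i)  // load the block as 16 big-endian words
        w[i] = (uint32_t(p[4 * i]) << 24) | (uint32_t(p[4 * i + 1]) << 16) |
               (uint32_t(p[4 * i + 2]) << 8) | uint32_t(p[4 * i + 3]);
    for (int i = 16; i < 64; ++i) {  // message schedule expansion
        uint32_t s0 = rotr32(w[i - 15], 7) ^ rotr32(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr32(w[i - 2], 17) ^ rotr32(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4], f = h[5], g = h[6], hh = h[7];
    for (int i = 0; i < 64; ++i) {  // 64 rounds of rotates, XORs, ANDs and adds
        uint32_t S1 = rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25);
        uint32_t ch = (e & f) ^ (~e & g);
        uint32_t t1 = hh + S1 + ch + K256[i] + w[i];
        uint32_t S0 = rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22);
        uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        uint32_t t2 = S0 + maj;
        hh = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

// Hypothetical helper: hash a whole message and return the digest as lowercase hex.
std::string sha256_hex(const std::string &msg) {
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    // Padding: 0x80, zeros up to 56 mod 64, then the message length in bits (big-endian).
    std::vector<uint8_t> buf(msg.begin(), msg.end());
    uint64_t bits = uint64_t(msg.size()) * 8;
    buf.push_back(0x80);
    while (buf.size() % 64 != 56) buf.push_back(0);
    for (int i = 7; i >= 0; --i) buf.push_back(uint8_t(bits >> (8 * i)));
    for (std::size_t off = 0; off < buf.size(); off += 64) sha256_block(h, &buf[off]);
    char out[65];
    for (int i = 0; i < 8; ++i) std::snprintf(out + 8 * i, 9, "%08x", h[i]);
    return std::string(out, 64);
}

// Known-answer check against the standard NIST "abc" test vector.
void sha256_self_test() {
    assert(sha256_hex("abc") ==
           "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
}
```

Hashing the same input on the host and comparing the hex string with the digest copied back from the kernel is a quick sanity check before comparing the two builds' speeds.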
I also noticed that CUDA 6.0 runs at roughly the same speed as CUDA 5.0, while CUDA 5.5 is significantly faster.
Could you please confirm my results and suggest a possible solution?