Hello, I have implemented radix sort in an OpenGL compute shader.
I have an AMD GPU, and the algorithm works perfectly on it.
But the same test executable fails on an NVIDIA GPU for some array sizes.
I tried debugging, but… I don't know why the tests fail.
I tried different array sizes. With big arrays the tests fail every time.
I tried to simplify the task.
In this branch it sorts an increasing sequence of unsigned ints… a key-only sort.
On AMD it works perfectly:
OpenGL 4.5.13399 Compatibility Profile Context 16.201.1151.1007
ATI Technologies Inc.
AMD Radeon HD 6700M Series
count 67108864 elapsed 47376119 ticks 4.73761190 sec speed 14165124 per sec - PASSED
count 33554432 elapsed 29163211 ticks 2.91632110 sec speed 11505739 per sec - PASSED
count 16777216 elapsed 14703350 ticks 1.47033500 sec speed 11410471 per sec - PASSED
count 8388608 elapsed 7364902 ticks 0.73649020 sec speed 11389979 per sec - PASSED
count 4194304 elapsed 3611232 ticks 0.36112320 sec speed 11614606 per sec - PASSED
count 2097152 elapsed 1817044 ticks 0.18170440 sec speed 11541558 per sec - PASSED
count 1048576 elapsed 950502 ticks 0.09505020 sec speed 11031812 per sec - PASSED
count 524288 elapsed 626875 ticks 0.06268750 sec speed 8363517 per sec - PASSED
count 262144 elapsed 267127 ticks 0.02671270 sec speed 9813459 per sec - PASSED
count 131072 elapsed 146703 ticks 0.01467030 sec speed 8934513 per sec - PASSED
count 65536 elapsed 90163 ticks 0.00901630 sec speed 7268613 per sec - PASSED
count 32768 elapsed 67645 ticks 0.00676450 sec speed 4844112 per sec - PASSED
count 16384 elapsed 53292 ticks 0.00532920 sec speed 3074382 per sec - PASSED
count 8192 elapsed 46483 ticks 0.00464830 sec speed 1762364 per sec - PASSED
count 4096 elapsed 41232 ticks 0.00412320 sec speed 993403 per sec - PASSED
count 2048 elapsed 41612 ticks 0.00416120 sec speed 492165 per sec - PASSED
count 1024 elapsed 37630 ticks 0.00376300 sec speed 272123 per sec - PASSED
COMPLETE
But on NVIDIA every size from 524288 up fails:
OpenGL 4.5.0 NVIDIA 358.87
NVIDIA Corporation
GeForce GT 720/PCIe/SSE2/3DNOW!
count 67108864 elapsed 51996058 ticks 5.19960580 sec speed 12906529 per sec - FAILED
count 33554432 elapsed 46898167 ticks 4.68981670 sec speed 7154742 per sec - FAILED
count 16777216 elapsed 23762439 ticks 2.37624390 sec speed 7060393 per sec - FAILED
count 8388608 elapsed 11960835 ticks 1.19608350 sec speed 7013396 per sec - FAILED
count 4194304 elapsed 6035597 ticks 0.60355970 sec speed 6949277 per sec - FAILED
count 2097152 elapsed 3058258 ticks 0.30582580 sec speed 6857341 per sec - FAILED
count 1048576 elapsed 1597550 ticks 0.15975500 sec speed 6563650 per sec - FAILED
count 524288 elapsed 874843 ticks 0.08748430 sec speed 5992938 per sec - FAILED
count 262144 elapsed 523463 ticks 0.05234630 sec speed 5007880 per sec - PASSED
count 131072 elapsed 322490 ticks 0.03224900 sec speed 4064374 per sec - PASSED
count 65536 elapsed 224269 ticks 0.02242690 sec speed 2922205 per sec - PASSED
count 32768 elapsed 178335 ticks 0.01783350 sec speed 1837440 per sec - PASSED
count 16384 elapsed 156417 ticks 0.01564170 sec speed 1047456 per sec - PASSED
count 8192 elapsed 145962 ticks 0.01459620 sec speed 561241 per sec - PASSED
count 4096 elapsed 139295 ticks 0.01392950 sec speed 294052 per sec - PASSED
count 2048 elapsed 136576 ticks 0.01365760 sec speed 149953 per sec - PASSED
count 1024 elapsed 135250 ticks 0.01352500 sec speed 75711 per sec - PASSED
COMPLETE
Can anybody help me understand what causes this behavior?
I tried dumping intermediate results and comparing them with the AMD output, but the failure may not be in the first pass of the radix sort. And I don't understand… What else can I try?
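One thing I could still try is comparing each pass against a CPU reference instead of against the AMD output, so that the first wrong pass and the first wrong index are known exactly. Below is a minimal sketch of that idea, not my actual code: `reference_pass` and `first_mismatch` are hypothetical helper names, and the digit width `bits` is an assumption that would have to match whatever the shader processes per pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stable LSD radix pass (counting sort on a single digit), done on the CPU.
// `shift` selects the digit and `bits` is the digit width. Both are assumptions:
// they must be set to the values the compute shader uses for the same pass.
std::vector<std::uint32_t> reference_pass(const std::vector<std::uint32_t>& in,
                                          unsigned shift, unsigned bits) {
    assert(bits >= 1 && bits <= 16 && shift < 32);
    const std::uint32_t mask = (1u << bits) - 1u;

    // offset[d + 1] first counts the keys whose digit equals d (histogram).
    std::vector<std::size_t> offset((std::size_t{1} << bits) + 1, 0);
    for (std::uint32_t key : in) ++offset[((key >> shift) & mask) + 1];

    // Running sum: offset[d] becomes the number of keys with a digit below d.
    for (std::size_t d = 1; d < offset.size(); ++d) offset[d] += offset[d - 1];

    // Stable scatter: keys with equal digits keep their input order.
    std::vector<std::uint32_t> out(in.size());
    for (std::uint32_t key : in) out[offset[(key >> shift) & mask]++] = key;
    return out;
}

// Index of the first element where the two buffers differ,
// or a.size() when they are identical. Both buffers must have the same length.
std::size_t first_mismatch(const std::vector<std::uint32_t>& a,
                           const std::vector<std::uint32_t>& b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) return i;
    }
    return a.size();
}
```

The intended use would be to read the key buffer back after every GPU pass (for example with `glGetBufferSubData`), run `reference_pass` on the previous state with the same `shift` and `bits`, and print the result of `first_mismatch`. This only works if the GPU passes are stable too, which I believe an LSD radix sort needs anyway. If the first bad index always falls on a work-group or block boundary, that would at least narrow down where to look.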