I’m seeing incorrect results from CUDA kernels that I can’t explain. I originally posted this in the CUDA Programming section of this forum, but people there were unable to reproduce the issue on typical x86_64 systems. Could someone from NVIDIA please try to reproduce it on an Orin NX?
System:
Jetson Orin NX (16 GB RAM), JetPack 5.1.2, L4T 35.4.1, CUDA 11.4.315, code compiled as C++17
The original program used OpenCV built with CUDA and OpenGL support, but OpenCV is not needed to reproduce the issue.
code_feb_18_B.zip (2.6 MB)
The attached archive contains a scaled-down version that does not use OpenCV. Its output on the Orin NX is:
BLOCKS: 544 THREADS 256 RUNS TOTAL: 139264 SHOULD BE 139264
Diff at: 7378 of 80
Diff at: 7379 of 240
Diff at: 7382 of 48
Difference between buffers: 368
However, when I run it on my x86_64 PC (after changing the target architecture in the Makefile from the Orin’s sm_87 to my PC’s sm_86), the output is:
BLOCKS: 544 THREADS 256 RUNS TOTAL: 139264 SHOULD BE 139264
Difference between buffers: 0
Thank you for the workaround. I’ve temporarily moved development to an x86_64 system, since the cause is still unknown. As a result, I can’t tell whether this is an isolated case that only affects byte shifts or whether it will show up again in other, seemingly unrelated scenarios.
I’m looking forward to further updates from you guys.