I am facing a tricky issue with a CUDA program of mine, which is difficult to localize and provide an MWE for, since almost anything I remove from the device-side code makes the issue go away.
… print_floating_point() between the two printf() instructions, the local variable … 0.0, for no apparent reason.
- When applying compute-sanitizer to the program, I get notified of an “Invalid local read of size 8 bytes” at line 667 (an invocation of
I’m using CUDA 11.6.55 and compiling for compute capability 6.1 (GTX 1050 Ti). I suspect this may have something to do with register spilling, but can’t say for sure.
The program is here… I know, I know, it’s a big program, 915 lines, but I cut it down as far as I could without the effect disappearing. I just can’t seem to localize it. Maybe it has a global aspect, related to spilled registers or something?
- The motivation is a full-fledged printf()-family implementation for CUDA code, i.e. including the specifiers missing from CUDA’s built-in printf, support for printing binaries / bitmasks, and most importantly, sprintf(), which is sorely missed. It would be a port of this library.
- For the purposes of this post, I am not concerned with the final output being incorrect. This program doesn’t have the entire printf’ing code anyway. Once I get past the weird, unexplainable behavior, I’ll make sure this and the other ~500 test cases pass.
- Due disclosure: I also asked this on StackOverflow…
verbose ptxas output:
$ ptxas --verbose --gpu-name sm_61 test/test_suite_device.ptx 2>&1 | cu++filt
ptxas info : 62 bytes gmem
ptxas info : Compiling entry function 'snprintf_kernel(char *, unsigned long)' for 'sm_61'
ptxas info : Function properties for snprintf_kernel(char *, unsigned long)
    8 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 70 registers, 336 bytes cmem, 412 bytes cmem
ptxas info : Function properties for snprintf_(char *, unsigned int, const char *, ...)
    232 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads
sanitizer output (snipped):
COMPUTE-SANITIZER
Invalid __local__ read of size 8 bytes
    at 0x11d8 in /home/eyalroz/src/mine/printf/test/cuda/test_suite_device.cu:507:_ZN51_INTERNAL_507c18dc_20_test_suite_device_cu_c3781aca20print_decimal_numberEP8gadget_tdjjjPcj
    by thread (0,0,0) in block (0,0,0)
    Address 0xfffd20 is out of bounds
    Device Frame:/home/eyalroz/src/mine/printf/test/cuda/test_suite_device.cu:658:_ZN51_INTERNAL_507c18dc_20_test_suite_device_cu_c3781aca8print_fpEP8gadget_tdjjjb [0x11c8]
    Device Frame:/home/eyalroz/src/mine/printf/test/cuda/test_suite_device.cu:797:_ZN51_INTERNAL_507c18dc_20_test_suite_device_cu_c3781aca10_vsnprintfEP8gadget_tPKcP13__va_list_tag [0x10f8]
    Device Frame:/home/eyalroz/src/mine/printf/test/cuda/test_suite_device.cu:869:snprintf_(char *, unsigned int, const char *, ...) [0x170]
    Device Frame:/home/eyalroz/src/mine/printf/test/cuda/test_suite_device.cu:879:snprintf_kernel(char *, unsigned long) [0x78]
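For anyone reading the trace: the inner frame names in the sanitizer output are still mangled. They can be piped through cu++filt, as with the ptxas output, or decoded on the host. Below is a minimal host-only sketch, assuming a GCC or Clang toolchain that provides abi::__cxa_demangle (Itanium C++ ABI). The helper name demangle is hypothetical and not part of the program.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper, for illustration only: returns the demangled form of
// a mangled C++ symbol, or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc(), so it must be released with free().
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Applied to the innermost frame, _ZN51_INTERNAL_507c18dc_20_test_suite_device_cu_c3781aca20print_decimal_numberEP8gadget_tdjjjPcj, this should give print_decimal_number(gadget_t*, double, unsigned int, unsigned int, unsigned int, char*, unsigned int) inside the compiler-generated _INTERNAL_… namespace. So the frame where the 8-byte local read is reported takes a double among its parameters.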