I’m a newbie with CUDA and hit an unexpected problem. My code does lots of iterations, bit shifts, xor operations, … all cumulative. Everything happens in uint64_t variables, which are all initialized prior to the start. Only one thread is launched (for testing), hence no race conditions. I’m using constant memory. Hardware is an RTX 2080 SUPER.
Now … same starting point, different results, but only after the first couple of loops.
Never (!) seen such a thing.
How is that possible?
763126208 3811226253 <<<
763126208 3778720397 <<<
How is that possible?
In order of decreasing likelihood, I’d say:
1. bug(s) in code; may be design issues or coding issues
2. compiler code-generation issue
3. hardware issue
The likelihood of item (1) is typically in the 95+% range. Depending on what kind of GPU you have, items (2) and (3) may apply in reverse order. Beware in particular of heavily (vendor-)overclocked GPUs; these may have been insufficiently qualified for proper operation when running compute apps.
When you run the app under the control of cuda-memcheck, does it report any issues for the failing runs?
Thanks for the answer.
cuda-memcheck helped quite a lot! I managed to reduce the problem to the printf function itself, which showed me the dump above. If I comment it out, no memory errors anymore! However, I’d like to understand what happens here.
Basically, I’m using inside the kernel …
printf("%lu \n", data);
… which should be fully legal code.
On the host, this doesn’t produce any issues; why does it inside the kernel?
========= Address 0x00fffcf0 is out of bounds
Commenting out the printf may allow the compiler to dispense with other code in your kernel as well, which means your assumptions about what is happening and where the problem lies may be incorrect. Commenting out code can be quite a confusing strategy for either performance or debugging work when using an aggressively optimizing compiler. Use the method described here:
with cuda-memcheck to get the actual line of source code that is generating the out-of-bounds access.
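(For later readers: the method referred to is presumably the usual line-info workflow; a sketch, with a hypothetical app name:)

```shell
# Sketch (app name hypothetical). -lineinfo embeds source-line tables
# while keeping optimizations; -G (full device debug) also works but
# changes the generated code much more.
nvcc -lineinfo -o app app.cu

# cuda-memcheck can then report the source file and line of each
# out-of-bounds access, instead of just a raw address.
cuda-memcheck ./app
```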
Indeed, after reading your SO post, I discovered 2 more bugs, all related to bad indexing with subsequent illegal global memory reads.