CUDA debug giving unexpected results on 7.5 but expected ones on 6.1

Hi everyone,

I have a relatively complex application which runs entirely on CUDA. Today, in the middle of developing a new feature, I compiled with -g and -G to debug the application on my RTX 2060 (compute capability 7.5, built for sm_75). When I ran the code in this mode, though, I got unexpected output, which at first glance seems to be caused by some threads executing in a slightly different order. I did not dig into this more deeply, as that would be very time consuming.

However, I noticed that running the code without the debug flags, or even with both debug flags but compiled for 6.1, gives me the expected results.

Have any of you had problems with the debug engine on Turing architectures when setting both the compute and sm targets to 7.5?

Best regards and thanks so much for your time!

Without code to look at it’s impossible to know. But a guess is intra-warp thread synchronization, a.k.a. “implicit warp-synchronous programming”. There’s a discussion here. Section “Implicit Warp-Synchronous Programming is Unsafe” might be relevant.
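To make that guess concrete, here is a minimal sketch of a warp-level sum through shared memory with explicit `__syncwarp()` calls. On Volta and later (including 7.5), the lanes of a warp are not guaranteed to run in lockstep, so code that leaves these barriers out can appear to work in one build and fail in another. The kernel name `warp_sum`, the helper `sum_on_device`, and the one-warp-per-block layout are assumptions made up for illustration, not anything taken from the original application; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Illustrative kernel: each block is exactly one warp (32 threads) and
// sums 32 consecutive floats through shared memory.
__global__ void warp_sum(const float* in, float* out) {
    __shared__ float buf[32];
    const unsigned lane = threadIdx.x;

    buf[lane] = in[blockIdx.x * 32 + lane];
    __syncwarp();  // every lane's store must be visible before any lane reads

    for (unsigned offset = 16; offset > 0; offset >>= 1) {
        if (lane < offset) {
            buf[lane] += buf[lane + offset];
        }
        // Explicit barrier: the next round reads values written in this one.
        // Relying on lockstep execution here instead is the unsafe pattern.
        __syncwarp();
    }

    if (lane == 0) {
        out[blockIdx.x] = buf[0];
    }
}

// Hypothetical host helper: data.size() is assumed to be a multiple of 32.
float sum_on_device(const std::vector<float>& data) {
    const int warps = static_cast<int>(data.size() / 32);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, data.size() * sizeof(float));
    cudaMalloc(&d_out, warps * sizeof(float));
    cudaMemcpy(d_in, data.data(), data.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    warp_sum<<<warps, 32>>>(d_in, d_out);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> partial(warps);
    cudaMemcpy(partial.data(), d_out, warps * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float total = 0.0f;
    for (float p : partial) total += p;  // combine per-warp sums on the host
    return total;
}
```

If a kernel in the real application has a similar shape but no barrier between rounds, that would be one place to look first.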

Yeah, I’m aware of the intra-warp thread synchronization issue. The point is that with debug disabled it works fine, since I adapted the code to synchronize explicitly. However, for some reason, when I enable debug mode with architecture 7.5, it does not give the expected output. I know that without code it is impossible to know exactly what is happening, but I don’t know how to build a minimal test case that reproduces the problem. Before debugging my code to find the cause, I was just wondering whether anyone else had experienced something similar, in case it is a CUDA bug.

Thanks for taking the time to answer, dlevi!

I’ve experienced cases where debug mode computes answers differently than release mode. Usually, it’s the case that there’s an issue, but it’s not being triggered in release mode.

Have you tested your code with cuda-memcheck?

If yes, did you test with all tools?

  1. cuda-memcheck --tool <racecheck/initcheck/synccheck/memcheck>
  2. cuda-memcheck --leak-check full
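For context on what racecheck is aimed at: it looks for shared-memory access hazards, the kind that a missing barrier creates. Below is a small sketch of a kernel where a single `__syncthreads()` is the only thing preventing such a hazard. The names `reverse_tile` and `reverse_tiles_on_device` and the 64-element tile size are hypothetical, chosen only for illustration, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Illustrative kernel: each block of 64 threads (two warps) reverses one
// 64-element tile through shared memory.
__global__ void reverse_tile(const int* in, int* out) {
    __shared__ int tile[64];
    const unsigned t = threadIdx.x;
    const unsigned base = blockIdx.x * 64;

    tile[t] = in[base + t];
    // Thread t reads an element written by a thread in the other warp.
    // Without this barrier the read below could happen before that write.
    __syncthreads();
    out[base + t] = tile[63 - t];
}

// Hypothetical host helper: data.size() is assumed to be a multiple of 64.
std::vector<int> reverse_tiles_on_device(const std::vector<int>& data) {
    const int blocks = static_cast<int>(data.size() / 64);
    const size_t bytes = data.size() * sizeof(int);
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, data.data(), bytes, cudaMemcpyHostToDevice);

    reverse_tile<<<blocks, 64>>>(d_in, d_out);

    // This blocking copy also waits for the kernel to finish.
    std::vector<int> result(data.size());
    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

A bug of this kind can stay hidden in a release build and only show up when the debug flags change how the threads are scheduled, which is why running all of the tools is worth the time.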


Sorry for taking so long to answer your post (a month!). I’ve been busy with other projects and could not come back to this. I did test it with cuda-memcheck. I use Visual Studio to test and debug my application, so I don’t know whether I can use racecheck, initcheck or synccheck there, or whether those are included when I select “enable cuda memory checker”.

I will come back if I get further results, thank you so much for your time :)