I have a CUDA program in which one kernel function is quite time-consuming, but it has always run normally, even on a GTX 1050 graphics card.
But on Win10 machines with an RTX A6000, it seems to always be killed by the Windows TDR timeout. Although it runs smoothly after I modify the registry, I don't think that is a reasonable solution.
And on the GTX 1050 machine, with TDR enabled, a kernel function can run for as long as 300 seconds without triggering the TDR mechanism, which is inconsistent with the information I have found. I have also tried other graphics card models, and they do not trigger it either.
Is there any potential problem here?
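One way to confirm that a failure like this really is a watchdog timeout, rather than some other kernel fault, is to check the status returned after the launch. The following is only a minimal sketch: `long_kernel` and `diagnose` are hypothetical names standing in for the real program, and it assumes the timeout is reported as `cudaErrorLaunchTimeout`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the long-running kernel in the question.
__global__ void long_kernel(float *out, int iters)
{
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i) {
        acc += sinf(acc + i);
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Hypothetical helper: map a CUDA status to a short diagnosis label.
const char *diagnose(cudaError_t err)
{
    if (err == cudaSuccess) return "ok";
    if (err == cudaErrorLaunchTimeout) return "timeout";  // assumed TDR signature
    return "other";
}

int main()
{
    const int threads = 256, blocks = 64;
    float *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, threads * blocks * sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    long_kernel<<<blocks, threads>>>(d_out, 1 << 20);
    err = cudaGetLastError();          // errors detected at launch time
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize(); // errors raised while the kernel runs
    }

    printf("%s: %s\n", diagnose(err), cudaGetErrorString(err));
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

If the label printed on the failing machine is "other" rather than "timeout", the problem may lie somewhere other than TDR.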
The GTX 1050 and RTX A6000 both support compute instruction-level preemption. CUDA enables compute instruction-level preemption, so a CUDA kernel should not hit TDR. Graphics drivers do not enable instruction-level preemption, so a TDR is possible there.
Does the application hit TDR both in NVVP and outside of NVVP? I ask because this is possibly an issue with NVVP conflicting with default settings in order to avoid context switches. Context switches can impact the reliability of performance counters, so tools increase the CUDA context timeslice and should likewise increase the TDR registry value.
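Whether a given device reports compute preemption support, and whether a run-time limit applies to its kernels, can be queried from the CUDA runtime. This is a rough sketch that inspects only the current device; `yes_no` is a hypothetical helper, and the printed values describe what the driver reports, not whether a TDR will actually occur.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper for readable output.
const char *yes_no(int flag)
{
    return flag ? "yes" : "no";
}

int main()
{
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) {
        printf("No usable CUDA device found\n");
        return 1;
    }

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        printf("Could not read device properties\n");
        return 1;
    }

    int exec_timeout = 0;  // non-zero: a run-time limit applies to kernels
    int preemption = 0;    // non-zero: compute preemption is supported
    int tcc = 0;           // non-zero: the device uses the TCC driver model
    cudaDeviceGetAttribute(&exec_timeout, cudaDevAttrKernelExecTimeout, device);
    cudaDeviceGetAttribute(&preemption, cudaDevAttrComputePreemptionSupported, device);
    cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, device);

    printf("Device:                       %s\n", prop.name);
    printf("Kernel run-time limit:        %s\n", yes_no(exec_timeout));
    printf("Compute preemption supported: %s\n", yes_no(preemption));
    printf("TCC driver:                   %s\n", yes_no(tcc));
    return 0;
}
```

Running this on both the GTX 1050 and the RTX A6000 machine would show whether the two report the same settings.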
NVVP calculates duration = timestamp_end - timestamp_start.
The tool does not try to subtract the time between the start and end timestamps during which the context was not active.
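A similar end-minus-start duration can be measured outside NVVP with CUDA events. The sketch below makes several assumptions: `busy_kernel` and `exceeds_delay` are hypothetical names, the 2 second figure is the Windows default `TdrDelay` and may differ on a machine whose registry has been changed, and, as far as I understand, the elapsed time between two events also includes any interval in which the context was switched out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the long-running kernel.
__global__ void busy_kernel(float *out, int iters)
{
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i) {
        acc += sinf(acc + i);
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Hypothetical helper: does the measured time exceed a delay given in seconds?
bool exceeds_delay(float elapsed_ms, float delay_s)
{
    return elapsed_ms > delay_s * 1000.0f;
}

int main()
{
    const int threads = 256, blocks = 64;
    const float assumed_tdr_delay_s = 2.0f;  // assumption: default TdrDelay

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, threads * blocks * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                      // timestamp_start
    busy_kernel<<<blocks, threads>>>(d_out, 1 << 20);
    cudaEventRecord(stop);                       // timestamp_end
    cudaError_t err = cudaEventSynchronize(stop);

    if (err != cudaSuccess) {
        printf("Kernel failed: %s\n", cudaGetErrorString(err));
    } else {
        float elapsed_ms = 0.0f;
        cudaEventElapsedTime(&elapsed_ms, start, stop);  // end - start, in ms
        printf("Kernel duration: %.3f ms (longer than assumed TDR delay: %s)\n",
               elapsed_ms,
               exceeds_delay(elapsed_ms, assumed_tdr_delay_s) ? "yes" : "no");
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```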
Thanks for your reply.
So CUDA enables compute instruction-level preemption and should not hit TDR, but this does not match the actual behavior of my program.
My CUDA program often fails to run on the RTX A6000 machine, but when I disable TDR, it runs fine.
Because my development machine has only a GTX 1050, I use NVVP just to measure the time. When the program is not run through NVVP, it also takes a long time, but it does not trigger TDR.
If the CUDA program should not trigger TDR on either the RTX A6000 or the GTX 1050, are there any other possible problems?
Looking forward to your help.