I have a rather complex application (more details in my other posts) that launches a set of kernels hundreds or thousands of times. After every CUDA API call that returns an error code, I check that the result is cudaSuccess (in Debug builds).
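A minimal sketch of the kind of check described above, not the actual code from this project. The names `CUDA_CHECK`, `checkCuda`, `dummyKernel` and `launchAndCheck` are hypothetical and exist only for illustration. Kernel launches are asynchronous, so an execution error such as cudaErrorLaunchFailure may only be reported by a later API call. For that reason the sketch checks both `cudaGetLastError()` right after the launch and the result of `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the real kernels.
__global__ void dummyKernel(int *out) {
    out[threadIdx.x] = threadIdx.x;
}

// Hypothetical helper: prints the failing call and its location.
// Returns true when the call succeeded.
static bool checkCuda(cudaError_t err, const char *what,
                      const char *file, int line) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed at %s:%d: %s\n",
                     what, file, line, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical macro name; wraps any call that returns cudaError_t.
#define CUDA_CHECK(call) checkCuda((call), #call, __FILE__, __LINE__)

// Launches the stand-in kernel once and checks every step.
bool launchAndCheck() {
    int *d_out = nullptr;
    if (!CUDA_CHECK(cudaMalloc((void **)&d_out, 32 * sizeof(int)))) {
        return false;
    }

    dummyKernel<<<1, 32>>>(d_out);

    // Catches errors detected at launch time (e.g. a bad configuration).
    bool ok = CUDA_CHECK(cudaGetLastError());
    // Catches errors raised while the kernel runs, such as
    // cudaErrorLaunchFailure, by waiting for the kernel to finish.
    ok = ok && CUDA_CHECK(cudaDeviceSynchronize());

    cudaFree(d_out);
    return ok;
}
```

Calling `launchAndCheck()` after each suspect kernel, with the synchronize left in, should narrow down which launch actually fails, at the cost of serializing the kernels.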
The problem is that if I run the code stand-alone, in the CPU debugger, or under CUDA debugging, it works perfectly and completes as expected. If I run it through “Start Performance Analysis” in Visual Studio, it trips my error check almost immediately with cudaErrorLaunchFailure. I should note that the call that returns the error is also executed in all the other modes, and the code is not changed in any way between this case and the others.
So the question is: what could cause CUDA to report a launch failure only in the mode that profiles its execution?