I have a MEX function that runs correctly when the environment variable NSIGHT_CUDA_DEBUGGER = 1, both with Nsight attached to the MEX function and without Nsight.
But if I set NSIGHT_CUDA_DEBUGGER = 0 and run it, I get cudaErrorLaunchFailure (4), with or without Visual Studio debugging.
Why would a kernel run correctly with NSIGHT_CUDA_DEBUGGER = 1 and fail to launch with NSIGHT_CUDA_DEBUGGER = 0?