I’m writing an app whose CUDA kernels are spread across several DLL files. All of the kernels run in a single stream. When I run the app in debug mode, it runs successfully. In release mode, however, one of the kernels does not launch, even if I comment out all of the code inside it and leave an empty kernel. If I switch the project settings of the DLL containing this kernel to debug and leave the rest of the solution as release, the kernel runs.
I have tried running cuda-memcheck against the application, but it reports nothing amiss. I have also tried recreating the project from scratch, but I get the same result.
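For reference, here is a minimal sketch of how the launch status can be queried explicitly, since a failed kernel launch is otherwise silent. The names `emptyKernel`, `checkCuda`, and `launchAndCheck` are hypothetical stand-ins, not code from my project, and the 1x1 launch configuration is only an assumption for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel that does not launch in release mode.
__global__ void emptyKernel() {}

// Prints a labelled message when status is an error.
// Returns true only when status is cudaSuccess.
static bool checkCuda(cudaError_t status, const char* what) {
    if (status != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        return false;
    }
    return true;
}

// Launches the kernel on the given stream and checks both points
// at which an error can be reported.
bool launchAndCheck(cudaStream_t stream) {
    emptyKernel<<<1, 1, 0, stream>>>();

    // Errors detected at launch time (for example an invalid launch
    // configuration) are reported by cudaGetLastError().
    if (!checkCuda(cudaGetLastError(), "kernel launch")) {
        return false;
    }

    // Errors that occur while the kernel executes are reported when
    // the stream is synchronized.
    return checkCuda(cudaStreamSynchronize(stream), "kernel execution");
}
```

Calling `launchAndCheck(stream)` from the host code inside the affected DLL, with the stream the app already uses, should at least turn the silent failure into an error string that can be compared between the debug and release builds.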
Can anybody give me a pointer as to how I can solve this problem?