Kernel doesn't run after DLL unload and reload

I have implemented an algorithm with CUDA. Because we have a plugin-like system for changing the algorithm at runtime, it is wrapped in an MFC extension DLL.
Everything works fine until I unload and reload the DLL.
After the reload, the kernel no longer seems to start. All runtime functions return cudaSuccess, which suggests that everything is set up correctly, but the kernel launch returns immediately without producing a result. After unloading and reloading the DLL a second time, everything works again.
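
A minimal sketch of how a "silent" launch like this can be checked explicitly. The kernel `fillKernel` and the helper `runAndCheck` are placeholders invented for illustration, not the real plugin code. The idea is to query both the launch status (`cudaGetLastError`) and the execution status (`cudaDeviceSynchronize`), and to pre-fill the output with a sentinel so that a kernel that never ran can be told apart from one that ran:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real algorithm (hypothetical).
__global__ void fillKernel(int *out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Launches the kernel and reports whether it demonstrably ran.
// 'value' must differ from the sentinel -1 for the check to be meaningful.
bool runAndCheck(int n, int value)
{
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) return false;

    // Sentinel: every byte 0xFF, so every int reads back as -1.
    bool ok = (cudaMemset(dOut, 0xFF, n * sizeof(int)) == cudaSuccess);

    fillKernel<<<(n + 255) / 256, 256>>>(dOut, n, value);

    // Launch errors (bad configuration, missing kernel image, ...) show up here.
    cudaError_t launchErr = cudaGetLastError();
    // Errors raised while the kernel executes show up here.
    cudaError_t syncErr = cudaDeviceSynchronize();
    if (launchErr != cudaSuccess || syncErr != cudaSuccess) {
        std::fprintf(stderr, "launch: %s, sync: %s\n",
                     cudaGetErrorString(launchErr),
                     cudaGetErrorString(syncErr));
        ok = false;
    }

    // If the sentinel is still there, the kernel never wrote anything.
    int first = -1;
    if (cudaMemcpy(&first, dOut, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        ok = false;
    cudaFree(dOut);
    return ok && first == value;
}
```

Calling something like `runAndCheck(1024, 42)` once before the unload and once after the reload would show whether the failure is reported by the runtime at all or only visible through the untouched output buffer.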
Another algorithm using CUFFT crashes after the unload/reload.

Does anybody have an idea what is going wrong?

Calling cudaDeviceReset() before unloading solves the problem, but I want to avoid it because there may be other CUDA kernels running in the process that can’t be interrupted.
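
For reference, the workaround can be sketched as an explicit shutdown export that the host calls right before `FreeLibrary()`. The name `PluginShutdownCuda` and the `PLUGIN_API` macro are hypothetical, chosen only for this example. The sketch uses an explicit export rather than `DllMain`, since doing substantial work under the loader lock is generally discouraged on Windows:

```cuda
#include <cuda_runtime.h>

#ifdef _WIN32
#define PLUGIN_API extern "C" __declspec(dllexport)
#else
#define PLUGIN_API extern "C"
#endif

// Hypothetical export: the host application calls this before unloading
// the plugin DLL. Returns 0 (cudaSuccess) on success, else the CUDA error code.
PLUGIN_API int PluginShutdownCuda(void)
{
    // Wait for outstanding work on the current device to finish.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) return static_cast<int>(err);

    // Destroys the current device's primary context for the WHOLE process,
    // which is exactly the side effect that makes this workaround undesirable
    // when other modules are still using the same device.
    return static_cast<int>(cudaDeviceReset());
}
```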
We also use OpenGL in our application, but on a dedicated display GPU. Can there be a conflict with OpenGL?

Debugging with Parallel Nsight confirmed that the kernel doesn’t start: a breakpoint in device code is not reached after the unload/reload.