To answer your questions in order:
There is no slowdown before the debugger reaches CUDA code. Running in release mode works completely fine.
The slowdown starts exactly when the program enters the CUDA kernels. The CPU code that runs before the first kernel (initialization, etc.) works fine. Once I hit the CUDA functions, however, everything grinds nearly to a halt, especially the particle density computation function I mentioned before. If I wait long enough, it eventually gets past that function, but then it gets stuck at the next one as well.
While debugging, I have Nsight Monitor open, along with several basic debugging windows in Visual Studio (warp watch, memory watch, etc.).
I tried some of the CUDA samples, and they all seem to work fine. This is very strange: why would my program work with the older version but not the newer one, while other programs work fine with both? Could there be a misconfigured setting in my project files?