Unable to profile application. "Internal error: CUDA"

I have a small program that ran fine on my desktop (GeForce 660) and profiled perfectly with the Visual Profiler. I then ran exactly the same program on a K20c. It ran OK in Nsight, but when I tried to profile it, I got the error: Unable to profile application, "Internal error: CUDA". I am using CUDA 5.5 on both machines. Does anyone have any ideas? Any help would be appreciated!

More information: I am using Ubuntu 12.04 on both machines, and I access the K20c machine remotely. I thought it might be a permissions issue, but even when I logged in as root it still did not work.

OK, I solved it. The K20c machine actually has two K20c cards, and I suspect the two devices confused the profiler somehow. When I changed the environment variable so that only one device was exposed, profiling worked. I would still like to know the exact reason. I assumed that if I did not specify a device, the program would automatically use device 0 and only device 0 would be profiled, but evidently that is not how it behaves. Any help would be appreciated!
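For anyone hitting the same problem: the post does not name the environment variable, but the usual way to hide GPUs from a CUDA program is CUDA_VISIBLE_DEVICES (an assumption here). The Python sketch below is a simplified model, not NVIDIA's implementation, of how that variable maps physical GPUs to the logical indices a program sees. The function name visible_devices is hypothetical, and the model handles integer indices only (no GPU UUIDs). It also illustrates that "device 0" is not a fixed card: with the variable set to "1", logical device 0 is the second physical K20c.

```python
import os


def visible_devices(physical_count, env_value=None):
    """Hypothetical helper: physical GPU indices visible to a CUDA program, in logical order.

    Simplified model of CUDA_VISIBLE_DEVICES (integer indices only, no UUIDs).
    env_value=None means the variable is unset, so every device is visible.
    """
    if env_value is None:
        return list(range(physical_count))
    visible = []
    for token in env_value.split(","):
        token = token.strip()
        # In this model, an invalid entry hides itself and every entry after it;
        # an empty string or "-1" therefore hides all devices.
        if not token.isdigit() or int(token) >= physical_count:
            break
        visible.append(int(token))
    return visible


if __name__ == "__main__":
    # Two K20c boards, as on the machine in the question.
    print(visible_devices(2))       # [0, 1]
    print(visible_devices(2, "0"))  # [0]
    print(visible_devices(2, "1"))  # [1]: logical device 0 is physical GPU 1
    # Whatever the current shell environment exposes:
    print(visible_devices(2, os.environ.get("CUDA_VISIBLE_DEVICES")))
```

In practice this means launching the profiler with, for example, CUDA_VISIBLE_DEVICES=0 in its environment, so the application only ever sees the first card.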