We run a small GPU cluster (Tesla M2070 cards, hosts running Ubuntu) and recently noticed a problem.
CUDA codes run fine as root, but when run by a regular user they are extremely slow: a code that takes 3-4 seconds as root takes 2-3 minutes as a regular user. Most of that time is spent simply waiting for an RPC to complete. While it waits, the executable shows up with state D in the STAT column of ps and rpc_wait_bit_killable in the WCHAN column, and `time` reports a huge real time but much smaller user and sys times.
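For anyone who wants to reproduce the observation without ps, here is a minimal sketch, assuming a Linux `/proc` filesystem, that reads the same two fields ps reports (process state and WCHAN) for a given PID. The helper names `parse_state`, `read_wchan` and `describe` are hypothetical, not part of any tool.

```python
import os
import sys
from typing import Optional


def parse_state(stat_line: str) -> str:
    """Return the state letter (R, S, D, ...) from a /proc/<pid>/stat line."""
    # The 2nd field (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so locate the LAST ')' and take the field after it.
    after_comm = stat_line[stat_line.rfind(")") + 2:]
    return after_comm.split()[0]


def read_wchan(pid: int) -> Optional[str]:
    """Return the kernel wait channel, or None if empty or unreadable."""
    try:
        with open(f"/proc/{pid}/wchan") as f:
            # A running (not sleeping) process usually reports "0" here.
            return f.read().strip() or None
    except OSError:
        # Access to wchan can be restricted depending on kernel settings.
        return None


def describe(pid: int) -> str:
    """Summarize state and wait channel, like `ps -o stat,wchan` would."""
    with open(f"/proc/{pid}/stat") as f:
        state = parse_state(f.read())
    return f"pid={pid} state={state} wchan={read_wchan(pid)}"


if __name__ == "__main__":
    # Usage: python procstate.py <pid>  (defaults to this process)
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(describe(target))
```

A process stuck in uninterruptible sleep shows `state=D`; the WCHAN value names the kernel function it is sleeping in.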
Does anyone have an idea why this might be happening?
We are using CUDA 5.0 and NVIDIA driver 310.32.