CUDA very slow without root permissions

Hello everyone,

We run a small GPU cluster (Tesla M2070 cards, hosts running Ubuntu) and recently noticed a problem.

CUDA codes run fine as root, but when run by a regular user they are extremely slow (2-3 minutes for a code that takes 3-4 seconds as root). I noticed that most of the time is spent waiting for an RPC to complete: the executable shows up with state D in the STAT column of ps, with rpc_wait_bit_killable in the WCHAN column, and the real time is huge while the sys and user times stay small.
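For anyone who wants to reproduce this observation, here is roughly how I looked at it from the shell (the executable name a.out is just a placeholder for the CUDA binary):

```shell
# Show the scheduler state (STAT) and kernel wait channel (WCHAN) for the
# running CUDA process. State "D" plus rpc_wait_bit_killable means the
# process is in uninterruptible sleep, waiting on an in-kernel RPC.
ps -o pid,stat,wchan:30,etime,comm -C a.out

# Compare wall-clock time against CPU time. A large "real" value with
# small "user"/"sys" values means the process is mostly waiting, not computing.
time ./a.out
```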

I was wondering if you had any ideas why this might be happening.

We are using CUDA 5.0 and the NVIDIA 310.32 drivers.

I’ve also noticed that if you run the same executable twice in a row, the second run is about 10x faster, but still slower than running as root.

This sounds like the delay associated with reloading the driver after the GPU has gone idle. Try running this:

nvidia-smi -pm 1

to set all the devices to “persistence mode”, which will prevent the driver from unloading.

Thanks for the suggestion. I set the devices to persistence mode, but it did not fix the problem.

Did it improve the problem at all?

Either way, I’m out of ideas. I would file a bug report, though that requires a “Registered Developer” account.

I don’t know if there is another direct support route for Tesla owners through NVIDIA (but there should be).

Not as far as I can see. It’s hard to tell because the problem is very random: sometimes the slowdown is 10x, sometimes far worse.

It might also be a permissions issue, but I can’t think of anything that might cause this.
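One concrete thing to check along those lines is the permissions on the NVIDIA device nodes (the "video" group below is just an example; the exact group varies by distribution):

```shell
# The NVIDIA device nodes must be readable/writable by the regular user;
# root-only permissions here would explain root working while user runs
# hit slow fallback paths or fail at context creation.
ls -l /dev/nvidia*

# Confirm which groups the regular user belongs to (e.g. "video")
id
```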