Hi, I restored my OS (Red Hat 6.5, CUDA 8.0, GPU: K40c) yesterday. If I run nvidia-smi or a TensorFlow training job under my own user ID, it fails. However, if I su to root and run nvidia-smi, everything is OK. It seems to be a permission issue? The system I restored worked fine at the time I created the save point a few weeks ago.
Below is the nvidia-smi error:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure…
After googling, I found that someone solved a similar issue by changing the permissions of libcuda.so; however, that doesn't work for me even though I ran chmod 777 on every libcuda.so in /usr/lib, /usr/lib64 and /usr/local/cuda-8.0.
One more strange thing: once I have successfully run nvidia-smi with su, I can then run it without su (for example, in a new terminal). However, after I restart the server, nvidia-smi stops working again and I have to use su first.
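One guess (an assumption on my part, not a confirmed diagnosis): this pattern would match the /dev/nvidia* device nodes being missing after each reboot. A root process that touches the driver (such as nvidia-smi under su) can create them, which would explain why everything works for normal users afterwards until the next restart. A quick check, assuming the standard device-node paths:

```shell
#!/bin/sh
# Hedged diagnostic sketch: report whether the NVIDIA device nodes exist
# and what their permissions are. If they are missing before the first
# root run of nvidia-smi, that would match the symptom described above.
check_node() {
  if [ -e "$1" ]; then
    # Show the permission string (first field of ls -ld output).
    echo "$1: present ($(ls -ld "$1" | cut -d' ' -f1))"
  else
    echo "$1: missing - device node not created yet"
  fi
}

# Standard node names; adjust nvidia0 upward if you have more GPUs.
check_node /dev/nvidiactl
check_node /dev/nvidia0
```

If the nodes are missing right after boot but appear after running nvidia-smi as root, the cause is likely boot-time driver initialization rather than library permissions.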
Could anybody give some advice?