Hi, I restored my OS (Red Hat 6.5, CUDA 8.0, GPU: K40c) yesterday. If I run nvidia-smi or a TensorFlow training job under my own user ID, it fails. However, if I su to root and run nvidia-smi, everything is OK. It seems to be a permission issue? The system I restored worked fine at the time I created the save point a few weeks ago.
Below is the nvidia-smi error:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure…
After googling, I found that someone solved a similar issue by changing the permissions of libcuda.so; however, that doesn't work for me even though I ran chmod 777 on every libcuda.so in /usr/lib, /usr/lib64 and /usr/local/cuda-8.0.
One more strange thing: once I have successfully run nvidia-smi with su, I can then run it without su (for example, in a new terminal). However, after I restart the server, nvidia-smi stops working again and I have to use su first.
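One guess (an assumption on my part, not a confirmed diagnosis): this pattern would match the /dev/nvidia* device nodes being missing after each reboot. A root process that touches the driver (such as nvidia-smi under su) can create them, which would explain why everything works for normal users afterwards until the next restart. A quick check, assuming the standard device-node paths:

```shell
#!/bin/sh
# Hedged diagnostic sketch: report whether the NVIDIA device nodes exist
# and what their permissions are. If they are missing before the first
# root run of nvidia-smi, that would match the symptom described above.
check_node() {
  if [ -e "$1" ]; then
    # Show the permission string (first field of ls -ld output).
    echo "$1: present ($(ls -ld "$1" | cut -d' ' -f1))"
  else
    echo "$1: missing - device node not created yet"
  fi
}

# Standard node names; adjust nvidia0 upward if you have more GPUs.
check_node /dev/nvidiactl
check_node /dev/nvidia0
```

If the nodes are missing right after boot but appear after running nvidia-smi as root, the cause is likely boot-time driver initialization rather than library permissions.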
Could anybody give some advice?