missing cuda drivers?

I set up my new TX2 according to:
https://developer.nvidia.com/embedded/jetpack

I downloaded:
https://developer.nvidia.com/embedded/dlc/jetpack-l4t-2_3_1

and ran the full install option from my host PC. I then installed TensorFlow for Python following https://devtalk.nvidia.com/default/topic/1000717/tensorflow-on-jetson-tx2/

When running tensorflow in python I get:
tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE

I can see the CUDA libs in /usr/local/cuda, but strace shows a lot of suspicious failed file accesses:

newfstatat(AT_FDCWD, "/usr/bin/nvidia-modprobe", 0x7fca00f860, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/dev/nvidiactl", 0x7fca00f700, 0) = -1 ENOENT (No such file or directory)
mknodat(AT_FDCWD, "/dev/nvidiactl", S_IFCHR|0666, makedev(195, 255)) = -1 EACCES (Permission denied)
newfstatat(AT_FDCWD, "/usr/bin/nvidia-modprobe", 0x7fca00f800, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/module/tegra_fuse/parameters/tegra_chip_id", O_RDONLY) = 3
read(3, "24\n", 256) = 3
close(3) = 0
openat(AT_FDCWD, "/sys/module/tegra_fuse/parameters/tegra_chip_rev", O_RDONLY) = 3
read(3, "4\n", 256) = 2
close(3) = 0
openat(AT_FDCWD, "/sys/module/tegra_fuse/parameters/tegra_platform", O_RDONLY) = 3
read(3, "silicon\n", 256) = 8
close(3) = 0
faccessat(AT_FDCWD, "/dev/nvhost-gpu", R_OK|W_OK) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/dev/nvgpu-pci", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/dev/dri/renderD128", 0x7fca00f8e0, 0) = -1 ENOENT (No such file or directory)
write(2, "E tensorflow/stream_executor/cud"..., 98E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
) = 98
faccessat(AT_FDCWD, "/proc/driver/nvidia/version", F_OK) = -1 ENOENT (No such file or directory)
uname({sysname="Linux", nodename="jet", ...}) = 0
write(2, "I tensorflow/stream_executor/cud"..., 166I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:145] kernel driver does not appear to be running on this host (jet): /proc/driver/nvidia/version does not exist
) = 166

Is there a separate install for cuda drivers? I’m surprised that wasn’t included in the os flash and subsequent cuda install.
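The trace above mixes two different failures: ENOENT (the node does not exist) and EACCES (the node exists but the current user may not use it). A minimal sketch for telling them apart from Python, without strace, might look like the following. The node list is taken from the trace and is only illustrative; check_node is a hypothetical helper name, not part of any NVIDIA or TensorFlow tool.

```python
import os

# Device nodes copied from the strace output above. This list is an
# assumption for illustration, not the full set the CUDA driver probes.
DEVICE_NODES = ["/dev/nvhost-gpu", "/dev/nvidiactl"]


def check_node(path):
    """Return 'missing', 'denied', or 'ok' for the current user."""
    if not os.path.exists(path):
        return "missing"
    # Same question the faccessat(..., R_OK|W_OK) call in the trace asks.
    if not os.access(path, os.R_OK | os.W_OK):
        return "denied"
    return "ok"


if __name__ == "__main__":
    for node in DEVICE_NODES:
        print(f"{node}: {check_node(node)}")
```

Running this once as the failing user and once with sudo should show whether the difference is a permission on an existing node rather than a missing driver.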

Hi mossmann77,

https://developer.nvidia.com/embedded/dlc/jetpack-l4t-2_3_1 doesn’t support tx2.
You should use https://developer.nvidia.com/embedded/dlc/jetpack-l4t-3_0 .

Hi Vickyy,

Sorry, I pasted the wrong link. I was working with JetPack 3.0. I figured out a solution. First, the steps that cause the problem:

1.) Run the JetPack 3.0 installer.
2.) Choose a network option for connecting to the target that will fail.
3.) Watch the OS flash succeed over USB, then fail to connect to the target IP.
4.) Abort the installer by closing the terminal window.
5.) Attempt the install again, choosing to skip the OS flash step.

From here things never installed correctly, even after 5+ attempts. Finally, to fix the problem:

1.) Clean out the install directory on the host where you run the JetPack installer. If you like, you can keep the downloads dir to save time.
2.) Do a complete install on the target, specifying a network connection that will not fail.

More details on cause:
I was running Ubuntu 16.04 dual boot on an iMac circa 2013. The wifi works but the ethernet does not.

Hope that helps someone. Perhaps the installer can detect the failed case in the future.

I now realize that the problem was user permissions. I had created a new user and was getting errors only with that user. When I finally tried the default nvidia user, things started working. And if I run python with sudo, I can access CUDA.

I’ve tried adding my new user to all the groups that nvidia belongs to. It still does not work without sudo. How can I grant a new user permission to access the CUDA device?

If you look at “/etc/group” you’ll find users ubuntu and nvidia are members of group “video”. This is the group to add your user to if you want GPU access, e.g., CUDA. Adding the user to the other groups is probably not a great idea.
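To see who is listed in group “video” without reading the whole file by eye, a small sketch like this one parses /etc/group-style text. The function name group_members is hypothetical. Note that it only reports supplementary members listed in the fourth field; a user whose primary group is “video” would not appear there.

```python
def group_members(group_text, group_name):
    """Return the member list for group_name from /etc/group-style text."""
    for line in group_text.splitlines():
        # Each entry has the form name:password:GID:member1,member2,...
        fields = line.strip().split(":")
        if len(fields) == 4 and fields[0] == group_name:
            return [member for member in fields[3].split(",") if member]
    return []


if __name__ == "__main__":
    with open("/etc/group") as f:
        print(group_members(f.read(), "video"))
```

If your user name is missing from the printed list, that user has not been added to the group yet.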

Hi mossmann77, have you solved this issue? I run into the same situation…

The GPU works if I log in as root but not if I log in as a non-root user, so I think it is a permission or environment issue. I cannot figure out the solution.

Adding my user to the video group worked for me.

https://askubuntu.com/questions/79565/how-to-add-existing-user-to-an-existing-group

After running:
sudo usermod -G admin,video admin

nvidia-smi still complains ‘couldn’t communicate with driver’…

If I use su from the admin account, nvidia-smi works. Since su does not change the environment variables or PATH,
I believe this really is a permission issue. However, I have already run chmod 777 on /usr/local/cuda-8.0 and on all libcuda* files under /usr/lib and /usr/lib64.

It is really frustrating…

Make sure to reboot or logout/in after changing group membership.
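The reason is that a running session keeps the group list it received at login, so a change in the group database does not reach shells that are already open. A rough sketch for spotting this, assuming a Linux system with the standard pwd and grp databases, compares the two. The name stale_groups is made up for this example.

```python
import grp
import os
import pwd


def stale_groups():
    """Names of groups the database lists for this user but the process lacks."""
    try:
        user = pwd.getpwuid(os.getuid()).pw_name
    except KeyError:
        # The UID has no passwd entry, so there is nothing to compare.
        return []
    active = set(os.getgroups())  # groups of the running process
    configured = {g.gr_gid for g in grp.getgrall() if user in g.gr_mem}
    return sorted(grp.getgrgid(gid).gr_name for gid in configured - active)


if __name__ == "__main__":
    print(stale_groups())
```

If “video” shows up in the output, the membership is configured but the current session has not picked it up yet, and logging out and back in should fix that.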

I think I looked at all the groups for nvidia user and added my user to all of them. Not sure if it was just the video group as others say.

After a reboot it still does not work. Yesterday I restored my OS after it got corrupted. I think the problem is permissions. After searching, I found someone saying the permissions of libcuda.so should be changed. However, I have already run chmod 777 on all these files, and it still does not work.

You might install “strace”, run your application under it, and check the end of the log for permission information. strace produces a LOT of output, but the issue should be near the end, and you can write the output to a log file. An example:
strace -oMyLog.txt ls /root

less -i ./MyLog.txt
/permission
# Press "shift-g" to jump to the end of the less pager...
# Press "shift-n" to see the last occurrence of case-insensitive "permission"...
# Press "shift-n" again to search back further.

Note in this example this tells which permission was denied (it’s a kernel system call so this is authoritative):

openat(AT_FDCWD, "/root", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 EACCES (Permission denied)

It is possible that an application trying to access some restricted file in “/sys” would hit further permission issues beyond needing to be in group “video”. If strace does not show the exact location of the denied permission, then perhaps “ltrace” will. strace traces the system calls the application makes; ltrace traces the calls the application makes into its shared libraries, so it shows which library function was running when the system call failed.
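Instead of paging through the log by hand, the denied calls can also be pulled out with a short script. This is only a sketch: it assumes a log written with "strace -o" as in the example above, with straight double quotes around paths, and it only looks for EACCES and EPERM. The name denied_paths is made up for this example.

```python
import re

# Captures the syscall name, the first quoted argument, and the errno name
# on lines that failed with a permission error.
FAILED_CALL = re.compile(r'(\w+)\(.*?"([^"]+)".*=\s*-1\s+(EACCES|EPERM)\b')


def denied_paths(log_text):
    """Return (syscall, path, errno) tuples for permission failures."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_CALL.search(line)
        if match:
            results.append(match.groups())
    return results


if __name__ == "__main__":
    with open("MyLog.txt") as f:
        for syscall, path, errno_name in denied_paths(f.read()):
            print(f"{errno_name}: {syscall} on {path}")
```

Lines that fail with ENOENT are skipped on purpose, since a missing file is a different problem from a denied one.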