Error message when running Parallelized Face Detection on Jetson

Dear All,

I am facing an error message that is very similar to what was previously raised in an older post: https://devtalk.nvidia.com/default/topic/745791/could-not-insert-nvidia_340-function-not-implemented/
The error message goes something like:

ubuntu@tegra-ubuntu:~$ nvidia-modprobe
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_340'
modprobe: ERROR: could not insert 'nvidia_340': Function not implemented
ubuntu@tegra-ubuntu:~$

However, in that post the author reported seeing the error only when using the CUDA 6.5 driver. In my case the driver is still at version 6.0, and I have not yet updated the OS on my Jetson, so it remains on the older release that is known not to work well with CUDA 6.5.

ubuntu@tegra-ubuntu:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Fri_Apr_18_02:34:39_PDT_2014
Cuda compilation tools, release 6.0, V6.0.1

When I run a shell script that executes a few binaries compiled from CUDA as well as pure C++ source code, several terminal windows open, corresponding to multiple processes running at the same time. They are all different processes meant to work together as a robotic system. However, I get the error in just one of the terminal windows:

Client    : 127.0.0.1:10000 
Server    : 127.0.0.1:10000 
Connecting server.....    [OK]
Send init command....    [OK]
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_340'
modprobe: ERROR: could not insert 'nvidia_340': Function not implemented

I cannot observe any noticeable issues or bugs in my system. My robot uses a webcam, and I believe the above error might concern access to the webcam, but I'm not sure, and I hope someone here who is familiar with the issue can help. Does anyone know what causes this error, and whether it is something I need to be concerned about?

Thank you very much.

A couple of questions come to mind. First, were any modules built or added to this install? Also, you said you hadn’t updated the install, so is it correct that this is CUDA 6.0 on R19.2 (Jetson ships with R19.2, and CUDA 6.0 is correct for all R19.x)? What does “uname -r” report, and what do you see from “ls /lib/modules”?

Hi linuxdev, I get the following when running those commands:

ubuntu@tegra-ubuntu:/$ uname -r
3.10.24-g6a2d13a
ubuntu@tegra-ubuntu:/$ ls /lib/modules
3.10.24-g6a2d13a  3.13.0-37-generic

I am not exactly sure how to check whether this is R19.2, but as you can see, the CUDA version is 6.0.

Also, right now I am trying to run my code with some video files as inputs, and I hit the same error:

ubuntu@tegra-ubuntu:~/gpufacedetection$ ./facedetectGpu 
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_340'
modprobe: ERROR: could not insert 'nvidia_340': Function not implemented
numFaces: 1 at centreX = 528.50, centreY = 228.50
Total Time: 1075.33 ms
Average FPS: 1
ubuntu@tegra-ubuntu:~/gpufacedetection$

After reading up on what kmod is about, I realized this could be a driver issue, so I tried checking with the following:

ubuntu@tegra-ubuntu:~/gpufacedetection$ nvidia-smi
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_340'
modprobe: ERROR: could not insert 'nvidia_340': Function not implemented
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

ubuntu@tegra-ubuntu:~/gpufacedetection$

but I get that error, even though NVIDIA drivers appear to be installed:

ubuntu@tegra-ubuntu:~/gpufacedetection$ dpkg -l | grep nvidia
ii  nvidia-340                                            340.29-0ubuntu1                                     armhf        NVIDIA binary driver - version 340.29
ii  nvidia-340-dev                                        340.29-0ubuntu1                                     armhf        NVIDIA binary Xorg driver development files
ii  nvidia-340-uvm                                        340.29-0ubuntu1                                     armhf        NVIDIA Unified Memory kernel module
ii  nvidia-modprobe                                       340.29-0ubuntu1                                     armhf        Load the NVIDIA kernel driver and create device files
ii  nvidia-settings                                       340.29-0ubuntu1                                     armhf        Tool for configuring the NVIDIA graphics driver

This is really driving me nuts. We have another Jetson TK1 in our lab which, after checking, has no such drivers installed at all (dpkg -l | grep nvidia returns nothing), and when it runs the same code it never throws any modprobe-related errors.

Thanks @linuxdev for your first help, and I hope you know what’s going on here!

Here’s something that might be related. Your module directories show that you’ve also had a kernel or its modules installed for 3.13.0-37-generic. This is a subdirectory of /lib/modules. Under uname -r of 3.10.24-g6a2d13a, those modules will never be found. What modules are in:

/lib/modules/3.13.0-37-generic

?

Do any of those modules involve the code that is failing?

When I do a listing of that directory, I get

ubuntu@tegra-ubuntu:~/gpufacedetection$ ls /lib/modules/3.13.0-37-generic
build
ubuntu@tegra-ubuntu:~/gpufacedetection$ ls /lib/modules/3.13.0-37-generic/build
arch  block  crypto  Documentation  drivers  firmware  fs  include  init  ipc  Kbuild  Kconfig  kernel  lib  Makefile  mm  Module.symvers  net  samples  scripts  security  sound  tools  ubuntu  usr  virt
ubuntu@tegra-ubuntu:~/gpufacedetection$

This doesn’t look good… It seems like the modules were never compiled!!!

From what I know about the history of this particular Jetson, at one point someone updated the CUDA drivers to 6.5 even before the OS release supporting it was out, and the OS GUI just disappeared. Somehow an engineer managed to restore this Jetson; I believe he might have reformatted the OS or something (I don’t really know what he did) and reverted CUDA back to 6.0. Could it be that he did not do a clean job in the restoration?

And is the Jetson itself supposed to work with NVIDIA drivers to begin with?
What’s the difference between an NVIDIA driver and a CUDA driver?
I think the ‘nvidia_340’ error that keeps popping up has to do with the driver versions (http://www.nvidia.com/object/IO_32667.html). The Jetson we are discussing here seems to have those drivers installed, whereas our other Jetson, which did not go through the same troubled history, has none of them at all and seems to work fine.

All the nvidia_340 stuff relates to the desktop PC drivers and won’t work on Tegra.

In the original post it was said that despite the kmod error, everything seems to work fine. My guess is that it tried to load the desktop driver, which will of course fail. But since all the Tegra drivers are already loaded, everything works fine.
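A quick way to confirm this yourself (a sketch; it assumes the standard /lib/modules layout that modprobe uses): modprobe can only load modules indexed in modules.dep for the running kernel, and on a stock L4T kernel there should be no nvidia_340 entry there, because the Tegra GPU support is built into the kernel itself.

```shell
# modprobe resolves module names via /lib/modules/$(uname -r)/modules.dep.
# If "nvidia_340" has no entry there, you get exactly the
# "could not find module" error seen above.
KVER=$(uname -r)
grep -i 'nvidia' "/lib/modules/${KVER}/modules.dep" 2>/dev/null \
  || echo "no nvidia module entries for ${KVER}"
```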

In addition to what kulve is saying, the incorrect setup appears to stem from an attempt to update to a more upstream kernel (the 3.13.0-37-generic kernel). The ls listing of the 3.13.0-37-generic directory is incomplete; you’d have to do something recursive like this to see everything:

find /lib/modules/3.13.0-37-generic -type f

To see L4T release:

head -n 1 /etc/nv_tegra_release

To validate nVidia binaries:

sha1sum -c /etc/nv_tegra_release

Based on what you’ve said, my guess is the story went something like this:
Installed CUDA 6.5, which had issues, so attempts were made to add to or alter the system to work with 6.5 rather than reverting to 6.0. Eventually there was an attempt to add or update to a kernel believed to “fix” the driver issues. Perhaps this even occurred prior to the R21.x release series, which allowed CUDA 6.5; or perhaps there was some other driver that caused the need to try 3.13.0-37-generic. Some of the binary nVidia files may have been overwritten or mixed incorrectly while doing this.

I believe the answer of how to proceed depends upon what L4T release is installed (see release test step above), whether the binary nVidia files are correct for that release (also a step shown above), what kind of work is on the Jetson right now that must be saved, and whether you can afford to just flash a new system. Add to this what job requirements there might be on the Jetson, e.g., does it require CUDA 6.5 or some special hardware driver?

@kulve Thank you! That makes sense then! So in a way, the error messages from modprobe are harmless.

@linuxdev Sounds really plausible! So you think the attempt to try the newer kernel version (3.13.0-37-generic) may have affected the NVIDIA binaries? Well, for the purposes of our project we are actually wrapping up, and since our code works properly despite the modprobe error (which, as @kulve explained, is harmless in this case), we may not attempt to change anything. But it’s a great lesson that we have learned here.

I believe we would choose the option of just flashing the system entirely if we really had to continue our project, since we have all our code in a remote repository that we can clone again after reflashing the Jetson. Thank you for your help!

Currently the Jetson is not with me, as I’m out of the lab for the weekend, but I will do the recursive listing once I get back to work on Monday and post the output here.

I have perhaps one more question for this thread: what is the difference between a CUDA driver and an NVIDIA graphics card driver?

I guess you could say CUDA exposes the GPU through an external API used for general computing. A regular driver only exposes it to graphics languages like OpenGL (and things related to the graphics language). An OpenGL driver would expect a monitor; CUDA does not. GPUs with no external monitor connection become useful as CUDA compute grids.

@linuxdev as a follow up for the suggestions you made previously:

ubuntu@tegra-ubuntu:~$ find /lib/modules/3.13.0-37-generic -type f
ubuntu@tegra-ubuntu:~$ head -n 1 /etc/nv_tegra_release
# R19 (release), REVISION: 2.0, GCID: 3896695, BOARD: ardbeg, EABI: hard, DATE: Fri Apr 18 23:10:46 UTC 2014
ubuntu@tegra-ubuntu:~$ sha1sum -c /etc/nv_tegra_release
/usr/lib/arm-linux-gnueabihf/tegra/libnvdc.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvrm_graphics.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_contentpipe.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvwinsys.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvapputil.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvodm_imager.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvomxilclient.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_utils.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvparser.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_parser.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvrm.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtvmr.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_video.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtestresults.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_audio.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvodm_query.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libtegrav4l2.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvavp.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_utils.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_camera.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvos.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libjpeg.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmm_writer.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvddk_2d_v2.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvddk_vic.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtestio.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvsm.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvtnr.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_image.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvomx.so: OK
/usr/lib/arm-linux-gnueabihf/tegra/libnvfusebypass.so: OK
/usr/lib/xorg/modules/extensions/libglx.so: OK
/usr/lib/xorg/modules/drivers/nvidia_drv.so: OK
ubuntu@tegra-ubuntu:~$

Doing the directory listing still shows nothing, but we can see that the release is R19 and that all the binaries are validated as OK. It’s not clear to me, though, how the sha1sum command validates the binaries. I looked at the /etc/nv_tegra_release file, and it seems to contain a checksum for each binary, with a ‘*’ prepended to each binary file path. Does the validation of the binaries have anything to do with these checksums?
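While trying to answer my own question, I experimented with sha1sum on a throwaway file (demo.bin is just a made-up name). It does look like -c simply recomputes each listed file’s hash and compares it against the stored one, with the ‘*’ marking that the file was hashed in binary mode:

```shell
cd "$(mktemp -d)"
printf 'example payload' > demo.bin

# -b hashes in binary mode and prepends '*' to the path in the output,
# the same convention seen in /etc/nv_tegra_release.
sha1sum -b demo.bin > manifest.txt

# -c re-reads the list, recomputes each hash, and reports OK/FAILED.
sha1sum -c manifest.txt

# Modifying the file makes the check fail.
printf 'tampered' > demo.bin
sha1sum -c manifest.txt || true
```

So it seems /etc/nv_tegra_release doubles as both a release note (the ‘#’ header line) and a checksum manifest for the nVidia binaries.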

From what I see, the actual “apply_binaries” content is correct, and no package update has harmed any of the Jetson-specific binaries. However, it looks like the “340” packages are for a release that does not match the rest of the L4T (I can’t guarantee it; I’d have to poke around with the actual Jetson)… my guess is that the 340 packages listed in “dpkg -l | grep nvidia” are in fact not intended for use on the Jetson.

It really appears someone wanted to make CUDA 6.5 work without going to the R21.x L4T release… which of course was not available for a long time and probably inspired the attempt at newer kernels and supporting files. Trying to back out invalid packages and figure out what went wrong just won’t be worth your time, since you have the ability to flash without losing your work. I’d suggest flashing to R21.3, which should work directly with CUDA 6.5 without having to mess with anything.

@linuxdev thank you once again for your previous post, and sorry for getting back to you so late. This thread has answered all the doubts I had about my Jetson board for now. We do not have any upcoming project for the board, and we have successfully wrapped up our current project after proceeding with the knowledge that the error was harmless (we were busy doing that for the past few weeks). For any future project, we will update the OS and take in all the latest upgrades before doing anything with the Jetson. Thank you!