nvenc firmware load failure

Hi everybody, I’m experiencing a rather strange issue with gstreamer-1.0 on the TX2. In the project I’m currently working on, I have a number of Jetsons deployed in a distributed vision system. These boards run the official NVIDIA distro (L4T 28.2.1, Ubuntu 16.04 with kernel 4.4.38). The Tegra-customized (by NVIDIA) gstreamer plugins are in the /usr/lib/aarch64-linux-gnu/gstreamer-1.0 folder, and I’ve installed all the other gstreamer packages (tools, libs and the base, good and bad plugins) through apt-get. That is basically what the NVIDIA-customized gst-install does, with the only difference that this script compiles the Tegra-customized plugins from source (there are several USE_OMX_TARGET_TEGRA ifdefs in their code) instead of extracting them from nvgstapps.tbz2.

My gstreamer pipeline is the following:

appsrc name=videosource is-live=true stream-type=GST_APP_STREAM_TYPE_STREAM format=GST_FORMAT_TIME ! video/x-raw, format=BGR, width=1920, height=1080, framerate=30/1 ! videoconvert ! videoscale ! video/x-raw, width=1280, height=720, framerate=15/1 ! omxh265enc ! queue max-size-buffers=0 max-size-time=0 flush-on-eos=true ! mux. alsasrc name=audiosource device=hw:2,0 ! audioconvert ! vorbisenc ! queue max-size-buffers=0 max-size-time=0 flush-on-eos=true ! mux. matroskamux name=mux ! filesink location=<file location> name=filesink

As you can see, the goal here is to collect frames from the camera (I’m using OpenCV for that), feed them to the pipeline through appsrc, convert and rescale them, encode them as H.265 (using Tegra hardware acceleration), capture audio from the camera microphone and finally mux everything into a .mkv file.

On my dev board everything works fine. All gstreamer plugins are correctly shown by gst-inspect and none are blacklisted. The application starts collecting frames and feeding them to the pipeline. I continuously log all internal gstreamer messages, and they all look correct. When the application marks the video as finished, an EOS event (gst_event_new_eos) is sent and asynchronously received. Finally, the video file is closed correctly.

For deployment purposes, we created a custom distro with debootstrap. Then we use apply-binaries to add the NVIDIA binaries to the image. We also use exactly the same kernel as L4T 28.2.1. Our default user is root. I’ve installed gstreamer through apt-get, recreating the same setup as before. All gstreamer plugins are correctly shown by gst-inspect and none are blacklisted. The weird behavior is that when I run the application on these other boards, memory consumption is huge.
Looking at the kernel messages, it turns out that the kernel fails to load the falcon nvenc firmware. The reported error is the following:

[  277.499132] xhci-tegra 3530000.xhci: tegra_xhci_mbox_work mailbox command 6
[  277.601864] falcon 154c0000.nvenc: Direct firmware load for tegra18x/nvhost_nvenc061.fw failed with error -2
[  277.611905] falcon 154c0000.nvenc: Falling back to user helper
[  277.619602] falcon 154c0000.nvenc: failed to get firmware
[  277.625097] falcon 154c0000.nvenc: failed to get firmware
[  277.630554] falcon 154c0000.nvenc: nvhost_flcn_init_sw: failed
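
A failure like this can also be spotted programmatically by scanning the kernel log text. The following is only a minimal Python sketch, not something from the original setup: the function name find_firmware_failures is hypothetical, and the regular expression is derived solely from the message format shown above, so other kernel versions may word the line differently.

```python
import re

# Pattern derived from the "Direct firmware load ... failed" line above.
# Assumption: the wording matches this kernel (4.4.38); adjust if it differs.
FW_FAIL = re.compile(
    r"falcon (?P<device>\S+): Direct firmware load for (?P<firmware>\S+) "
    r"failed with error (?P<errno>-?\d+)"
)


def find_firmware_failures(log_text):
    """Return (device, firmware, errno) for each failed direct firmware load."""
    return [
        (m.group("device"), m.group("firmware"), int(m.group("errno")))
        for m in FW_FAIL.finditer(log_text)
    ]


# Sample input copied from the log above.
SAMPLE = (
    "[  277.601864] falcon 154c0000.nvenc: Direct firmware load for "
    "tegra18x/nvhost_nvenc061.fw failed with error -2\n"
    "[  277.611905] falcon 154c0000.nvenc: Falling back to user helper\n"
)

for device, firmware, code in find_firmware_failures(SAMPLE):
    print(device, firmware, code)
```

In practice the text would come from the output of dmesg rather than from a hard-coded sample.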

As a consequence, it seems to fall back to the same omxh265enc encoder but without hw acceleration, which consumes much more memory (for some reason).
If I run the following test gstreamer pipeline for a few seconds:

videotestsrc ! omxh265enc ! queue ! mux. matroskamux name=mux ! fakesink

no error is reported in the kernel log and hw acceleration is used correctly. If I then run my own application, everything works correctly. My guess is that some firmware loading needs time to finish. Can you give me some hints about which firmware load this is and how I can reliably wait for it before starting my pipeline?

Thank you in advance.

Error -2 should be -ENOENT

./include/uapi/asm-generic/errno-base.h:#define ENOENT           2      /* No such file or directory */

Perhaps the path has changed, or the path has the wrong permissions?
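
To check both possibilities from user space, something like the following Python sketch could be used. It is only an illustration under stated assumptions: /lib/firmware is assumed to be the firmware search directory (the usual default), the relative file name is taken from the kernel message in the first post, and the helper names describe_kernel_error and check_firmware are hypothetical.

```python
import errno
import os

# Assumption: the kernel looks under /lib/firmware (the usual default).
FIRMWARE_DIR = "/lib/firmware"
# Relative name taken from the kernel message in the first post.
FIRMWARE_NAME = "tegra18x/nvhost_nvenc061.fw"


def describe_kernel_error(code):
    """Translate a negative kernel error code such as -2 into its errno name."""
    return errno.errorcode.get(abs(code), "unknown")


def check_firmware(base_dir=FIRMWARE_DIR, name=FIRMWARE_NAME):
    """Report whether the firmware file exists and is readable by this process."""
    path = os.path.join(base_dir, name)
    if not os.path.exists(path):
        return "missing: " + path
    if not os.access(path, os.R_OK):
        return "not readable: " + path
    return "ok: " + path


print(describe_kernel_error(-2))  # ENOENT
print(check_firmware())
```

The result reflects what the current process can see, which is only an approximation of what the kernel firmware loader sees, so it is worth running in every environment that uses the encoder.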

Hi domey4x13,

Have you identified the cause and resolved the problem?
Are there any results you can share?

Thanks

Hello,
we are pretty sure we have identified the cause of the issue. Basically, we had another application running inside a docker container that was using the encoder. The driver was not able to load the encoder firmware because the host folder /lib/firmware was not mounted (and hence not visible) inside the container. It is not clear to us why the host folder /lib/firmware must be visible from inside the container for the firmware to load successfully (it seems to us that firmware loading should be executed in kernel mode). If the firmware folder is not found, the encoder switches to CPU mode and, hence, its memory and CPU consumption are quite high. What is quite curious is that once the firmware loading has failed, there is no way to make the encoder work in HW mode (even when using it from outside the container, i.e. from a native application) unless the host is rebooted.

Thanks for sharing your findings about running inside docker.