Docker Support DriveOS 6.0.8.1

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
other

SDK Manager Version
1.9.3.10904
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

Hello,

we face the same issues regarding Docker as described in this post:

Unfortunately the topic is closed.

We followed the provided solution there, and both compiling the kernel and flashing the Orin device (via docker on host with flash.py) have been successful.

After restarting and also reinstalling Docker, the error message still persists.

Is there anything obvious we are still missing and/or can you please describe the exact steps required before or after compiling/flashing to enable Docker support again?

Thank you very much for your support.

Dear @sebbe,
That means the provided workaround (WAR) did not help you. Let me check again and get back to you.

Dear @sebbe ,
Did you try adding the --privileged flag to the docker run command?

Dear @SivaRamaKrishnaNV,

thanks for your reply.

I tried the --privileged flag, but docker is still not able to create an endpoint on the network bridge, stating “failed to add the host (veth…”

Checking with modprobe and modinfo, the module “veth” is still missing on the Orin, which I suppose is the reason for that.
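
For anyone checking the same thing, here is a minimal sketch of how the state of an option such as CONFIG_VETH could be read from a kernel config file. It assumes the config is available as plain text, for example extracted from /proc/config.gz, which only exists if the kernel was built with CONFIG_IKCONFIG_PROC. The function name kconfig_state is hypothetical and not part of any NVIDIA tooling.

```shell
#!/bin/sh
# kconfig_state FILE OPTION   (hypothetical helper, for illustration only)
# Prints "y" if OPTION is built in, "m" if it is a loadable module,
# or "unset" if it is disabled or absent in a kernel .config-style FILE.
kconfig_state() {
    file=$1
    opt=$2
    # Match lines of the form OPTION=y or OPTION=m and print only the value.
    val=$(sed -n "s/^${opt}=\([ym]\)\$/\1/p" "$file" | head -n 1)
    if [ -n "$val" ]; then
        printf '%s\n' "$val"
    else
        printf 'unset\n'
    fi
}

# Possible usage on the target (assumes /proc/config.gz is present):
#   zcat /proc/config.gz > /tmp/kconfig
#   kconfig_state /tmp/kconfig CONFIG_VETH
```

A result of "m" would mean the module should be loadable with modprobe if the module file is actually installed, while "unset" would mean the kernel has to be rebuilt with the option enabled.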

Dear @SivaRamaKrishnaNV,

are there any updates to this?

Dear @SivaRamaKrishnaNV

I followed both steps, compiling the kernel and flashing the Orin using bootburn, but I am facing the same issue.

Are there any updates on this?

Dear @satish.sonwale,
It appears to be a bug. I am checking on this and will update you once I have a concrete update.

Dear @SivaRamaKrishnaNV,

We are facing the very same issue and are now completely blocked.

  • Is there a plan to re-enable running docker containers with the built-in kernel? What was the reason to disable it?
  • Could you provide a workaround that works (a binary kernel would be the best that works out of the box)?

Thank you, kind regards,
Adam

I tried to rebuild the kernel with the required modifications based on Compiling the Kernel (Kernel 5.15) | NVIDIA Docs

I’m at step 14, Flash the target using the SDK Bootburn tool to the board.
This script is quite complex, so it would be nice to have an exact guide on what to do here. Besides, following the workaround in [BUG] failed to start docker container in orin target with error: failed to create endpoint on network bridge, operation not supported - #6 by SivaRamaKrishnaNV, we are building a non-production kernel, which is a showstopper for us.

Dear @AdamBalazsVay,
After the reflash, do you see an issue with keyboard/mouse control, with the target accessible only via the serial console (please see [BUG] Run Docker in 6.8.0.1 failed)? Or do you notice a different issue?

Dear @SivaRamaKrishnaNV,

It is the reflash step I am struggling with: I could not figure out how to use the SDK Bootburn tool to flash the device with the custom kernel. What parameters do I need to specify, and how do I specify the new kernel to flash? It would be nice to have a step-by-step guide to the commands I need to execute, as in the other steps of the kernel build guide.

Besides, in [BUG] failed to start docker container in orin target with error: failed to create endpoint on network bridge, operation not supported - #6 by SivaRamaKrishnaNV we build a non-production kernel (defconfig and not tegra_prod_defconfig), so it cannot be used in our production environment. Could you please give us an update on how NVIDIA plans to fix this in a proper, production-ready way, and what the timeline is? We are currently blocked by this issue.

Dear @AdamBalazsVay,
May I know what command you have used to flash?
Did you use the below commands to flash the target?

make/bind_partitions -b <board config> linux
# Put the board in recovery mode from the Aurix console
tools/flashtools/bootburn/bootburn.py -b <board config> -B qspi
# Put the board in normal mode from the Aurix console

I haven’t used any command yet, I wanted to figure out how to do it first :) I will check the command you have shared. In the case of Orin dev boards the commands look like the following, right?

make/bind_partitions -b p3710-10-a01 linux
# Put the board in recovery mode from the Aurix console
tools/flashtools/bootburn/bootburn.py -b p3710-10-a01 -B qspi
# Put the board in normal mode from the Aurix console

Dear @AdamBalazsVay,
Yes. Please double check the board config name from Autonomous Vehicle Virtual Machine Configuration | NVIDIA Docs for your board. Or you can look at sdkmanager/docker flash logs to confirm the board config as well.

Dear @SivaRamaKrishnaNV,

I tried to open a bug ticket on your support page but I got redirected to this forum again.

While we check whether the workaround you proposed (flashing a custom kernel) can unblock us in the short term, we would like to get an overview of the exact approach and timeline for a proper fix from NVIDIA (not a workaround).

I would like to get confirmation that it is indeed a bug and will be fixed in a future release (this article, for example, explicitly mentions that running Docker containers is supported out of the box on NVIDIA DRIVE AGX Orin). A binary patch for the current version would be highly appreciated as well. Our ongoing Orin transition is currently blocked by this bug, as we deploy our code as a Docker image that we run on the devices.

Dear @AdamBalazsVay,
Yes. It is a known issue. Please let us know if the workaround (WAR) helps.

Dear @SivaRamaKrishnaNV,

I can confirm that after compiling and deploying a custom kernel with namespaces support enabled (general → namespaces support) and with veth built in instead of loadable (device driver → network device → veth), we can run Docker containers again.
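
As a rough sketch of the config change described here, the two options can also be forced to built-in directly in a .config-style file instead of through the menu. The option names CONFIG_NAMESPACES and CONFIG_VETH are my assumption of what those menu entries map to in the mainline kernel, and the helper name set_builtin is hypothetical. Whether other options are also required for Docker on this kernel is not covered.

```shell
#!/bin/sh
# set_builtin FILE OPTION   (hypothetical helper, for illustration only)
# Forces OPTION=y in a kernel .config-style FILE, replacing any existing
# "OPTION=m", "OPTION=y" or "# OPTION is not set" line.
set_builtin() {
    file=$1
    opt=$2
    tmp=$(mktemp) || return 1
    # Keep every line except an existing setting for this option.
    # grep exits non-zero when nothing is left, which is fine here.
    grep -v -e "^${opt}=" -e "^# ${opt} is not set\$" "$file" > "$tmp" || true
    # Append the built-in setting.
    printf '%s=y\n' "$opt" >> "$tmp"
    # Write back in place so the original file's permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Possible usage in the kernel build directory (assumed option names):
#   set_builtin .config CONFIG_NAMESPACES
#   set_builtin .config CONFIG_VETH
```

After editing the file this way, the kernel build system still needs to resolve dependent options (for example with make olddefconfig) before the build and flash steps from the kernel compilation guide are followed.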

Unfortunately it is not a production ready solution, just a workaround, as we are compiling a kernel with non-prod flags. Please let us know about the timeline of a proper, binary fix.

Kind regards,
Adam

Dear @SivaRamaKrishnaNV,

I see that DRIVE OS 6.0.9 has been released, but I don’t have access to the release notes yet. Do you know if we can expect the new release to fix this issue?

Dear @AdamBalazsVay,
DRIVE OS 6.0.9 is not available for devzone ecosystem customers. FYI, the fix is not part of DRIVE OS 6.0.9.