Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other
Target Operating System
Linux
QNX
other
Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its number)
other
SDK Manager Version
1.9.2.10884
other
Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other
Hi,
We want to use the 6.0.6 Docker container on top of the DRIVE AGX Orin, but we see that its size is 14.7 GB. Is there a minimal version, or do we have to install this heavy image?
Could you share how you got the size? It would be helpful to know which specific Docker container you are referring to and what your specific use cases are. Thanks.
Vick,
The first link in your reply says: “you can pull target-side Docker images from NGC or Docker Hub, and run GPU-accelerated containers on the target right”
Can you please point me to the link for the reference target-side Docker image? I am not able to find it.
We have a Docker environment that we want to merge with the NVIDIA Docker (CUDA and driver support) and run on top of the Orin. Can you give any pointers on how this can be accomplished?
In any case, we do not provide a target-side Docker image, only the Docker runtime with the NVIDIA Container Toolkit (nvidia-docker) stack to facilitate running GPU-accelerated applications on the Tegra SoC.
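As a quick sanity check that this runtime piece is in place (a standard Docker command, nothing DRIVE-specific), you can list the registered runtimes on the target and confirm that nvidia appears among them:
$ sudo docker info | grep -i runtime
If nvidia is not listed, the NVIDIA Container Toolkit packages are likely not installed, or the Docker daemon has not been restarted since installation.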
To create a container on the target, you only need to follow standard Docker practices: write your Dockerfiles and workflows to build ARM64 images (if you are building on an x86 host) or a native Docker image (if you are building target-side).
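To make that concrete, here is a minimal sketch; the base image, paths, and tag (my-av-stack) are placeholders for illustration, not anything shipped with DRIVE OS:
# Dockerfile (placeholder contents)
FROM arm64v8/ubuntu:20.04
# Copy your application into the image
COPY ./app /opt/app
CMD ["/opt/app/run"]

$ # On an x86 host, cross-build for ARM64 with buildx (requires QEMU/binfmt support):
$ docker buildx build --platform linux/arm64 -t my-av-stack:latest .
$ # Building natively on the target, a plain docker build is enough:
$ docker build -t my-av-stack:latest .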
To have NVIDIA Container Toolkit (nvidia-docker) support (for CUDA and driver support), you simply need to pass --runtime nvidia --gpus all when executing a docker run command. If you have any dependencies that need to be mounted, the blog provides guidance on how to modify the drivers.csv and devices.csv files to specify them; those mounts are then handled by the NVIDIA Container Toolkit (nvidia-docker) stack.
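For reference, those CSV files use a simple "type, path" format (types include dev, lib, dir, and sym). The entries below are only illustrative, and the exact paths vary by release, so check the files shipped under /etc/nvidia-container-runtime/host-files-for-container.d/ on your target:
lib, /usr/lib/aarch64-linux-gnu/libcuda.so
dev, /dev/nvhost-ctrl-gpu
dir, /usr/local/cuda-11.4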
Depending on your use case, I may also recommend mounting the CUDA directory into the container at runtime, using -v /usr/local/cuda-<version>:/usr/local/cuda.
In total, to run your image after having modified the drivers.csv and devices.csv files appropriately, you will likely end up with a docker run command similar to the following (where <image> is your image name or ID):
$ sudo docker run --rm --runtime nvidia --gpus all -v /usr/local/cuda-11.4:/usr/local/cuda <image>
I am trying to run the container with this command:
sudo docker run --rm --network host --runtime=nvidia --gpus all -v /usr/local/cuda-11.4:/usr/local/cuda-11.4/ --name av-stack ed9214aa0b8e
but I still see that the NVIDIA Container Toolkit is not available, and nvidia-smi returns nothing.
It worked well on my devkit with DRIVE OS 6.0.6. FYI.
$ cd /usr/local/cuda-11.4/samples/0_Simple/matrixMul && sudo make
>>> GCC Version is greater or equal to 4.7.0 <<<
/usr/local/cuda-11.4/bin/nvcc -ccbin g++ -I../../common/inc -m64 --threads 0 --std=c++11 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o matrixMul.o -c matrixMul.cu
/usr/local/cuda-11.4/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o matrixMul matrixMul.o
mkdir -p ../../bin/aarch64/linux/release
cp matrixMul ../../bin/aarch64/linux/release
nvidia@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo docker run --rm --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./matrixMul
WARNING: IPv4 forwarding is disabled. Networking will not work.
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ampere" with compute capability 8.7
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 617.89 GFlop/s, Time= 0.212 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
I tried to manage and steer a little bit. I see that the veth driver is not loaded by default, so I had to install it manually. But then I hit this CUDA error:
@tegra-ubuntu:/usr/local/cuda-11.4/samples/0_Simple/matrixMul$ sudo docker run --rm --runtime nvidia --gpus all -v $(pwd):$(pwd) -w $(pwd) ubuntu:20.04 ./matrixMul
WARNING: IPv4 forwarding is disabled. Networking will not work.
CUDA error at ../../common/inc/helper_cuda.h:781 code=801(cudaErrorNotSupported) "cudaGetDeviceCount(&device_count)"
[Matrix Multiply Using CUDA] - Starting...
Any pointers on why I see this issue with CUDA? What drivers might be missing in this case?
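(For reference, the networking side of this — the missing veth module and Docker's "IPv4 forwarding is disabled" warning — can typically be handled with standard Linux commands like the following; this is a sketch, assuming a veth module built for the running kernel is available:)
$ # Load the veth module needed by Docker's bridge networking
$ sudo modprobe veth
$ # Enable IPv4 forwarding for the current boot
$ sudo sysctl -w net.ipv4.ip_forward=1
$ # Persist the setting across reboots, if desired
$ echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/99-ip-forward.conf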
It appears that you’re encountering an issue with the CUDA library, specifically when calling the cudaGetDeviceCount() function. Before we can provide you with specific guidance, could you please provide some additional information?
Have you reflashed the devkit before this try? Which version of DRIVE OS are you currently using? Furthermore, it would be helpful if you could share the complete output of building the 'matrixMul' application.
A clean installation of 6.0.6 on the Orin resolved the issue.
Now I am able to get CUDA tested in Docker on the Orin.
tegra-ubuntu:/usr/local/cuda/samples/0_Simple/matrixMul# ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ampere" with compute capability 8.7
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 617.98 GFlop/s, Time= 0.212 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.