I am trying to speed up builds with ccache, using the cache mount feature provided by Docker BuildKit.
However, BuildKit does not support the NVIDIA runtime.
From searching, I have found that it is possible to use stub libraries (e.g. those in `/usr/local/cuda/lib64/stubs`) to build software against CUDA and other dependencies in a docker build that uses the runc runtime (as BuildKit requires), and then somehow make it work at runtime.
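A minimal sketch of how the two pieces could fit together, under stated assumptions: an Ubuntu-based image that already contains the CUDA toolkit *including* its stubs (passed in through the hypothetical `BASE_IMAGE` build arg), a hypothetical CMake project at the root of the build context, and a project that links the driver API with a plain `-lcuda`. It is not a tested Jetson/L4T configuration.

```dockerfile
# syntax=docker/dockerfile:1
# BASE_IMAGE is a placeholder (required): an Ubuntu-based image that contains the
# CUDA toolkit, including /usr/local/cuda/lib64/stubs.
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

# ccache, cmake and make are assumed not to be in the base image.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ccache cmake make \
 && rm -rf /var/lib/apt/lists/*

# Hypothetical CMake project at the root of the build context.
WORKDIR /src
COPY . .

# The cache mount keeps /root/.ccache across builds (BuildKit only): when a source
# change invalidates this step, unchanged translation units still hit the cache.
# LIBRARY_PATH is consulted by gcc when resolving -l options at link time, so
# -lcuda finds the stub libcuda.so; it is not a runtime search path.
RUN --mount=type=cache,target=/root/.ccache \
    export CCACHE_DIR=/root/.ccache \
           LIBRARY_PATH="/usr/local/cuda/lib64/stubs${LIBRARY_PATH:+:$LIBRARY_PATH}" \
 && mkdir -p build && cd build \
 && cmake .. \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
 && make -j"$(nproc)"
```

This would be built with something like `DOCKER_BUILDKIT=1 docker build --build-arg BASE_IMAGE=<your CUDA devel image> -t myapp .`. The stub `libcuda.so` carries the SONAME `libcuda.so.1`, so the linked binary records a dependency on `libcuda.so.1`, which is only satisfied when the container later runs with the NVIDIA runtime (e.g. `docker run --runtime nvidia`); the stubs directory should not end up on `LD_LIBRARY_PATH` in the final image. On JetPack 4 this can only work if the image itself ships the toolkit and stubs, because nothing is mounted from the host during a BuildKit build.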
Quoted GitHub issue (01:04PM, 25 May 21 UTC to 11:26AM, 09 Dec 21 UTC):
### 1. Issue or feature description
When building a container (docker 19.03) that should use CUDA, it should bind the NVIDIA driver libraries available on the host into the container.
It works perfectly with the legacy builder, but with BuildKit the binding is sometimes lost.
The file that loses its binding is `libcuda.so`, the CUDA driver library.
`libcuda.so` is a symlink to `libcuda.so.1`, which is a symlink to `libcuda.so.<version>` (in my case `libcuda.so.455.32.00`).
This causes linking to fail with this error:
`/usr/lib/x86_64-linux-gnu/libcuda.so: file not recognized: File truncated`
`collect2: error: ld returned 1 exit status`
I printed the file size during the build with the following command:
`RUN ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00`
With the legacy builder we get:
`-rw-r--r-- 1 root root 21074296 Oct 14 2020 /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00`
but with BuildKit the size is zero:
`-rw-r--r-- 1 root root 0 Mar 1 15:18 /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00`
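For builds that rely on the runtime injecting the host driver, a guard step could turn the confusing `File truncated` link error into an explicit message. This is a hypothetical sketch: `LIBCUDA` is an illustrative build arg defaulting to the x86_64 path from this issue, and the step belongs after `FROM` and before the compile/link step.

```dockerfile
# Hypothetical guard step: fail early if libcuda.so resolves to a missing or
# zero-byte file, i.e. the host driver was not injected into this build step.
ARG LIBCUDA=/usr/lib/x86_64-linux-gnu/libcuda.so
RUN target="$(readlink -f "$LIBCUDA")"; \
    if [ ! -s "$target" ]; then \
        echo "ERROR: $LIBCUDA resolves to '$target', which is missing or empty." >&2; \
        echo "The NVIDIA runtime did not inject the driver into this build step." >&2; \
        exit 1; \
    fi; \
    ls -lL "$LIBCUDA"
```

`readlink -f` follows the whole `libcuda.so` → `libcuda.so.1` → `libcuda.so.<version>` chain, and `[ -s ... ]` is false for both a missing file and a zero-byte one, which covers both BuildKit symptoms shown in this issue.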
### 2. Steps to reproduce the issue
`RUN ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00`
`docker build --progress=plain -t test .`
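For reference, the complete Dockerfile can be read back from the legacy build log (Steps 1/3 to 3/3). This reconstruction is inferred from that log rather than copied from the original issue:

```dockerfile
# Reconstructed from "Step 1/3" to "Step 3/3" in the legacy build output.
FROM nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
# With the NVIDIA runtime active this lists the host's real driver library
# (21074296 bytes here); under BuildKit (runc) it is empty or missing.
RUN ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00
CMD ["/bin/bash"]
```

Building it once with `DOCKER_BUILDKIT=0 docker build -t test .` and once with `DOCKER_BUILDKIT=1 docker build --progress=plain -t test .` should show the difference. The legacy builder only sees the host driver when the Docker daemon's default runtime is `nvidia`, since `docker build` has no per-build runtime option.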
**Result without BuildKit:**
```
Sending build context to Docker daemon 4.968GB
Step 1/3 : FROM nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
11.1-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
f22ccc0b8772: Already exists
3cf8fb62ba5f: Already exists
e80c964ece6a: Already exists
8a451ac89a87: Already exists
c563160b1f64: Already exists
596a46902202: Already exists
aa0805983180: Already exists
5718c3da35a0: Already exists
003637b0851a: Already exists
Status: Downloaded newer image for nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
Step 2/3 : RUN ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00
---> Running in fb3bff4aebb4
-rw-r--r-- 1 root root 21074296 Oct 14 2020 /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00
Removing intermediate container fb3bff4aebb4
Step 3/3 : CMD ["/bin/bash"]
---> Running in 06022742780f
Removing intermediate container 06022742780f
Successfully built 180b8e423aa1
Successfully tagged test:latest
```
**Result with BuildKit:**
```
#1 [internal] load build definition from Dockerfile.test
#1 transferring dockerfile: 170B done
#1 DONE 0.1s
#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.1s
#3 [internal] load metadata for docker.io/nvidia/cuda:11.1-cudnn8-devel-ubu...
#3 DONE 0.7s
#4 [1/2] FROM docker.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04@sha256:ea...
#5 [2/2] RUN ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00
#5 0.273 ls: cannot access '/usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00': No such file or directory
#5 ERROR: executor failed running [/bin/sh -c ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00]: runc did not terminate sucessfully
> [2/2] RUN ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00:
failed to solve with frontend dockerfile.v0: failed to build LLB: executor failed running [/bin/sh -c ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.455.32.00]: runc did not terminate sucessfully
```
Is this possible? Can someone point me to how to install the stub libraries and how to approach this?
So far this seems possible, and it is tantalizing, but I think I will have to resort to this solution in the meantime:
Have you tried the container built for JetPack 5?
We have included all the JetPack libraries in the container,
so you don't need to mount them from the host.
Thanks, that should be helpful. I don't think we can use it right now, though, since we're stuck on JetPack 4.4.1 due to kernel device tree compatibility issues. But I will at least be able to test it to evaluate.
Are you saying that the NVIDIA Docker runtime (which is incompatible with Docker BuildKit) is no longer required to run docker builds against the CUDA libraries with the latest JetPack 5?
For JetPack 5, we include the libraries directly in the containers rather than mounting them.
October 26, 2022, 3:22am
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.