Thanks, AastaLLL,
Such an image size is nice, and the way it works by bind mounting a bunch of stuff to save space is kinda neat, but it isn’t consistent with x86, and there are other problems I see with this approach. I apologize in advance for the wall of text.
Problem 1 - host integrity
One of the limitations of the beta is that we are mounting the cuda directory from the host. This was done with size in mind as a development CUDA container weighs 3GB, on Nano it’s not always possible to afford such a huge cost. We are currently working towards creating smaller CUDA containers.
And it’s not just /usr/local/cuda. A whole bunch of things need to be mounted inside for it to work, and doing so is risky to the host if you, say, forget to append :ro to the docker run -v ... mount. That’s really easy to do. For example, the documentation says /usr/local/cuda is mounted read-only, but the run examples actually bypass that by omitting “:ro”.
Example:
[user@hostname] -- [/usr/local/cuda]
$ sudo docker run -it -v /usr/local/cuda:/usr/local/cuda --rm nvcr.io/nvidia/deepstream-l4t:4.0.2-19.12-samples
root@5d7d31bebf12:~# cd /usr/local/cuda
root@5d7d31bebf12:/usr/local/cuda# ls
LICENSE NsightCompute-1.0 README bin doc extras include lib64 nvml nvvm samples share targets tools version.txt
root@5d7d31bebf12:/usr/local/cuda# touch test
root@5d7d31bebf12:/usr/local/cuda# exit
[user@hostname] -- [/usr/local/cuda]
$ ls
bin doc extras include lib64 LICENSE NsightCompute-1.0 nvml nvvm README samples share targets test tools version.txt
Yes, root inside a container should be treated as root outside and “containers do not contain”, but one can imagine situations where it’s easy to accidentally break the system this way, either during image build or at runtime, if some process, like maybe apt, or somebody’s script, tries to write to some path like /usr/local/cuda.
This is read only:
$ sudo docker run -it -v /usr/local/cuda:/usr/local/cuda:ro --rm nvcr.io/nvidia/deepstream-l4t:4.0.2-19.12-samples
root@9742f01e5be7:~# cd /usr/local/cuda
root@9742f01e5be7:/usr/local/cuda# touch test
touch: cannot touch 'test': Read-only file system
But that relies on :ro being appended and nobody ever forgetting it, which is really easy to do. If there is a base image with those files, on the other hand, overlayfs ensures that you can modify /usr/local/cuda to your heart’s content and the original layers will remain intact.
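To illustrate, here’s roughly how that would behave with a hypothetical image (l4t-cuda-base is a made-up name) that has CUDA baked into its layers instead of bind mounted:
$ sudo docker run -it --rm l4t-cuda-base
root@0123456789ab:/# touch /usr/local/cuda/test
root@0123456789ab:/# exit
$ sudo docker run -it --rm l4t-cuda-base ls /usr/local/cuda/test
ls: cannot access '/usr/local/cuda/test': No such file or directory
The write lands in the container’s throwaway overlay layer, so a fresh container (and the image itself) never sees it.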
I recognize the need to conserve space on Tegra, which is why I suggested baking CUDA into the base image layers (distributed via e.g. docker save and docker load) and not installing CUDA on the host itself, just the drivers, as happens on x86. It would require a separate l4t version just for running containers, but those already exist, for Tegra too.
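Distribution could be as simple as this sketch (the tag is hypothetical; nothing like it exists today):
# on a machine with bandwidth and disk to spare
$ docker save nvcr.io/nvidia/l4t-cuda:10.0-base | gzip > l4t-cuda-base.tar.gz
# on the Nano
$ gunzip -c l4t-cuda-base.tar.gz | sudo docker load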
Problem 2 - consistency
Because of the way all this works, more has to be in sync between host and image than with an x86 image, where the driver just needs to meet a minimum version to run a particular image (and CUDA does not need to be installed on the host). On top of this, the same Dockerfile used for x86 has to be rewritten for Tegra (at a minimum, the FROM line).
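The least painful workaround I know of is parameterizing the FROM line, which is exactly the kind of boilerplate I’d rather not need. A sketch (the default and the Tegra tag are just examples):
# Dockerfile: one file for both platforms, but only via a build arg
ARG BASE_IMAGE=nvcr.io/nvidia/deepstream:4.0.2-19.12-devel
FROM ${BASE_IMAGE}
# ... rest of the build is identical on both platforms ...
# on Tegra: docker build --build-arg BASE_IMAGE=nvcr.io/nvidia/deepstream-l4t:4.0.2-19.12-samples .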
Ideally, I would like to be able to:
FROM nvcr.io/nvidia/deepstream:latest
...
… and build that on any NVIDIA platform. If there are unavoidable differences between architectures, I can handle them with my build system the same way I do outside a container. That’s harder if things aren’t in consistent locations. For example, on Tegra, the headers for deepstream are installed here:
$ sudo docker run -it -v /usr/local/cuda:/usr/local/cuda:ro --rm nvcr.io/nvidia/deepstream-l4t:4.0.2-19.12-samples
[sudo] password for username:
root@a373058f081b:~# cd /root/
root@a373058f081b:~# ls
deepstream_sdk_v4.0.2_jetson
on x86:
... docker run --rm -it nvcr.io/nvidia/deepstream:4.0.2-19.12-devel
...
root@9808ccaa6dcd:/# cd /root/
root@9808ccaa6dcd:~# ls
deepstream_sdk_v4.0.2_x86_64
And when you install the debian package (at least on Tegra), the headers end up in /opt/nvidia/deepstream/deepstream-4.0/sources/includes/ (instead of /usr/local/include or whatever). So there are at least three different locations for the headers depending on how you use deepstream.
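In practice that means build scripts have to probe for them. A sketch of what I end up writing (the /root paths are my guess at where the headers sit inside the SDK directories shown above):
# take the first candidate DeepStream include directory that exists
for d in \
    /opt/nvidia/deepstream/deepstream-4.0/sources/includes \
    /root/deepstream_sdk_v4.0.2_jetson/sources/includes \
    /root/deepstream_sdk_v4.0.2_x86_64/sources/includes; do
  if [ -d "$d" ]; then DS_INC="$d"; break; fi
done
echo "DeepStream headers: ${DS_INC:?no known location found}"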
If these images were built from common parents with common instructions, the headers and samples wouldn’t be in two different locations and the image tags would be the same.
I realize this might require changes to how your repositories/registries work, both apt and Docker, but Canonical manages to get it to work. I can apt-get install “foo” and be assured that “foo” will be the same version on all architectures.
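On the Docker side, multi-arch manifest lists are designed for exactly this. A hypothetical sketch (the combined and arch-suffixed tags don’t exist on nvcr.io today, and docker manifest was still an experimental CLI feature at the time):
# publish one tag that resolves to the right image per architecture
docker manifest create nvcr.io/nvidia/deepstream:4.0.2-19.12-devel \
    nvcr.io/nvidia/deepstream:4.0.2-19.12-devel-amd64 \
    nvcr.io/nvidia/deepstream-l4t:4.0.2-19.12-devel-arm64
docker manifest push nvcr.io/nvidia/deepstream:4.0.2-19.12-devel
Then the exact same FROM line would pull the right image on both x86_64 and Tegra.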
Anyway, you can take or leave this critique. Please don’t take it as though I don’t like y’all’s work in general. I like Nvidia products and will continue to develop for Nvidia platforms, but I’ve been avoiding Docker on Tegra for these reasons, and I imagine I’m not the only one. Image size is a plus of this approach, yes, but that’s about it.