Hi NVIDIA team,
My company and I are using the Xavier NX module for our robot platform, but we are having issues loading our docker image onto the module in order to run our environment. Much of this is related to the 16GB storage space, but we might be able to overcome this with a bit more understanding of the system. On the FAQ, I saw a link to this: Jetson/L4T/Boot From External Device - eLinux.org, but I am still wondering what is the best way to think about the interaction between the module, CUDA, and docker.
The first attempt was to treat the Xavier NX module (emmc) just like the Developer Kit - install CUDA and all SDK components onto it, then try to load the docker container (which also contained CUDA). After flashing everything onto the emmc module, there were something like 6GB of storage space remaining, and our image was less than 5GB, so I figured this would be suitable. However, it seems that the way docker images are loaded requires something like double the actual space they take up, so this was not even close to able to work.
Because CUDA was already installed within our docker image, my next thought was to not install it (or any other SDK components) on the Xavier NX module when flashing the OS onto it. After a fresh flashing of the OS, loading the docker image eventually worked. However, when I tried to then run our CUDA-dependent code within the docker, an error was thrown that meant, at least to me, that it wouldn’t work unless CUDA is flashed onto the “host” system.
Then, I thought that I would try to save the space within the docker image by removing CUDA from it. However, when I tried to run the commands which I have previously used to uninstall CUDA (sudo apt-get --purge remove [nsight, cublas, cuda, nvidia]), it indicated that most of these were not installed in the first place even though /usr/local/cuda and associated folders still existed. Does this mean that CUDA was actually not installed inside the docker, but was just using the “host” machine’s installation? When I tried this, I was using the Developer Kit, which had CUDA installed in “both” places.
So, with this background, my main questions are: 1) How should we approach running CUDA-dependent code from within a docker image on Jetson Xavier NX? 2) Should CUDA be installed outside the docker container (on the host), inside the container/image, or both? 3) Will future versions of similar Jetson modules have larger storage space, so these issues will be easier to overcome?
Please let me know if you have any insight into these matters.
Best wishes,
Daniel Freer