Jetson model training on WSL2 Docker container - issues and approach

Hello

A Jetson newb here, trying to develop & train models in a Docker container under WSL2 to leverage the larger GPU on my Windows 10 desktop.

Before I get into the technical issue, my first question is more practical/strategic:

Question 1
Is it the right approach to develop/train in the exact same environment as the Jetson Nano by using an aarch64 Docker image on my desktop? Or is this overkill, and should I use a bog-standard container (a normal x86 18.04 distro, or even my Win10 desktop?) to train the model and then send it over to the Nano?

I understand the above is not strictly a Jetson nano question per se, but the answer will help me decide how much time to spend on the technical issue below…

Question 2
I am able to get several aarch64 containers (see below) up and running, with PyTorch either prepackaged in the container or installed afterwards, and can even import cv2 seamlessly. However, I cannot import torch without hitting the following error:

OSError: libcurand.so.10: cannot open shared object file: No such file or directory

With my limited knowledge, it looks like there is no JetPack installed in this image? Or maybe it's a disconnect between CUDA versions? I'm at a loss as to how to troubleshoot further.

Other posts discussing this issue suggest adding /usr/local/cuda/lib64 to the library path, however any Docker image I create has only three files in that directory, none of which is the one in question.
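
For reference, this is the kind of quick check that can be run from inside the container to see whether libcurand.so.10 is reachable at all; the paths below are just the usual JetPack locations, so they may not apply to every image:

```python
import ctypes
import glob
import os

# What the dynamic linker will search (may be empty in a stock image).
print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<not set>"))

# List whatever CUDA libraries the image actually ships in the usual places.
for path in ("/usr/local/cuda/lib64", "/usr/lib/aarch64-linux-gnu"):
    print(path, "->", glob.glob(os.path.join(path, "libcurand*")))

# Try loading the library the same way `import torch` would.
try:
    ctypes.CDLL("libcurand.so.10")
    print("libcurand.so.10 loaded OK")
except OSError as err:
    print("libcurand.so.10 not found:", err)
```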

I have been following this post that discusses setting up Jetson containers using Docker and WSL2, and have leveraged Ian Davis' repo and guides plus some other tutorials - all of them lead me to this same issue.

My setup: Windows 10 (Build 21370), CUDA 11.3
Main images tried: nvcr.io/nvidia/l4t-ml:r32.5.0-py3, nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.7-py3, nvidia/l4t-base:r32.2.1
WSL2: nvcc --version returns Cuda compilation tools, release 9.1, V9.1.85

Many thanks for any guidance you can provide

Typically on x86/PC I use a slightly different container that has similar packages - for example, I frequently use the PyTorch container from NGC. You can train the model there, export it to ONNX, and then copy the ONNX model to your Jetson.
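
If it helps, a rough sketch of the export step (using a torchvision resnet18 purely as a stand-in for your trained model; adjust the input shape to whatever your network expects):

```python
import torch
import torchvision

# resnet18 is only a stand-in for your own trained model.
model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # batch, channels, height, width

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
# model.onnx can then be copied to the Jetson and run with TensorRT or onnxruntime.
```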

Note that I don’t have experience with WSL, so YMMV. I believe PyTorch also has a build for Windows now. But I would suggest dual-booting your PC, and installing Ubuntu + Docker in native Linux.

If you are installing the PyTorch wheel on your Jetson, make sure that the wheel you downloaded/installed is compatible with the version of JetPack you have (these are listed on the PyTorch thread). You would commonly get this error if you installed a PyTorch wheel that was built against a different version of JetPack than you are running.
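
As a quick sanity check after installing a wheel, something like this (purely illustrative) shows which CUDA version the wheel was built against and whether the GPU is visible; the CUDA version reported should match the one that ships with your JetPack release:

```python
import torch

print("torch version:", torch.__version__)
print("built against CUDA:", torch.version.cuda)   # should match your JetPack's CUDA
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```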

If you are getting this error running a container on your Jetson, like l4t-pytorch, make sure that you ran the container with --runtime nvidia. CUDA/cuDNN/TensorRT are mounted into the L4T containers at runtime, so you need --runtime nvidia for them to show up inside the container. The L4T containers shouldn't be run on x86.

Thank you so much for your speedy reply.

For your first suggestion, I will take a look at training and exporting as ONNX. However, I did have an eventual objective of fine-tuning the model via PyTorch on the Nano itself (i.e. edge training instead of sending data back to a main server for training), so I'm not sure if ONNX will allow me to do that.

I was actually trying to run the l4t container on my x86 machine as per this Stereolabs tutorial, for faster initial training (I am combining a couple of pretrained models). Given what you have said, maybe this won't work, but I will take your suggestion to dual-boot my PC with Ubuntu, and will then try to follow the above tutorial once more to run the l4t container locally first.

I will update this post with how I get on. Thanks again

It looks like that tutorial is using the QEMU emulator to build aarch64 containers on x86. That may work for building an aarch64 container on x86, and you may be able to run CPU code in the container through QEMU (albeit slowly), but it is unlikely to work with GPU acceleration on x86 because the container is running inside the QEMU emulator. So I recommend using a native x86 container on x86, where it can use the GPU.

If you want to customize the containers, generally you can have two Dockerfiles, one for Jetson and one for x86. They start from different base containers, and then you can install the needed packages into each to make them pretty much the same from a package perspective.

OK sure - in that case, you can just copy your PyTorch checkpoint to your Jetson and run the same PyTorch code on your Jetson to load it there. Just as an example, I run this pytorch-ssd code both on x86 and Jetson. You can use the l4t-pytorch container as a base on the Jetson and the NGC PyTorch container as a base on x86. It is helpful to select versions of the containers that have similar PyTorch versions (although a matching PyTorch version is typically not absolutely necessary, depending on the features your PyTorch code uses).
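
For illustration (resnet18 again standing in for your own network), the checkpoint round-trip can look something like this: save the state_dict on x86, copy the file over, and load it into the same model definition on the Jetson:

```python
import torch
import torchvision

# --- on the x86 training machine ---
model = torchvision.models.resnet18()           # stand-in for your network
torch.save(model.state_dict(), "checkpoint.pth")

# --- on the Jetson (same model definition available there) ---
model = torchvision.models.resnet18()
model.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))
model.to("cuda" if torch.cuda.is_available() else "cpu")
model.train()                                   # ready to continue fine-tuning on-device
```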