How can I create custom containers for Jetson Nano?

Hi everybody!

I am a newbie with Docker and containers. I checked a few container images in the NVIDIA catalog, but the configuration I want is not available.

What I want is similar to l4t-ml:r32.5.0-py3, but rather than TensorFlow 1.15, I want TensorFlow 2.3.1 as in l4t-tensorflow:r32.5.0-tf2.3-py3. I decided to build my own image; however, I got confused while looking into it. Can anybody point me to a nice, easy-to-follow tutorial, ideally for the Jetson Nano, or explain to me how to do that?

l4t-tensorflow:r32.5.0-tf2.3-py3
    TensorFlow 2.3.1

l4t-ml:r32.5.0-py3
    TensorFlow 1.15
    PyTorch v1.7.0
    torchvision v0.8.0
    torchaudio v0.7.0
    onnx 1.8.0
    CuPy 8.0.0
    numpy 1.19.4
    numba 0.52.0
    OpenCV 4.4.1
    pandas 1.1.5
    scipy 1.5.4
    scikit-learn 0.23.2
    JupyterLab 2.2.9
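
Both of the stock images above are published on NGC and can be pulled directly for comparison (the nvcr.io/nvidia/ registry prefix is an assumption, matching the tags used later in this thread):

sudo docker pull nvcr.io/nvidia/l4t-tensorflow:r32.5.0-tf2.3-py3
sudo docker pull nvcr.io/nvidia/l4t-ml:r32.5.0-py3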

Cheers

Hi @mertnano

There is @dusty_nv's jetson-containers repository.

In my opinion, you can just clone it and change the Dockerfile.ml and scripts/docker_build_ml.sh files as you want:

git clone https://github.com/dusty-nv/jetson-containers.git
cd jetson-containers
sed -i "s/32.4.4/32.5.0/g" ./Dockerfile.ml
# single quotes so the shell does not expand $L4T_VERSION before sed sees it
sed -i 's/TENSORFLOW_IMAGE=l4t-tensorflow:r$L4T_VERSION-tf1.15/TENSORFLOW_IMAGE=l4t-tensorflow:r$L4T_VERSION-tf2.3/g' ./scripts/docker_build_ml.sh
./scripts/docker_build_ml.sh all
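
If the build succeeds, the freshly tagged images should show up in the local image list; a quick sanity check (standard Docker command, nothing repo-specific):

sudo docker images | grep l4t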

I guess this should ideally work. Thank you for the clarity! It failed for me with the error below. But supposing there were no errors, what would the next step be? This creates an image to run, right? So I guess I should run it with

sudo docker run …
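
(For reference, the jetson-containers README runs a built image roughly like this; the l4t-ml tag below is an assumption based on the build above:)

sudo docker run -it --rm --runtime nvidia --network host l4t-ml:r32.5.0-py3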

The error I ran into:

Cloning into 'torchvision'...
Note: checking out '01dfa8ea81972bb74b52dc01e6a1b43b26b62020'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

Traceback (most recent call last):
  File "setup.py", line 12, in <module>
    import torch
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 195, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 148, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory

The container test scripts are also included in the repo.

You can test your image with:

./scripts/docker_test_ml.sh all

If there are any problems with the packages inside the container, this script should find them.

Hi @mertnano, if you get that error while building the container, make sure you have set your default Docker runtime to nvidia (and rebooted) - https://github.com/dusty-nv/jetson-containers#docker-default-runtime
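
In short, that page has /etc/docker/daemon.json declare the NVIDIA runtime as default; a sketch of the check-and-restart steps (file contents reproduced from the linked README):

# /etc/docker/daemon.json should contain:
#   {
#       "runtimes": {
#           "nvidia": {
#               "path": "nvidia-container-runtime",
#               "runtimeArgs": []
#           }
#       },
#       "default-runtime": "nvidia"
#   }
sudo systemctl restart docker                 # or just reboot
sudo docker info | grep 'Default Runtime'     # should report: nvidia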


Thank you both for the great help, @dusty_nv and @mehmetdeniz!

I was able to create my own image with the configuration I asked for. After building, I ran the image successfully. One last newbie question: which line can I change to set the custom name I want?
I am guessing the following; is that right?

sh ./scripts/docker_build.sh l4t-ml:r$L4T_VERSION-py3 Dockerfile.ml \

Best

I think that command in the script has some more arguments that follow.

You can type it like this (from docker_build_ml.sh):

sh ./scripts/docker_build.sh l4t-ml:r$L4T_VERSION-py3 Dockerfile.ml \
    --build-arg BASE_IMAGE=$BASE_IMAGE \
    --build-arg PYTORCH_IMAGE=l4t-pytorch:r$L4T_VERSION-pth1.7-py3 \
    --build-arg TENSORFLOW_IMAGE=l4t-tensorflow:r$L4T_VERSION-tf2.3-py3 \
    --build-arg L4T_APT_SOURCE="deb https://repo.download.nvidia.com/jetson/common r32 main"

In the build script, you can put your custom name in place of "l4t-ml:r$L4T_VERSION-py3",
or
you can rename your existing image with the docker tag command.
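
For example, retagging an already-built image would look like this (both names here are hypothetical):

sudo docker tag l4t-ml:r32.5.0-py3 my-custom-ml:r32.5.0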

Best wishes

Hi @mehmetdeniz,
I was trying to do the same, but after some warnings at stage 8/19 I got this error:

The command '/bin/sh -c git clone --recursive -b ${TORCHAUDIO_VERSION} https://github.com/pytorch/audio torchaudio && cd torchaudio && python3 setup.py install && cd ../ && rm -rf torchaudio' returned a non-zero code: 1

Can you suggest something, please?
Alessandro

Hi @acarbon
Did you get this message?

echo "done building PyTorch $pytorch_whl, torchvision $vision_version ($pillow_version), torchaudio $audio_version"

Hi @mehmetdeniz, no I didn't.

Hi @acarbon, did you first set your docker default-runtime to nvidia (and reboot)? https://github.com/dusty-nv/jetson-containers#docker-default-runtime

If so, were you able to build the container before you made modifications to the script? If you clone a fresh copy of the jetson-containers repo, can you run this?

$ git clone https://github.com/dusty-nv/jetson-containers jetson-containers-original
$ cd jetson-containers-original
$ ./scripts/docker_build_ml.sh pytorch 

Hi, yes, the default runtime is right, and in fact I can't even run the standard container.
I get lots of similar errors:
Skipping link https://files.pythonhosted.org/packages/36/06/1feea5c3fdcced8847f3a80c9a912cc065bcdafc1cb3e34d63f21391950d/numpy-1.16.3-cp27-cp27m-win32.whl#sha256=315fa1b1dfc16ae0f03f8fd1c55f23fd15368710f641d570236f3d78af55e340 (from https://pypi.org/simple/numpy/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*); it is not compatible with this Python

It seems something is wrong with the Python version.
Thanks,
Alessandro

That is a normal message when pip3 install --verbose is used. It will eventually find the right package.

What is the actual error that causes it to terminate compilation? Can you attach the whole log?

Hi @dusty_nv, attached is the whole log.
Thanks
putty.log (957.9 KB)

OK, so this is the error from your log:

FAILED: third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o 
/usr/bin/c++   -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda-10.2/include -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC   -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/feature-functions.cc
c++: internal compiler error: Killed (program cc1plus)

It seems to indicate that the compiler was killed by Linux, presumably due to a low-memory situation. Which Jetson are you building this on? If you check dmesg, do you see any messages about out of memory or OOM?
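
A quick way to check is grepping the kernel log; the pattern below is just a suggestion:

dmesg | grep -i -E 'out of memory|oom'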

Can you keep an eye on your memory usage (via tegrastats) while this is running? I suggest disabling ZRAM and mounting disk swap, as shown here: https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-transfer-learning.md#mounting-swap
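
Roughly, the steps on that page look like this (the 4GB size and /mnt path are the doc's example values):

sudo systemctl disable nvzramconfig    # disable ZRAM
sudo fallocate -l 4G /mnt/4GB.swap     # allocate a 4GB swap file
sudo mkswap /mnt/4GB.swap
sudo swapon /mnt/4GB.swap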

I'm using the Jetson Nano 4GB… here is the out-of-memory message:

[55512.297924] Out of memory: Kill process 14497 (cc1plus) score 161 or sacrifice child
[55512.352939] Killed process 14497 (cc1plus) total-vm:1030008kB, anon-rss:681844kB, file-rss:0kB, shmem-rss:0kB
[55512.736787] oom_reaper: reaped process 14497 (cc1plus), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[55580.388286] docker0: port 1(veth2a20320) entered disabled state

I'll try to mount a disk swap.
Thanks

I am using the Jetson Nano 2GB Developer Kit with the JetPack 4.5.1 image. I executed ./scripts/docker_test_ml.sh all and received the following error:

L4T BSP Version: L4T R32.5.1
testing container l4t-pytorch:r32.5.1-pth1.8-py3 => PyTorch
xhost: unable to open display ""
Unable to find image 'l4t-pytorch:r32.5.1-pth1.8-py3' locally
docker: Error response from daemon: pull access denied for l4t-pytorch, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.

Can I install the Jetson containers on a Jetson Nano 2GB? Doing this step took more than 12 hours:

#build_pytorch "https://nvidia.box.com/shared/static/lufbgr3xu2uha40cs9ryq1zn4kxsnogl.whl" \
#              "torch-1.2.0-cp36-cp36m-linux_aarch64.whl" \
#              "l4t-pytorch:r$L4T_VERSION-pth1.2-py3" \
#              "v0.4.0" \
#              "pillow<7"

Hello @flogarcia999

You can run the xhost + command before starting Docker.
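
If the display error persists, DISPLAY may simply be unset in your shell; assuming a local desktop session on :0, something like this usually helps:

export DISPLAY=:0
xhost +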

You can install the images from the NGC catalog: https://ngc.nvidia.com/catalog/containers

If you want to run that test script without building the containers locally, edit the script so that nvcr.io/nvidia/ is included before the container tags here:

https://github.com/dusty-nv/jetson-containers/blob/1e10908a104494a883f6855d1e9947827f2a17bc/scripts/docker_test_ml.sh#L164

Like this:

test_pytorch_all "nvcr.io/nvidia/l4t-pytorch:r$L4T_VERSION-pth1.8-py3"
test_tensorflow_all "nvcr.io/nvidia/l4t-tensorflow:r$L4T_VERSION-tf1.15-py3"
test_tensorflow_all "nvcr.io/nvidia/l4t-tensorflow:r$L4T_VERSION-tf2.3-py3"
test_all "nvcr.io/nvidia/l4t-ml:r$L4T_VERSION-py3"

That script just runs the tests of the containers. The PyTorch tests take a while because they run a bunch of models and verify their accuracy. If you just want to run the container, see the l4t-pytorch page on NGC for the docker run command.
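
For reference, the run command on that page looks roughly like this (tag matched to the L4T R32.5.1 release from your log):

sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.5.1-pth1.8-py3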