Docker run error - "exec format error"

Hi,
This is my first time using docker commands, so I might be making some silly mistake; apologies in advance.

  • My goal is to train a pretrained RetinaNet model on a custom dataset, then integrate a Python DeepStream example plus the model into a different Python project.

Running on a Jetson Xavier AGX, I ran the following commands, based on this guide:
https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_object_detection

  1. In the terminal:

sudo docker pull nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2

I get the following output, which I guess is okay?

v2.0_dp_py2: Pulling from nvidia/tlt-streamanalytics
35b42117c431: Pulling fs layer
ad9c569a8d98: Pulling fs layer
293b44f45162: Pulling fs layer
0c175077525d: Pulling fs layer
c4959261975d: Pulling fs layer
10a8d097f872: Pulling fs layer
09f9eb0153c1: Pulling fs layer
c4959261975d: Waiting
defdb47b3acf: Pulling fs layer
23f2552ae755: Pulling fs layer
bb4ee296ceef: Pulling fs layer
09f9eb0153c1: Waiting
657bd07b9110: Waiting
defdb47b3acf: Waiting
0c175077525d: Waiting
f629a5fe035f: Waiting
cd5365dad468: Waiting
c1b74e2a4365: Waiting
60eabb0b41b1: Waiting
b829ee88df69: Waiting
ef3e9f5b312b: Waiting
dff865f05f3f: Waiting
b31c327e02e3: Waiting
7c92f6d00688: Waiting
0bd8011fe576: Waiting
468e9fe0d06a: Pull complete
03886f8054a5: Pull complete
e1f0e179055e: Pull complete
1193529cf5f1: Pull complete
6296e43681bf: Pull complete
ac953370fa22: Pull complete
396410a0232f: Pull complete
9d856fbb9e3f: Pull complete
36e46f9abfd3: Pull complete
349e60d1f9f9: Pull complete
4c244cd536e8: Pull complete
cf887db54426: Pull complete
8d9b292b90b5: Pull complete
1930919e01bc: Pull complete
a7132c6fc4a1: Pull complete
a3e2fbcdfa08: Pull complete
699d79372f72: Pull complete
d671eedbdc57: Pull complete
a4561f9c50e5: Pull complete
20aa2d4d54c3: Pull complete
cff4da427add: Pull complete
9b1a6edb498c: Pull complete
fe1269800bf1: Pull complete
Digest: sha256:71e4ce86029f19a2777409e4e3c6f5cc2d60d2b43235e4fe7d9e5c94a7a28aef
Status: Downloaded newer image for nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2
nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2

  2. I try to run:

docker run --runtime=nvidia -it -v "/path/to/dir/on/host":"/path/to/dir/in/docker" -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 /bin/bash

I changed /path/to/dir/on/host to the path where I pulled the docker image to (set by altering the data-root in the daemon.json file), and /path/to/dir/in/docker to a newly created folder named workspace:

sudo docker run --runtime=nvidia -it -v "/mnt/XavierSSG500/docker/":"/mnt/XavierSSG500/docker/workspace/" -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 /bin/bash

which produces the following error:

standard_init_linux.go:211: exec user process caused "exec format error"

*Google came up with results to do with ARM etc., but I couldn't find this specific use-case error.

Any assistance you provide will be greatly appreciated.

  1. Please use "$ docker images | grep tlt" to check whether the TLT docker image has been downloaded.
  2. Please refer to the TLT user guide (Integrating TAO Models into DeepStream, TAO Toolkit 3.22.05 documentation); see the command below:
    $ docker run --runtime=nvidia -it -v /home/username/tlt-experiments:/workspace/tlt-experiments nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 /bin/bash

Thank you for your reply. However, I'm still getting the exact same error for every command listed in the link you sent, no matter what directory I change it to.
The grep produced the following:

nvcr.io/nvidia/tlt-streamanalytics v2.0_dp_py2 496dcdfc093a 7 weeks ago 7.99GB

I'll try the process on a new Jetson Xavier AGX anyway.

Can you run the command below successfully?
$ docker run --runtime=nvidia -it nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 /bin/bash

Nope, same exact error:

standard_init_linux.go:211: exec user process caused "exec format error"

Please double-check that you meet the following requirements:

Software Requirements

Ubuntu 18.04 LTS
NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
docker-ce installed, https://docs.docker.com/install/linux/docker-ce/ubuntu/
nvidia-docker2 installed, instructions: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
NVIDIA GPU driver v410.xx or above

https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#requirements

Also, the docker container should run on your host PC, not on the Xavier.
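
For context, "exec format error" usually means the kernel was asked to run a binary built for a different CPU architecture. A plausible reading of this thread, which the replies imply but do not spell out, is that the TLT image is built for x86_64 while the Jetson is an aarch64 (ARM64) device. The sketch below is one way to check that assumption on any machine. The helper names docker_arch_for and check_image_arch are made up for illustration, and the name mapping covers only the common cases.

```shell
#!/bin/sh
# Map a kernel machine name (as printed by `uname -m`) to the
# architecture name Docker records in image metadata.
docker_arch_for() {
    case "$1" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        armv7l)        echo "arm" ;;
        *)             echo "unknown" ;;
    esac
}

# Compare the host architecture with the architecture of a locally
# pulled image. Needs the docker CLI and an already pulled image.
check_image_arch() {
    image="$1"
    host_arch="$(docker_arch_for "$(uname -m)")"
    image_arch="$(docker image inspect --format '{{.Architecture}}' "$image")" || return 2
    if [ "$host_arch" = "$image_arch" ]; then
        echo "OK: host and image are both $host_arch"
    else
        echo "MISMATCH: host is $host_arch, image is $image_arch"
        return 1
    fi
}
```

After sourcing the file, it could be run as `check_image_arch nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2` (with sudo if docker requires it on your system). A reported mismatch would be consistent with the error above.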

That's a bit of a problem. Can't I use the TLT on just the Xavier? Without docker, maybe?

Normally, users run training via TLT on a host PC, then use the output file (the etlt model) to run inference on the Xavier. Alternatively, copy the etlt model to the Xavier, generate the TRT engine directly on the Xavier, and then run inference.

I see, but the training requires a GPU, so I'd have to connect the host to the target device. Will it automatically utilize the device's GPU?

Normally, the host PC will have a GPU. Please see the requirements.

2. Transfer Learning Toolkit Requirements

Using the Transfer Learning Toolkit requires the following:

Hardware Requirements

Minimum

  • 4 GB system RAM
  • 4 GB of GPU RAM
  • Single core CPU
  • 1 GPU
  • 50 GB of HDD space
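
To check a candidate host against the driver requirement quoted earlier in the thread (v410.xx or above) and the minimum hardware list above, something like the following could be used. It is only a sketch: driver_ok, report_gpu and check_host are hypothetical helper names, and it assumes nvidia-smi is installed and on the PATH.

```shell
#!/bin/sh
# Succeed if the major part of a driver version string such as
# "455.23.05" is at least 410.
driver_ok() {
    major="${1%%.*}"
    [ "$major" -ge 410 ]
}

# Print name, total memory and driver version for each GPU so the
# values can be compared by eye with the minimum requirements.
report_gpu() {
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
}

# Read the driver version of the first GPU and check it.
check_host() {
    version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if driver_ok "$version"; then
        echo "Driver $version meets the v410.xx minimum"
    else
        echo "Driver '$version' is too old or could not be read"
        return 1
    fi
}
```

Running `report_gpu` followed by `check_host` on the host PC gives a quick first check; RAM and disk space still need to be verified separately.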

Thank you, I'll try it.

So unfortunately I don't have the necessary host machine yet (Intel GPU :/).
So, is there a way to run the TLT on an AWS EC2 instance? Just wondering if it's possible before I try and get stuck on some error for hours on end.

The TLT team has not verified it on an AWS EC2 instance, so it is unknown to us.
You can check whether the instance meets the HW/SW requirements of TLT, or you can give it a quick try.

$ sudo docker run –-runtime=nvidia -it -v /home/vaaan/tlt-experiments:/workspace/tlt-experiments -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2/bin/bash

docker: invalid reference format.
See 'docker run --help'.

I am getting this error. Can you help?
These are my installed dependencies:

  • Ubuntu 18.04
  • Driver version =455.23.05
  • Docker-ce =20.10.8
  • Nvidia-docker2
  • Docker-API 1.41

Thank you
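
The command as posted appears to have two problems that could each explain "invalid reference format", although the thread does not confirm either: the first dash of the runtime flag does not look like a plain ASCII hyphen (a common result of copying from a formatted web page), and there is no space between the image tag and /bin/bash. The sketch below flags non-ASCII characters in a pasted argument; has_non_ascii is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the argument contains any byte outside printable ASCII,
# e.g. an en dash or a curly quote pasted from a web page.
has_non_ascii() {
    printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# For comparison, the same command typed with plain hyphens and a
# space before /bin/bash (same form as the command given earlier in
# this thread):
#
#   sudo docker run --runtime=nvidia -it \
#       -v /home/vaaan/tlt-experiments:/workspace/tlt-experiments \
#       -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2 /bin/bash
```

Pasting each suspicious argument into `has_non_ascii '...' && echo "non-ASCII character found"` is a quick way to spot this kind of copy-and-paste damage before retyping the command by hand.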

@ebin.mathew
Please create a new forum topic. Thanks.

Thank you, will do.