Please help! Problems accessing TAO Toolkit with Docker on Jetson

Please provide the following information when requesting support.

• Jetson Xavier AGX

• BPNET

Configuration of the TAO Toolkit Instance

dockers:
    nvidia/tao/tao-toolkit-tf:
        v3.22.05-tf1.15.5-py3:
            docker_registry: nvcr.io
            tasks:
                1. augment
                2. bpnet
                3. classification
                4. dssd
                5. faster_rcnn
                6. emotionnet
                7. efficientdet
                8. fpenet
                9. gazenet
                10. gesturenet
                11. heartratenet
                12. lprnet
                13. mask_rcnn
                14. multitask_classification
                15. retinanet
                16. ssd
                17. unet
                18. yolo_v3
                19. yolo_v4
                20. yolo_v4_tiny
                21. converter
        v3.22.05-tf1.15.4-py3:
            docker_registry: nvcr.io
            tasks:
                1. detectnet_v2
    nvidia/tao/tao-toolkit-pyt:
        v3.22.05-py3:
            docker_registry: nvcr.io
            tasks:
                1. speech_to_text
                2. speech_to_text_citrinet
                3. speech_to_text_conformer
                4. action_recognition
                5. pointpillars
                6. pose_classification
                7. spectro_gen
                8. vocoder
        v3.21.11-py3:
            docker_registry: nvcr.io
            tasks:
                1. text_classification
                2. question_answering
                3. token_classification
                4. intent_slot_classification
                5. punctuation_and_capitalization
    nvidia/tao/tao-toolkit-lm:
        v3.22.05-py3:
            docker_registry: nvcr.io
            tasks:
                1. n_gram
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022

Issue:

(launcher) jetson@ubuntu:~$ tao bpnet --help
2022-10-26 20:24:22,096 [INFO] root: Registry: ['nvcr.io']
2022-10-26 20:24:22,337 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
Docker instantiation failed with error: 500 Server Error: Internal Server Error ("failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'csv'
invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime instead.: unknown")
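For what it's worth, this particular error usually means Docker is invoking the NVIDIA Container Runtime Hook directly instead of going through the NVIDIA Container Runtime. On Jetson, where JetPack ships the runtime, a common remedy (a sketch under the assumption of a standard JetPack install; it does not change the fact, noted in the reply below, that the TAO training containers are x86-only) is to register nvidia as the default runtime and restart Docker:

```shell
# Hypothetical sketch: make the NVIDIA Container Runtime the default runtime.
# Assumes nvidia-container-runtime is already installed (it ships with JetPack).
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF
# Restart the Docker daemon so the new default runtime takes effect.
sudo systemctl restart docker
```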

I have been stuck here for two weeks now and have installed the Xavier three times with the SDK from a local host.

Who can give me a hint?

thx!

Hi @rwakker, the tao-toolkit container is for x86. The model training occurs on x86 using TAO, and you can deploy the models trained with TAO to Jetson using DeepStream or Triton Inference Server. For DeepStream on Jetson, you can run the deepstream-l4t container:

TAO is designed to run on x86 systems with an NVIDIA GPU (e.g., a GPU-powered workstation, a DGX system, etc.), or it can be run in any cloud with an NVIDIA GPU. For inference, models can be deployed on any edge device, such as an embedded Jetson platform, or in a data center with GPUs such as T4 or A100.
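To make the deployment path concrete, the DeepStream L4T container can be pulled from NGC and run on the Jetson with the NVIDIA runtime. A minimal sketch; the image tag is an assumption, so check NGC for a version matching your JetPack release:

```shell
# Sketch: run the DeepStream L4T container on a Jetson device.
# The tag "6.1.1-samples" is an assumption; pick a current tag from NGC.
sudo docker pull nvcr.io/nvidia/deepstream-l4t:6.1.1-samples
sudo docker run -it --rm --runtime nvidia --network host \
    nvcr.io/nvidia/deepstream-l4t:6.1.1-samples
```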

Hi, thanks very much for the reply. I'm not there yet :(

I'd like to start learning human pose estimation on swimmers, training models etc., with a limited background in AI…

Swimmer example:

I'm using this hardware:

I started training according to the following example:

But I finally got stuck in the training part of the provided Jupyter notebook:

!tao bpnet train -e $SPECS_DIR/bpnet_train_m1_coco.yaml \
                 -r $USER_EXPERIMENT_DIR/models/exp_m1_unpruned \
                 -k $KEY \
                 --gpus $NUM_GPUS

This gives the error:
2022-10-31 20:32:00,786 [INFO] root: Registry: ['nvcr.io']
2022-10-31 20:32:00,989 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
Docker instantiation failed with error: 500 Server Error: Internal Server Error ("failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'csv'
invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime instead.: unknown")

I would really like to train on my own images of swimmers, annotating keypoints on the partly visible bodies (swimmers are only seen from the top, front, or side; arms lifted above the water are not visible in the image, etc.).

I gave up and have now tried this:

This partly works, but I need a method to train on my data without in-depth knowledge of setting up the complete pipeline.

Any suggestion to continue my research?

Which device did you run the training on: a dGPU or a Jetson device?

I think

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022
I managed to make a big step forward; we can close the topic. It was about the l4t DeepStream Docker container.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.
Thanks

OK. For training, please run TAO training on x86 systems with a dGPU; training cannot be run on a Jetson device.
For inference, either a dGPU or a Jetson device is fine.
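The typical TAO 3.x flow for this split is: train and export on the x86 machine, then convert the exported model to a TensorRT engine on the Jetson with tao-converter. A hedged sketch only; the file names reuse the notebook's variables, and the exact export/converter arguments are assumptions that depend on the model:

```shell
# On the x86 training machine: export the trained model to an .etlt file.
# (Paths, spec file, and $KEY are placeholders from the notebook above.)
tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_unpruned/bpnet_model.tlt \
                 -e $SPECS_DIR/bpnet_train_m1_coco.yaml \
                 -k $KEY \
                 -o $USER_EXPERIMENT_DIR/models/exp_m1_unpruned/bpnet_model.etlt

# On the Jetson: build a TensorRT engine from the .etlt using the
# aarch64 build of tao-converter (downloaded separately from NVIDIA).
# Flags shown (-k key, -t precision, -e engine output) are illustrative.
./tao-converter bpnet_model.etlt -k $KEY -t fp16 -e bpnet_model.engine
```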

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.