Fine-tuning PeopleNet ResNet-34 on AWS: "failed to connect to vfs socket"

• Hardware: AWS EC2 g4dn.xlarge
• Network Type: peoplenet_vtrainable_v2.5 resnet34_peoplenet.tlt
• TLT Version: TAO version 5
• Training spec file
peoplenet34_heads.txt (3.1 KB)

• How to reproduce the issue:

Run the following with Python 3.8 in a Jupyter notebook:

tao model detectnet_v2 train -e $SPECS_DIR/peoplenet34_heads.txt \
    -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
    -n resnet18_detector \
    --gpus $NUM_GPUS \
    -k tlt_encode

• Error message:
2023-09-28 15:04:23,438 [TAO Toolkit] [INFO] tensorflow 692: global_step/sec: 1.78093
2023-09-28 15:04:26,926 [TAO Toolkit] [INFO] nvidia_tao_tf1.core.hooks.sample_counter_hook 76: Train Samples / sec: 7.085
INFO:tensorflow:epoch = 0.9122137404580152, learning_rate = 0.00049999997, loss = 0.013414521, step = 478 (5.830 sec)
2023-09-28 15:04:29,266 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.9122137404580152, learning_rate = 0.00049999997, loss = 0.013414521, step = 478 (5.830 sec)
INFO:tensorflow:epoch = 0.933206106870229, learning_rate = 0.00049999997, loss = 0.016209295, step = 489 (6.025 sec)
2023-09-28 15:04:35,291 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.933206106870229, learning_rate = 0.00049999997, loss = 0.016209295, step = 489 (6.025 sec)
INFO:tensorflow:epoch = 0.9522900763358778, learning_rate = 0.00049999997, loss = 0.014432838, step = 499 (5.672 sec)
2023-09-28 15:04:40,964 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.9522900763358778, learning_rate = 0.00049999997, loss = 0.014432838, step = 499 (5.672 sec)
2023-09-28 15:04:40,964 [TAO Toolkit] [INFO] nvidia_tao_tf1.core.hooks.sample_counter_hook 76: Train Samples / sec: 7.124
INFO:tensorflow:epoch = 0.9713740458015266, learning_rate = 0.00049999997, loss = 0.014740716, step = 509 (5.692 sec)
2023-09-28 15:04:46,656 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.9713740458015266, learning_rate = 0.00049999997, loss = 0.014740716, step = 509 (5.692 sec)
INFO:tensorflow:epoch = 0.9904580152671756, learning_rate = 0.00049999997, loss = 0.016124992, step = 519 (5.687 sec)
2023-09-28 15:04:52,343 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.9904580152671756, learning_rate = 0.00049999997, loss = 0.016124992, step = 519 (5.687 sec)
INFO:tensorflow:global_step/sec: 1.76236
2023-09-28 15:04:52,943 [TAO Toolkit] [INFO] tensorflow 692: global_step/sec: 1.76236
[1695913495.912078] [0ac105827284:216  :f]        vfs_fuse.c:424  UCX  WARN  failed to connect to vfs socket '': Invalid argument
2023-09-28 15:04:56,003 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.evaluation.evaluation 130: step 0 / 58, 0.00s/step
Execution status: FAIL
2023-09-28 15:05:08,039 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.

Thanks to anyone who can help :)

The “failed to connect to vfs socket” warning is not the root cause; the job fails while running evaluation.
To narrow this down, please change

validation_fold: 0

to the following and retry:

validation_data_source: {
    tfrecords_path: "/workspace/tao-experiments/HFDData/tfrecordsOLD/coco_trainval/*"
    image_directory_path: "/workspace/tao-experiments/HFDData/HeadImages"
}
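
For context, a minimal sketch of how the surrounding dataset_config block in a DetectNet_v2 spec looks with this change. Only the validation_data_source paths above are taken from this thread; the training data_sources paths, image_extension, and the "head" class mapping are illustrative assumptions:

dataset_config {
  data_sources {
    tfrecords_path: "..."        # training tfrecords pattern goes here
    image_directory_path: "..."  # training image directory goes here
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "head"
    value: "head"
  }
  validation_data_source {
    tfrecords_path: "/workspace/tao-experiments/HFDData/tfrecordsOLD/coco_trainval/*"
    image_directory_path: "/workspace/tao-experiments/HFDData/HeadImages"
  }
}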

Hi Morganh,

I’ve made that change but I’ve got the same result as before:

INFO:tensorflow:epoch = 0.9725557461406518, learning_rate = 0.00049999997, loss = 0.014038825, step = 567 (5.926 sec)
2023-10-02 09:08:50,604 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.9725557461406518, learning_rate = 0.00049999997, loss = 0.014038825, step = 567 (5.926 sec)
2023-10-02 09:08:54,839 [TAO Toolkit] [INFO] nvidia_tao_tf1.core.hooks.sample_counter_hook 76: Train Samples / sec: 6.798
INFO:tensorflow:epoch = 0.9897084048027444, learning_rate = 0.00049999997, loss = 0.015056454, step = 577 (5.914 sec)
2023-10-02 09:08:56,518 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.9897084048027444, learning_rate = 0.00049999997, loss = 0.015056454, step = 577 (5.914 sec)
INFO:tensorflow:global_step/sec: 1.71379
2023-10-02 09:08:58,275 [TAO Toolkit] [INFO] tensorflow 692: global_step/sec: 1.71379
[1696237740.724623] [5af009ef5b65:217  :f]        vfs_fuse.c:424  UCX  WARN  failed to connect to vfs socket '': Invalid argument
2023-10-02 09:09:00,810 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.evaluation.evaluation 130: step 0 / 582, 0.00s/step
Execution status: FAIL
2023-10-02 09:09:12,877 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.

Do you have any other suggestions?

Thanks for your help!

To narrow this down, could you run in a new terminal instead of the notebook?
Steps:

  1. Open a new terminal
  2. $ tao model detectnet_v2 run /bin/bash
  3. Then, inside the docker container, run the command:
    # detectnet_v2 train -e $SPECS_DIR/peoplenet34_heads.txt \
          -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
          -n resnet18_detector \
          --gpus $NUM_GPUS \
          -k tlt_encode
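
Note that $SPECS_DIR, $USER_EXPERIMENT_DIR and $NUM_GPUS from the notebook environment are not automatically defined inside the container shell, so export them (or replace them with real paths) first. A sketch of what that session might look like, with the export values as illustrative placeholders rather than paths taken from your setup:

  # inside the container opened by: tao model detectnet_v2 run /bin/bash
  export SPECS_DIR=/workspace/tao-experiments/specs                    # placeholder
  export USER_EXPERIMENT_DIR=/workspace/tao-experiments/detectnet_v2   # placeholder
  export NUM_GPUS=1
  detectnet_v2 train -e $SPECS_DIR/peoplenet34_heads.txt \
      -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
      -n resnet18_detector \
      --gpus $NUM_GPUS \
      -k tlt_encode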

Thanks @Morganh. Just tried this and got the following:

(base) ubuntu@ip-172-31-15-235:~/liamd_HFD/HFD$ docker run -it nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5

=======================
=== TAO Toolkit TF1 ===
=======================

NVIDIA Release 5.0.0-TF1 (build 52693369)
TAO Toolkit Version 5.0.0

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/tao-toolkit-software-license-agreement

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for TAO Toolkit.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

root@6eaddc16b73a:/workspace# 
root@6eaddc16b73a:/workspace# detectnet_v2 train -e $SPECS_DIR/peoplenet34_heads.txt -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned-n resnet18_detector –gpus 1
2023-10-05 10:51:21.681482: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-10-05 10:51:22,032 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2023-10-05 10:51:26,802 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-05 10:51:26,940 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-05 10:51:26,961 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
Traceback (most recent call last):
  File "/usr/local/bin/detectnet_v2", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/entrypoint/detectnet_v2.py", line 12, in main
    launch_job(nvidia_tao_tf1.cv.detectnet_v2.scripts, "detectnet_v2", sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 276, in launch_job
    modules = get_modules(package)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 47, in get_modules
    module = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/inference.py", line 19, in <module>
    from nvidia_tao_tf1.cv.detectnet_v2.inferencer.build_inferencer import build_inferencer
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/inferencer/build_inferencer.py", line 24, in <module>
    from nvidia_tao_tf1.cv.detectnet_v2.inferencer.trt_inferencer import DEFAULT_MAX_WORKSPACE_SIZE
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/inferencer/trt_inferencer.py", line 32, in <module>
    import pycuda.autoinit # noqa pylint: disable=unused-import
  File "/usr/local/lib/python3.8/dist-packages/pycuda/autoinit.py", line 1, in <module>
    import pycuda.driver as cuda
  File "/usr/local/lib/python3.8/dist-packages/pycuda/driver.py", line 66, in <module>
    from pycuda._driver import *  # noqa
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
root@6eaddc16b73a:/workspace# 

Please install the NVIDIA driver:
$ sudo apt install nvidia-driver-525
$ sudo reboot

Then try the previous steps again.

I ran the above successfully outside the container and then ran

tao model detectnet_v2 run /bin/bash

and I see the following error:

Can you run $ nvidia-smi and share the result?

Then, please
$ sudo apt purge nvidia* libnvidia*
$ sudo apt install nvidia-driver-525 nvidia-container-toolkit

Fri Oct  6 08:27:23 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   38C    P0    26W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Please
$ sudo apt purge nvidia* libnvidia*
$ sudo apt install nvidia-driver-525 nvidia-container-toolkit
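
If the container still reports that the NVIDIA driver was not detected after this, the NVIDIA runtime may also need to be registered with Docker. A minimal sketch of that extra step, assuming the nvidia-ctk tool that ships with recent nvidia-container-toolkit packages:

$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker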

OK, I’ve done that and then executed steps 2 and 3 from above again:

(launcher3.8) (base) ubuntu@ip-172-31-15-235:~/liamd_HFD/HFD$ docker run -it nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5

=======================
=== TAO Toolkit TF1 ===
=======================

NVIDIA Release 5.0.0-TF1 (build 52693369)
TAO Toolkit Version 5.0.0

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/tao-toolkit-software-license-agreement

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for TAO Toolkit.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

root@bd23a4b37042:/workspace# detectnet_v2 train -e $SPECS_DIR/peoplenet34_heads.txt -r $USER_EXPERIMENT_DIR/experi
ment_dir_unpruned -n resnet18_detector –gpus 1
2023-10-06 15:14:16.946714: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-10-06 15:14:16,998 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2023-10-06 15:14:18,606 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-06 15:14:18,647 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-06 15:14:18,651 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
Traceback (most recent call last):
  File "/usr/local/bin/detectnet_v2", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/entrypoint/detectnet_v2.py", line 12, in main
    launch_job(nvidia_tao_tf1.cv.detectnet_v2.scripts, "detectnet_v2", sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 276, in launch_job
    modules = get_modules(package)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 47, in get_modules
    module = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/inference.py", line 19, in <module>
    from nvidia_tao_tf1.cv.detectnet_v2.inferencer.build_inferencer import build_inferencer
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/inferencer/build_inferencer.py", line 24, in <module>
    from nvidia_tao_tf1.cv.detectnet_v2.inferencer.trt_inferencer import DEFAULT_MAX_WORKSPACE_SIZE
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/inferencer/trt_inferencer.py", line 32, in <module>
    import pycuda.autoinit # noqa pylint: disable=unused-import
  File "/usr/local/lib/python3.8/dist-packages/pycuda/autoinit.py", line 1, in <module>
    import pycuda.driver as cuda
  File "/usr/local/lib/python3.8

Please use the command below:
$ docker run --runtime=nvidia -it nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
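
If that works, the shared-memory flags from the NOTE in the container banner can be added to the same command, along with any -v mounts needed to make the spec file and data visible inside the container (a sketch; the mount paths are illustrative placeholders):

$ docker run --runtime=nvidia --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
      -v /home/ubuntu/tao-experiments:/workspace/tao-experiments \
      -it nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5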

OK, I’ve tried again with $ docker run --runtime=nvidia -it nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5:

root@669cf0542c75:/workspace# detectnet_v2 train -e $SPECS_DIR/peoplenet34_heads.txt -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned -n resnet18_detector –gpus 1
2023-10-06 15:34:06.341459: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-10-06 15:34:06,393 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2023-10-06 15:34:07,912 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-06 15:34:07,950 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-06 15:34:07,953 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
usage: detectnet_v2 train [-h] [--num_processes NUM_PROCESSES] [--gpus GPUS] [--gpu_index GPU_INDEX [GPU_INDEX ...]]
                          [--use_amp] [--log_file LOG_FILE] [-e EXPERIMENT_SPEC_FILE] [-r RESULTS_DIR] [-n MODEL_NAME]
                          [-v] [-k KEY] [--enable_determinism]
                          {train,prune,inference,export,evaluate,dataset_convert,calibration_tensorfile} ...
detectnet_v2 train: error: argument /tasks: invalid choice: '–gpus' (choose from 'train', 'prune', 'inference', 'export', 'evaluate', 'dataset_convert', 'calibration_tensorfile')
root@669cf0542c75:/workspace# 

Please set it to
--gpus 1
(with two ASCII hyphens), or omit the flag, and retry.

OK, I’ve added the extra dash:

root@669cf0542c75:/workspace# detectnet_v2 train -e $SPECS_DIR/peoplenet34_heads.txt -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned -n resnet18_detector -–gpus 1
2023-10-06 15:41:14.161529: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-10-06 15:41:14,211 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2023-10-06 15:41:15,669 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-06 15:41:15,705 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-06 15:41:15,708 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
usage: detectnet_v2 train [-h] [--num_processes NUM_PROCESSES] [--gpus GPUS] [--gpu_index GPU_INDEX [GPU_INDEX ...]]
                          [--use_amp] [--log_file LOG_FILE] [-e EXPERIMENT_SPEC_FILE] [-r RESULTS_DIR] [-n MODEL_NAME]
                          [-v] [-k KEY] [--enable_determinism]
                          {train,prune,inference,export,evaluate,dataset_convert,calibration_tensorfile} ...
detectnet_v2 train: error: argument /tasks: invalid choice: '1' (choose from 'train', 'prune', 'inference', 'export', 'evaluate', 'dataset_convert', 'calibration_tensorfile')
root@669cf0542c75:/workspace# 

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Can you set an explicit path and retry?
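
For example, something along these lines, with $SPECS_DIR and $USER_EXPERIMENT_DIR replaced by the real locations inside the container. The paths below are illustrative placeholders, and note that --gpus is spelled with two ASCII hyphens:

# detectnet_v2 train -e /workspace/tao-experiments/specs/peoplenet34_heads.txt \
      -r /workspace/tao-experiments/detectnet_v2/experiment_dir_unpruned \
      -n resnet18_detector \
      --gpus 1 \
      -k tlt_encode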
