Training PeopleNet on custom data

@Morganh What are the steps to train PeopleNet? I followed https://developer.nvidia.com/blog/training-custom-pretrained-models-using-tlt/

but I am not able to pull the docker image, nor can I get the “tlt-dataset-convert” executable offline. I am trying to train on a system with an MX130 GPU, just to test the training and its results.

Moving into Transfer Learning Toolkit forum for resolution.

@abhigoku10
The blog https://developer.nvidia.com/blog/training-custom-pretrained-models-using-tlt/ was released last year and is based on TLT 2.0.
Currently, TLT3.0-py3 is released, so some steps are not the same as in TLT 2.0. See NVIDIA TAO Documentation

But there should be no issue when you run the command below to pull the TLT 2.0 docker.
docker pull nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py3

Can you share the log for “i am not able to pull the docker image”?

Thanks for the response. I was able to set up docker; there were a few issues in my setup.
Approach 1: I am trying to train PeopleNet using the Jupyter notebook from https://ngc.nvidia.com/catalog/resources/nvidia:tlt_cv_samples. In a terminal I log in with “sudo docker login nvcr.io” successfully, but when I start the Jupyter notebook from the same terminal and run !tlt detectnet_v2 --help or any other command, I get the following error:
“No file found at: {}. Did you run docker login?”.format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /home/abhilash.sk/.docker/config.json. Did you run docker login?
Can you let me know what I am doing wrong in this process?

Approach 2: I followed https://developer.nvidia.com/blog/training-custom-pretrained-models-using-tlt/ and tried to pull the image you suggested, but I get the following error:
sudo docker pull nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py3
Error response from daemon: manifest for nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py3 not found
I am able to run “docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3”, but when I run “tlt-dataset-convert -d $conversion_spec_file_trainval -o $tfrecord_path_trainval” I get a command not found error. Since I am using TLT 3.0, I changed the command to “tlt detectnet_v2 dataset-convert”, but then I also get the following error:
“No file found at: {}. Did you run docker login?”.format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /home/abhilash.sk/.docker/config.json. Did you run docker login?
Can you let me know how to move forward with the two approaches? I am running on a system with a GeForce MX130 GPU and CUDA 11.1.

Can you try
$ docker login nvcr.io

Then, please see TLT Launcher — Transfer Learning Toolkit 3.0 documentation

Once you have installed docker-ce, please follow the post-installation steps to make sure that docker can be run without sudo.

Reference: Not able to launch TLT3 training - #5 by aurointelli
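
The assertion quoted earlier in the thread is about a missing credentials file for the current user. As a minimal sketch, assuming the standard Docker client layout in which `docker login` records an `auths` entry in `~/.docker/config.json`, the check below tells you whether that file exists and mentions nvcr.io. The helper name `has_registry_login` is hypothetical and is not part of the TLT launcher.

```python
import json
from pathlib import Path


def has_registry_login(registry="nvcr.io", config_path=None):
    """Return True if the Docker client config lists an entry for `registry`.

    Hypothetical helper, not part of TLT. It only inspects the file that
    `docker login` writes for the user running this script.
    """
    path = Path(config_path) if config_path else Path.home() / ".docker" / "config.json"
    if not path.is_file():
        # A login done with sudo may have stored the file for another user.
        return False
    try:
        config = json.loads(path.read_text())
    except (OSError, ValueError):
        # Unreadable (e.g. owned by root) or not valid JSON.
        return False
    auths = config.get("auths") if isinstance(config, dict) else None
    return isinstance(auths, dict) and registry in auths
```

Calling `has_registry_login()` as the same user that runs the notebook should return True after a plain `docker login nvcr.io`. If it returns False right after `sudo docker login nvcr.io`, the credentials were probably stored for a different user or with permissions the current user cannot read.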

Hello,
$ docker login nvcr.io is now working for me.

I have a question on the process:

  1. I finished the prerequisites. The only thing is that I did not create a virtual environment, since I only need to run this training on this system.
  2. When I run the command tlt detectnet_v2 --help, I get
    " raise host_config_version_error(‘device_requests’, ‘1.40’)
    docker.errors.InvalidVersion: device_requests param is not supported in API versions < 1.40", which says that my docker API version is too old. The specs for my system are below.

My docker versions are:
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.9.5
Git commit: 9ee9f40
Built: Wed Jun 20 21:43:51 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm

Server:
Engine:
Version: 18.03.1-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.5
Git commit: 9ee9f40
Built: Wed Jun 20 21:42:00 2018
OS/Arch: linux/amd64
Experimental: false

When I check the availability of the different versions: "
apt-cache madison nvidia-docker2 nvidia-container-runtime docker-ce
nvidia-docker2 | 2.6.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.5.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.4.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.3.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.2.2-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.2.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.2.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.1.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.1.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.7-3 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.6-3 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.5-3 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.5-2 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.4-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.3-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.2-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.3-3 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.2-2 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.2-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.03.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker17.12.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.5.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.4.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.4.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.4.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.3.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.2.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.1.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.1.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.1.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 3.1.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.7-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.6-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.06.3-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-2 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.06.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.06.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.03.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker17.12.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 Packages
docker-ce | 5:20.10.7~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:20.10.6~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:20.10.5~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:20.10.4~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:20.10.3~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:20.10.2~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:20.10.1~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:20.10.0~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.15~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.14~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.13~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.12~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.11~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.10~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.9~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.8~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.7~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.6~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.5~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.4~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.3~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.2~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.1~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.0~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.9~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.8~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.7~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.6~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.5~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.4~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.3~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.2~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.1~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.0~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 18.06.3~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 18.06.2~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 18.06.1~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 18.06.0~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 18.03.1~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
"

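Please upgrade your docker version. The device_requests parameter needs docker API 1.40, which arrived with docker 19.03; your 18.03.1-ce install only provides API 1.37.
See NVIDIA TAO Documentation
docker-ce >19.03.5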
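
A short sketch of the version check behind that error, for anyone who wants to verify a machine before launching `tlt`. The function names are made up for illustration. The 1.40 threshold is taken from the error message above, and `server_api_version` assumes the `docker` CLI is on the PATH.

```python
import subprocess


def parse_api_version(text):
    """Turn a docker API version string such as '1.37' into (1, 37)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def supports_device_requests(api_version, minimum="1.40"):
    """True if `api_version` is at least `minimum`, compared numerically."""
    return parse_api_version(api_version) >= parse_api_version(minimum)


def server_api_version():
    """Ask the local docker daemon for its API version through the CLI."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Server.APIVersion}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

With the versions posted above, `supports_device_requests("1.37")` is False, which matches the error. The versions are compared as integer pairs rather than as strings or floats, so that 1.9 sorts below 1.40.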

Thanks for sharing the steps; it works as mentioned.

Approach 1: the tfrecords conversion step, with the following command “!tlt detectnet_v2 dataset_convert \
-d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
-o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval”
gives the error "
ls: cannot access ‘/home/abhilash.sk/tlt_cv_samples_v1.1.0/detectnet_v2/data/tfrecords/kitti_trainval/’: No such file or directory", even though the folder is there. It is root-protected, though; I am not sure how that happened.

I am passing custom data to the tfrecords generation.

Please refer to NVIDIA TAO Documentation.

For TLT 3.0, it is suggested to create a ~/.tlt_mounts.json file.
It maps local directories into the docker.

In the command line, paths should be the “destination” paths inside the docker.

There is also a simple approach, for reference: you can set the source and destination paths to be the same.
For example,

"source": "/home/omno/Desktop/umair/tlt-samples/classification",
"destination": "/home/omno/Desktop/umair/tlt-samples/classification"
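
To illustrate the same-path mapping, here is a small sketch that builds such a file from Python. It assumes the layout described in the TLT 3.0 launcher documentation as I understand it: a top-level "Mounts" list of source/destination pairs, plus an optional "DockerOptions" section with a "user" entry, which the launcher warning elsewhere in this thread also mentions. Both function names are hypothetical, so please check the key names against the documentation for your release.

```python
import json
from pathlib import Path


def build_mounts_config(host_dir, user=None):
    """Map `host_dir` to the identical path inside the container.

    `user` is an optional "UID:GID" string; the key layout is an assumption
    based on the TLT 3.0 launcher documentation.
    """
    config = {"Mounts": [{"source": host_dir, "destination": host_dir}]}
    if user is not None:
        config["DockerOptions"] = {"user": user}
    return config


def write_mounts_config(config, path=None):
    """Write the config as JSON; defaults to ~/.tlt_mounts.json."""
    target = Path(path) if path else Path.home() / ".tlt_mounts.json"
    target.write_text(json.dumps(config, indent=4) + "\n")
    return target
```

Because source and destination are identical, the paths used on the `tlt` command line are the same as the paths on the host, which avoids the translation step. Passing `user` as the output of `id -u` and `id -g` joined by a colon should keep generated files owned by the host user instead of root.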

Hi, thanks for the response.

  1. What should be the minimum size of the custom images for training data? I ask because all the tfrecords I generated were zero.
  2. So I took a KITTI dataset of 297 images, generated the tfrecords, and started the training, but I am getting the following error:
    INFO:tensorflow:Graph was finalized.
    2021-07-02 08:26:05,444 [INFO] tensorflow: Graph was finalized.
    INFO:tensorflow:Running local_init_op.
    2021-07-02 08:26:07,065 [INFO] tensorflow: Running local_init_op.
    INFO:tensorflow:Done running local_init_op.
    2021-07-02 08:26:08,202 [INFO] tensorflow: Done running local_init_op.
    INFO:tensorflow:Saving checkpoints for step-0.
    2021-07-02 08:26:14,671 [INFO] tensorflow: Saving checkpoints for step-0.
    Traceback (most recent call last):
    File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1365, in _do_call
    return fn(*args)
    File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1350, in _run_fn
    target_list, run_metadata)
    File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1443, in _call_tf_sessionrun
    run_metadata)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
    [[{{node gradients/resnet18_nopool_bn_detectnet_v2/output_bbox/convolution_grad/Conv2DBackpropInput}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 843, in
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 832, in
File “”, line 2, in main
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py”, line 46, in wrapped_fn
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 821, in main
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 702, in run_experiment
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 638, in train_gridbox
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 154, in run_training_loop
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 754, in run
run_metadata=run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1360, in run
raise six.reraise(*original_exc_info)
File “/usr/local/lib/python3.6/dist-packages/six.py”, line 696, in reraise
raise value
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1345, in run
return self._sess.run(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1418, in run
run_metadata=run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1176, in run
return self._sess.run(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 956, in run
run_metadata_ptr)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1180, in _run
feed_dict_tensor, options, run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1359, in _do_run
run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
[[node gradients/resnet18_nopool_bn_detectnet_v2/output_bbox/convolution_grad/Conv2DBackpropInput (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for ‘gradients/resnet18_nopool_bn_detectnet_v2/output_bbox/convolution_grad/Conv2DBackpropInput’:
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 832, in
File “”, line 2, in main
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py”, line 46, in wrapped_fn
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 821, in main
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 702, in run_experiment
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 613, in train_gridbox
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 468, in build_training_graph
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py”, line 598, in build_training_graph
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/train_op_generator.py”, line 59, in get_train_op
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/train_op_generator.py”, line 74, in _get_train_op_without_cost_scaling
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/optimizer.py”, line 419, in minimize
grad_loss=grad_loss)
File “/usr/local/lib/python3.6/dist-packages/horovod/tensorflow/init.py”, line 253, in compute_gradients
gradients = self._optimizer.compute_gradients(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/optimizer.py”, line 537, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_impl.py”, line 158, in gradients
unconnected_gradients)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py”, line 703, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py”, line 362, in _MaybeCompile
return grad_fn() # Exit early
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py”, line 703, in
lambda: grad_fn(op, *out_grads))
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_grad.py”, line 596, in _Conv2DGrad
data_format=data_format),
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py”, line 1407, in conv2d_backprop_input
name=name)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py”, line 794, in _apply_op_helper
op_def=op_def)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py”, line 513, in new_func
return func(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3357, in create_op
attrs, op_def, compute_device)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3426, in _create_op_internal
op_def=op_def)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 1748, in init
self._traceback = tf_stack.extract_stack()

…which was originally created as op ‘resnet18_nopool_bn_detectnet_v2/output_bbox/convolution’, defined at:
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 832, in
[elided 5 identical lines from previous traceback]
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 468, in build_training_graph
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py”, line 572, in build_training_graph
File “/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py”, line 457, in call
output = self.call(inputs, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/keras/engine/network.py”, line 564, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File “/usr/local/lib/python3.6/dist-packages/keras/engine/network.py”, line 721, in run_internal_graph
layer.call(computed_tensor, **kwargs))
File “/usr/local/lib/python3.6/dist-packages/keras/layers/convolutional.py”, line 171, in call
dilation_rate=self.dilation_rate)
File “/opt/nvidia/third_party/keras/tensorflow_backend.py”, line 113, in conv2d
data_format=tf_data_format,
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py”, line 921, in convolution
name=name)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py”, line 1032, in convolution_internal
name=name)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py”, line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py”, line 794, in _apply_op_helper
op_def=op_def)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py”, line 513, in new_func
return func(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3357, in create_op
attrs, op_def, compute_device)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3426, in _create_op_internal
op_def=op_def)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 1748, in init
self._traceback = tf_stack.extract_stack()

2021-07-02 13:56:38,232 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

For 1, the tfrecords files should not be zero size. Please remove any zero-size files.
For 2, there are similar topics in the TLT forum. You can search for “Conv2DCustomBackpropInputOp only supports NHWC”. It may be related to your GPU. Which dGPU did you use?
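
A quick way to look for zero-size files is sketched below, using only the Python standard library. The function name `find_empty_files` is hypothetical, and the directory you pass in is whichever folder holds your generated tfrecords.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return the sorted paths of zero-byte regular files under `directory`."""
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

It is probably worth printing the returned list and reviewing it before deleting anything; each entry can then be removed with `p.unlink()`.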

  1. There are no tfrecords with zero size; please find the reference image for the KITTI data.
  2. Currently I am using an MX130 GPU.
  3. For a custom dataset, what is the minimum image resolution?

Training can work even with only a few training images.
What is the training spec?

Please share the full log when you generate tfrecords as well.

Please check all the software requirements. See NVIDIA TAO Documentation

MX130 gpu has compute capability of 5.0.
Related topic: TLT Detectnet TrafficCamNet training not working - #9 by Morganh

Hi Morgan, I am using only the KITTI dataset. Here is the tfrecords generation log:
Converting Tfrecords for kitti trainval dataset
2021-07-02 19:22:37,842 [INFO] root: Registry: [‘nvcr.io’]
2021-07-02 19:22:38,279 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-07-02 13:53:21,772 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter
2021-07-02 13:53:21,877 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Num images in
Train: 256 Val: 41
2021-07-02 13:53:21,877 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2021-07-02 13:53:21,878 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 0
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py:142: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

2021-07-02 13:53:21,879 - tensorflow - WARNING - From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py:142: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
2021-07-02 13:53:22,027 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 1
2021-07-02 13:53:22,091 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 2
2021-07-02 13:53:22,180 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 3
2021-07-02 13:53:22,256 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 4
2021-07-02 13:53:22,292 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 5
2021-07-02 13:53:22,351 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 6
2021-07-02 13:53:22,421 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 7
2021-07-02 13:53:22,504 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 8
2021-07-02 13:53:22,558 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 9
2021-07-02 13:53:22,617 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
b’car’: 118
b’dontcare’: 96
b’cyclist’: 12
b’van’: 12
b’pedestrian’: 29
b’truck’: 7
b’misc’: 2
b’tram’: 8
b’person_sitting’: 5

2021-07-02 13:53:22,618 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 0
2021-07-02 13:53:23,052 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 1
2021-07-02 13:53:23,465 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 2
2021-07-02 13:53:23,837 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 3
2021-07-02 13:53:24,380 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 4
2021-07-02 13:53:24,871 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 5
2021-07-02 13:53:25,314 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 6
2021-07-02 13:53:25,812 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 7
2021-07-02 13:53:26,472 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 8
2021-07-02 13:53:27,007 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 9
2021-07-02 13:53:27,609 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
b’car’: 979
b’dontcare’: 505
b’cyclist’: 59
b’van’: 88
b’tram’: 22
b’pedestrian’: 105
b’truck’: 36
b’misc’: 36
b’person_sitting’: 7

2021-07-02 13:53:27,610 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Cumulative object statistics
2021-07-02 13:53:27,610 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
b’car’: 1097
b’dontcare’: 601
b’cyclist’: 71
b’van’: 100
b’pedestrian’: 134
b’truck’: 43
b’misc’: 38
b’tram’: 30
b’person_sitting’: 12

2021-07-02 13:53:27,610 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Class map.
Label in GT: Label in tfrecords file
b’Car’: b’car’
b’DontCare’: b’dontcare’
b’Cyclist’: b’cyclist’
b’Van’: b’van’
b’Pedestrian’: b’pedestrian’
b’Truck’: b’truck’
b’Misc’: b’misc’
b’Tram’: b’tram’
b’Person_sitting’: b’person_sitting’
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

2021-07-02 13:53:27,611 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Tfrecords generation complete.
2021-07-02 19:23:32,087 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The MX130 GPU has a compute capability of 6.0; I checked on the wiki!

Could you run the commands below inside the TLT docker to check the compute capability? Thanks.
$ python
>>> from numba import cuda
>>> cuda.detect()

Found 1 CUDA devices
id 0 b’GeForce MX130’ [SUPPORTED]
compute capability: 5.0
pci device id: 0
pci bus id: 2
Summary:
1/1 devices are supported
True

Thanks for the info. So, the MX130 GPU has a compute capability of 5.0.
This is similar to the related topic: TLT Detectnet TrafficCamNet training not working - #9 by Morganh
