Issue in Training TrafficCamNet using TAO toolkit

akshat.pant.psi · April 26, 2022, 11:07am

Hello,

I am trying to train my model using the TAO Toolkit on my Jetson Xavier. I have kept only 100 images for training.

While creating TFRecords, I get the following error:

Converting Tfrecords for kitti trainval dataset
Traceback (most recent call last):
  File "/home/photon/miniconda3/bin/tao", line 8, in
    sys.exit(main())
  File "/home/photon/miniconda3/lib/python3.7/site-packages/tlt/entrypoint/entrypoint.py", line 115, in main
    args[1:]
  File "/home/photon/miniconda3/lib/python3.7/site-packages/tlt/components/instance_handler/local_instance.py", line 296, in launch_command
    docker_logged_in(required_registry=self.task_map[task].docker_registry)
  File "/home/photon/miniconda3/lib/python3.7/site-packages/tlt/components/instance_handler/utils.py", line 129, in docker_logged_in
    data = load_config_file(docker_config)
  File "/home/photon/miniconda3/lib/python3.7/site-packages/tlt/components/instance_handler/utils.py", line 66, in load_config_file
    "No file found at: {}. Did you run docker login?".format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /home/photon/.docker/config.json. Did you run docker login?

The steps I performed after getting this error were:

I opened a terminal and entered the following command:

docker login nvcr.io

Login was successful.

Then I launched my Jupyter notebook from that same terminal, but the same error persists.

Please suggest a solution for this as soon as possible.
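The traceback shows that the tao launcher reads the Docker credentials file of the user running it (here /home/photon/.docker/config.json) and asserts that the file exists. A minimal diagnostic sketch, assuming the standard Docker config.json layout in which each logged-in registry appears as a key under "auths"; nvcr_login_present is a hypothetical helper, not part of TAO. Run it from the same user account and environment that starts the notebook.

```python
import json
import os


def nvcr_login_present(config_path: str) -> bool:
    """Return True if config_path exists and lists nvcr.io under "auths".

    Hypothetical helper; assumes the usual Docker config.json layout.
    """
    if not os.path.isfile(config_path):
        # This is the condition the launcher's AssertionError reports.
        return False
    with open(config_path) as f:
        config = json.load(f)
    # docker login records each registry it has credentials for as a key here.
    return "nvcr.io" in config.get("auths", {})


if __name__ == "__main__":
    path = os.path.expanduser("~/.docker/config.json")
    status = "nvcr.io login found" if nvcr_login_present(path) else "no nvcr.io login found"
    print(f"{path}: {status}")
```

If `docker login` was run with `sudo`, the credentials may have been written under root's home directory instead, which would leave the file named in the traceback missing.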

Thanks

AastaLLL · April 28, 2022, 7:45am

Hi,

TAO doesn’t support the Jetson platform.
You will need to run the training in a desktop environment.

https://developer.nvidia.com/tao-toolkit

Can TAO Toolkit be used to train on Jetson?
------------------------------------------------------------
Training with TAO Toolkit is only on x86 with NVIDIA GPU such as a V100. Models trained with TAO Toolkit can be deployed on any NVIDIA platform including Jetson.
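Since training is limited to x86 hosts, a quick pre-flight check can catch the Jetson case before any container is launched. A minimal sketch, assuming the architecture string comes from Python's platform.machine() on Linux (typically x86_64 on desktops and aarch64 on Jetson); can_train_with_tao is a hypothetical helper and does not check whether an NVIDIA GPU is present.

```python
import platform

# Assumed architecture names for x86-64 hosts (Linux reports "x86_64").
X86_64_NAMES = {"x86_64", "amd64"}


def can_train_with_tao(machine: str) -> bool:
    """Return True if the architecture string names an x86-64 host."""
    return machine.lower() in X86_64_NAMES


if __name__ == "__main__":
    machine = platform.machine()
    if can_train_with_tao(machine):
        print(f"{machine}: x86-64 host; TAO training is possible if an NVIDIA GPU is present.")
    else:
        print(f"{machine}: train on an x86 machine with an NVIDIA GPU, then deploy the model here.")
```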

Thanks.

arnold9 · May 13, 2022, 11:53am

Hello,
I’m a novice with development on NVIDIA.
Could you tell me more about this statement: “TAO doesn’t support the Jetson platform.”

I am following the steps on this blog page, all on a Jetson NX, with all the recommended libraries and prerequisites installed according to the guides in the NVIDIA documentation.

There are frequent errors, as expected, which I have troubleshot with help from the forums and other resources. I’m having trouble getting past this step:

Split the data into two parts: 80% for the training set and 20% for the validation set
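The blog's own split procedure is not reproduced in this thread. As an illustration only, a reproducible 80/20 split could be done on KITTI-style frame IDs, so that each image stays paired with its label file; the frame naming (000000, 000001, ...) and split_80_20 are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(items: Sequence[str], seed: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of items and return (train, val) with 80% / 20% of them.

    Hypothetical helper; a fixed seed makes the split reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * 0.8)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Hypothetical frame IDs: image_2/000000.png would pair with label_2/000000.txt.
    frames = [f"{i:06d}" for i in range(100)]
    train, val = split_80_20(frames)
    print(len(train), len(val))  # 80 20
```

Splitting IDs rather than files keeps images and labels from ending up in different sets.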

The latest error returned on executing the command for this step is:

Error response from daemon: Container f595c6cb0e4c6c8dad9ee0a24ca1d675dc41b7772978c7e1b713624c4f2445cf is not running
...(some traceback lines)...
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/f595c6cb0e4c6c8dad9ee0a24ca1d675dc41b7772978c7e1b713624c4f2445cf/stop

Among the things I’ve tried: restarting the system + docker, specifying the architecture with the --platform flag, pruning and redownloading the latest TAO CV image.

This may be related: am I right in assuming that, because of the platform (arm64/v8 architecture), I will have to run the TAO Toolkit commands on another device and use the Jetson only to run the model once it has been trained and validated?

Thanks for any clarification you’re able to offer me :)

AastaLLL · May 25, 2022, 6:04am

Hi,

Sorry for the late update.

We recommend training the model with TAO in a dGPU environment, then copying it to the Jetson and deploying it with the DeepStream SDK.

Thanks.

arnold9 · June 27, 2022, 10:55am

Thanks again, Moderator, for your clarification by email.

I’ve since switched from the Jetson NX to an Ubuntu 20.04 VM with an NVIDIA GPU and driver version 470 or later. The TAO Toolkit setup runs inside a virtual environment, following NVIDIA’s advice here.

Returning to my previous question: at the same step I mentioned (splitting the dataset right before training), running the command:

$ tao detectnet_v2 dataset_convert -d /workspace/openalpr/SPECS_tfrecord.txt -o /workspace/openalpr/lpd_tfrecord/lpd

I get a new error that says:

could not select device driver "" with capabilities: [[gpu]]

Do you know how I can solve this? I tried the advice from similar posts on this topic, but it does not seem to apply to my case.
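This Docker error is commonly reported when Docker has no driver that can satisfy a GPU request (as with the --gpus flag), typically because the NVIDIA Container Toolkit is not installed or Docker was not restarted after installing it. A minimal check, assuming Docker's GPU support depends on the toolkit's nvidia-container-runtime-hook binary being on PATH; gpu_hook_available is a hypothetical helper.

```python
import shutil
from typing import Callable, Optional

# Assumption: Docker registers its GPU device driver only if this
# NVIDIA Container Toolkit binary can be found on PATH.
HOOK = "nvidia-container-runtime-hook"


def gpu_hook_available(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Return True if the NVIDIA container hook is found on PATH.

    Note: this searches the current shell's PATH, which may differ
    from the PATH seen by the Docker daemon.
    """
    return which(HOOK) is not None


if __name__ == "__main__":
    if gpu_hook_available():
        print(f"{HOOK} found; Docker should be able to honor GPU requests.")
    else:
        print(f"{HOOK} not found; install the NVIDIA Container Toolkit and restart Docker.")
```

In a VM, the GPU must also be visible inside the guest itself (for example, `nvidia-smi` should work there) before any container can use it.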

arnold9 · July 25, 2022, 1:47pm

Hey Moderator, please see my reply above

arnold9 · August 30, 2022, 11:31am

Hey @AastaLLL,
Would you please check my updated reply? Thanks.

