Issue training TrafficCamNet using the TAO Toolkit

Hello,

I am trying to train my model using the TAO Toolkit on my Jetson Xavier. I have kept only 100 images for training purposes.

While creating TFRecords, I am getting the following error:

Converting Tfrecords for kitti trainval dataset
Traceback (most recent call last):
  File "/home/photon/miniconda3/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/home/photon/miniconda3/lib/python3.7/site-packages/tlt/entrypoint/entrypoint.py", line 115, in main
    args[1:]
  File "/home/photon/miniconda3/lib/python3.7/site-packages/tlt/components/instance_handler/local_instance.py", line 296, in launch_command
    docker_logged_in(required_registry=self.task_map[task].docker_registry)
  File "/home/photon/miniconda3/lib/python3.7/site-packages/tlt/components/instance_handler/utils.py", line 129, in docker_logged_in
    data = load_config_file(docker_config)
  File "/home/photon/miniconda3/lib/python3.7/site-packages/tlt/components/instance_handler/utils.py", line 66, in load_config_file
    "No file found at: {}. Did you run docker login?".format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /home/photon/.docker/config.json. Did you run docker login?

The steps I performed after getting this error are:

  1. Opened a terminal and entered the following command:

docker login nvcr.io

Login was successful.

  2. Then, from that same terminal, I launched my Jupyter notebook, but the same issue persists (a quick check of what the launcher expects is sketched below).
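For reference, a minimal way to confirm that the credentials file the launcher is looking for actually exists for the same user that runs tao (assuming the default Docker config location):

$ docker login nvcr.io        # username is $oauthtoken, password is the NGC API key
$ ls -l $HOME/.docker/config.json

One common reason for a missing ~/.docker/config.json is that docker login was run with sudo or as a different user, in which case the file ends up under that user's home directory instead.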

Please suggest a solution for this as soon as possible.

Thanks

Hi,

TAO doesn’t support the Jetson platform.
You will need to run the training in a desktop environment.

https://developer.nvidia.com/tao-toolkit

Can TAO Toolkit be used to train on Jetson?
------------------------------------------------------------
Training with TAO Toolkit is supported only on x86 with an NVIDIA GPU such as a V100. Models trained with TAO Toolkit can be deployed on any NVIDIA platform, including Jetson.

Thanks.

Hello,
I’m new to development on NVIDIA platforms.
Could you tell me more about this statement: “TAO doesn’t support the Jetson platform.”?

I am following the steps on this blog page, all from a Jetson NX with the recommended libraries and prerequisites installed as per the guides in the NVIDIA documentation.

There are frequent errors, as expected, which I have troubleshot with help from the forums and other resources. I’m having trouble getting past this step:

Split the data into two parts: 80% for the training set and 20% for the validation set
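(The 80/20 split itself is driven by the conversion spec passed to dataset_convert; a rough sketch of the relevant part, with placeholder paths and directory names, is below. val_split: 20 is what produces the 80/20 partition.)

kitti_config {
  root_directory_path: "/workspace/openalpr/data"
  image_dir_name: "image"
  label_dir_name: "label"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 4
}
image_directory_path: "/workspace/openalpr/data"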

The latest error returned on executing the command for this step is:

Error response from daemon: Container f595c6cb0e4c6c8dad9ee0a24ca1d675dc41b7772978c7e1b713624c4f2445cf is not running
...(some traceback lines)...
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/f595c6cb0e4c6c8dad9ee0a24ca1d675dc41b7772978c7e1b713624c4f2445cf/stop

Among the things I’ve tried: restarting the system and Docker, specifying the architecture with the --platform flag, and pruning and re-downloading the latest TAO CV image.

This may be related: am I right in assuming that, because of the platform (arm64/v8 architecture), I will have to run the TAO Toolkit commands on another device and use the Jetson only to run what has already been trained and validated?
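One quick check I know of, assuming docker image inspect is available (the image name below is a placeholder for whichever TAO image the launcher pulled):

$ docker images
$ docker image inspect --format '{{.Os}}/{{.Architecture}}' <TAO image from the list above>

If it reports linux/amd64, that would line up with the x86-only training limitation quoted above.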

Thanks for any clarification you’re able to offer me :)

Hi,

Sorry for the late update.

We recommend training a model with TAO in a dGPU environment,
then copying the model to the Jetson and deploying it with the DeepStream SDK.
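A rough outline of that hand-off (the paths, model name, and key below are only placeholders):

$ tao detectnet_v2 export -m /workspace/experiments/weights/model.tlt -k <your NGC key> -o /workspace/experiments/export/model.etlt
$ scp /workspace/experiments/export/model.etlt user@jetson:/opt/models/

On the Jetson, the DeepStream nvinfer config then points at the exported model via tlt-encoded-model=/opt/models/model.etlt and tlt-model-key=<your NGC key>.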

Thanks.

Thanks again, Moderator, for the clarification.

I’ve since switched from the Jetson NX to an Ubuntu 20.04 VM with an NVIDIA GPU and device driver 470+. The TAO Toolkit setup runs inside a virtual environment, as per NVIDIA’s advice here.

As per my previous question, on the same step I mentioned (splitting the dataset right before training), when I run the command:

$ tao detectnet_v2 dataset_convert -d /workspace/openalpr/SPECS_tfrecord.txt -o /workspace/openalpr/lpd_tfrecord/lpd

I get a new error that says:

could not select device driver "" with capabilities: [[gpu]]

Do you know how I can solve this? I have tried advice from similar posts on the topic, but it does not seem to apply to my case.
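For reference, a minimal sanity check of whether Docker itself can see the GPU, independent of TAO (I’m assuming the NVIDIA Container Toolkit is the relevant package here, and the CUDA image tag is only an example):

$ nvidia-smi
$ dpkg -l | grep -i nvidia-container
$ docker run --rm --gpus all nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi

If that last command fails with the same "could not select device driver" message, the problem would be in the Docker/NVIDIA runtime setup rather than in TAO itself.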

Hey Moderator, please see my reply above.

Hey @AastaLLL,
Would you please check my updated reply? Thanks.