Tao-converter [ERROR] Failed to parse the model, please check the encoding key to make sure its correct

" ValueError: Invalid model: /workspace/tao-experiments/detectnet_v2/pretrained_resnet50/pretrained_detectnet_v2_vresnet50/resnet50.hdf5, please check the key used to load the model

2023-06-03 18:09:25,239 [INFO] tlt.components.docker_handler.docker_handler: Stopping container."

I suggest you run the experiment below:

  1. train your model again for 1 epoch
  2. export to a .etlt model
  3. run tao-converter again.
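The three steps can be sketched as shell commands. This is a hypothetical sketch, not the notebook's exact invocations: the spec and result paths are placeholders, and only the general TAO flags (-e spec, -r results, -k key, -m model, -o output) are assumed. The commands are built as strings here to stress the one thing that matters: the same -k value at every stage.

```shell
# Hypothetical sketch: the same key (-k) must be used at every stage.
# Paths and spec file names are placeholders, not the notebook's real ones.
KEY=123
TRAIN="tao detectnet_v2 train -e specs/train_spec.txt -r results/ -k $KEY"
EXPORT="tao detectnet_v2 export -m results/weights/model.tlt -o results/model.etlt -k $KEY"
CONVERT="tao-converter -k $KEY -e results/model.engine results/model.etlt"
printf '%s\n' "$TRAIN" "$EXPORT" "$CONVERT"
```

A mismatch between the key used at train/export time and the key passed to tao-converter is exactly what produces the "Invalid model … check the key" error.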

Should I be using “$KEY”? (It doesn’t recognise the explicit key.)

You can check which one can be used to export successfully.

What I mentioned above is a quick experiment to verify the process.
You can train/export with any key value. For example,

  1. train your model again for 1 epoch, with key=123
  2. export to a .etlt model, with key=123
  3. run tao-converter again, with key=123

Okay, I will do that and get back, but this raises one other question:

“!tao detectnet_v2 train” is the first time one has to use the “-k” argument in this project, so on what basis is it saying that the model is invalid? The model was downloaded from the NVIDIA NGC registry, and that may have required my NGC API key. Does that mean that resnet50.hdf5 is key-encrypted, and therefore dictates the key (in effect, my NGC API key) that one needs to use for everything else that uses it?

No, the .hdf5 is not key-encrypted.

In tao detectnet_v2 train, you can use your NGC API key or any value.


“ValueError: Invalid model: /workspace/tao-experiments/detectnet_v2/pretrained_resnet50/pretrained_detectnet_v2_vresnet50/resnet50.hdf5, please check the key used to load the model”

what does “check the key” mean? I have used “123”, as you suggested above.
please advise

Where did you download resnet50.hdf5? Please try to download it again.

I carried out the step laid out in the jupyter notebook
"!ngc registry model download-version nvidia/tao/pretrained_detectnet_v2:resnet50
--dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet50"

As a sanity check, to be clear, you would like me to:

  1. sudo rm -r the existing pretrained_detectnet_v2_vresnet50 folder in my LOCAL_EXPERIMENT_DIR

  2. Run the cell "!ngc registry model download-version nvidia/tao/pretrained_detectnet_v2:resnet50
    --dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet50"

and then run the 1 epoch experiment that you describe above

please confirm

Can you try the model below?
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/pretrained_detectnet_v2/versions/resnet50/files/resnet50.hdf5'
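One quick check worth doing first, since a failed download can leave an HTML/JSON error page saved under the .hdf5 name: a genuine HDF5 file begins with the magic bytes \211HDF\r\n\032\n. The helper below (check_model, a hypothetical function, not part of TAO) sketches that check:

```shell
# Verify a downloaded file is non-empty and begins with the HDF5 magic bytes;
# a failed NGC/wget download often leaves an error page instead of a model.
check_model() {
    local f="$1"
    [ -s "$f" ] || { echo "missing or empty: $f"; return 1; }
    if head -c 8 "$f" | grep -q "HDF"; then
        echo "looks like HDF5: $f"
    else
        echo "not an HDF5 file: $f"
        return 1
    fi
}
```

Running it against the downloaded resnet50.hdf5 before training will at least tell you whether the file is structurally an HDF5 file.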

The new model you asked me to download is giving the same result:
" Invalid model: /workspace/tao-experiments/detectnet_v2/pretrained_resnet50/pretrained_detectnet_v2_vresnet50/resnet50.hdf5, please check the key used to load the model"

2023-06-04 17:54:35,041 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

I used -k 123, but -k $KEY gives the same result.

Please advise

Could you change to resnet18 and check again? You need to change the spec file as well. Thanks for your time.

wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/pretrained_detectnet_v2/versions/resnet18/files/resnet18.hdf5'

From TAO Pretrained DetectNet V2 | NVIDIA NGC

I now have resnet50 working in training, using 123 as -k.
The .hdf5 downloads inside a folder called “pretrained_resnet50”, which downloaded inside an existing “pretrained_resnet50” folder. I also had to make a small change to the “pretrained_model_file” path in the train kitti.txt spec file.
Finally, a number of folder names seem to vary between the documentation and the jupyter notebook, e.g. SPECS_DIR is sometimes referred to as LOCAL_SPEC_DIR, and LOCAL_PROJECT_DIR is sometimes USER_PROJECT_DIR.
I will now complete the original 3-stage experiment and will get back with the result.

Hi Morganh.
I have hit a small problem at the export stage:
I moved an old experiment_dir_final out of my LOCAL_EXPERIMENT_DIR and then ran the jupyter notebook cell.
The first line of code:
“!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_final” creates a new empty folder and then the notebook throws a PermissionError associated with this folder.


The permissions are “peter:peter”, which is the same as most of the other folders and files used in this project.

Do you have any idea what the issue might be here?

Refer to Tao Training failing on creating directory on a standard example - #9 by Morganh

Thank you.

The advice contained therein is to delete the following from ~/.tao_mounts.json:

  ,
  "DockerOptions": {
      "user": "1000:1000"
  }

Following that advice gets rid of the permission issue, but the notebook throws a new error:
"ValueError: Cannot find input file name"
This apparently relates to the -k KEY (see "Can't export the model to int8").
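For reference, the edit can be reproduced mechanically. The sketch below works on a throwaway copy in /tmp (the sample Mounts content is made up for illustration) rather than the real ~/.tao_mounts.json:

```shell
# Create an illustrative copy of the mounts file in /tmp, so nothing real is touched.
cat > /tmp/tao_mounts.json <<'EOF'
{
    "Mounts": [
        {
            "source": "/home/peter/tao-experiments",
            "destination": "/workspace/tao-experiments"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}
EOF

# Remove the DockerOptions section (the part the forum post says to delete),
# rewriting the file as valid JSON so no stray comma is left behind.
python3 - <<'EOF'
import json

path = "/tmp/tao_mounts.json"
with open(path) as f:
    cfg = json.load(f)
cfg.pop("DockerOptions", None)
with open(path, "w") as f:
    json.dump(cfg, f, indent=4)
EOF
```

Editing through a JSON parser avoids the easy mistake of deleting the block but leaving the trailing comma, which would make the file unparseable.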

As you know from the above, I am using the key “123” as suggested, and it is the same key that was used to train a 1-epoch test for this ‘quick experiment’, which is proving to be anything but quick.

Please advise

Not sure if there is something mismatched in the ~/.tao_mounts.json file.

So, I suggest you run the experiment again in a new terminal instead of the notebook.

$ docker run --runtime=nvidia -it --rm -v /your/local/data/path:/docker/data/path nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash

Then, inside the docker, run training for one epoch. All the paths and the key are set explicitly.
# detectnet_v2 train xxx

Also,
# detectnet_v2 export xxx

This has indeed downloaded the latest version of tao-toolkit; however, I am not sure where it downloaded to.
When “detectnet_v2 train” is run from inside the docker I get:


It appears that the syntax for “detectnet_v2 train” isn’t fully recognised inside the docker and it seems to be suggesting that I need to include the task that I want performed. Do I add a flag for “train”, despite explicitly using “train” after “detectnet_v2”?
If so, what is the syntax that I should use?
Thank you

Please try again, typing the command on one line instead of across multiple lines with \.
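A continuation is only honoured when \ is the very last character on the line; a trailing space, or a paste that drops the backslashes, splits one command into several. The sketch below uses a hypothetical stand-in function that just counts its arguments, to show a correctly continued command arriving as a single command:

```shell
# Hypothetical stand-in for detectnet_v2: just reports how many arguments arrived.
detect() { echo "received $# arguments"; }

# Correct continuation: "\" is the last character on each line, so this is ONE command.
detect train \
  -e spec.txt \
  -r results/ \
  -k 123
# prints: received 7 arguments
```

If a space sneaks in after a backslash, the shell instead runs `detect train` on its own and then tries to execute `-e spec.txt` as a separate command, which fails with an unrelated-looking error.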