Python onnx_export.py shows error while trying to export model, please help

AnamikaPaul · January 6, 2022, 8:11pm

Namespace(batch_size=1, height=300, input='', labels='labels.txt', model_dir='models/drugs', net='ssd-mobilenet', output='', width=300)
running on device cuda:0
found best checkpoint with loss 6.356691 (models/drugs/mb1-ssd-Epoch-0-Loss-6.356691091807921.pth)
creating network: ssd-mobilenet
num classes: 10
loading checkpoint: models/drugs/mb1-ssd-Epoch-0-Loss-6.356691091807921.pth
Traceback (most recent call last):
  File "onnx_export.py", line 86, in <module>
    net.load(args.input)
  File "/jetson-inference/python/training/detection/ssd/vision/ssd/ssd.py", line 135, in load
    self.load_state_dict(torch.load(model, map_location=lambda storage, loc: storage))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SSD:
size mismatch for classification_headers.0.weight: copying a param with shape torch.Size([66, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([60, 512, 3, 3]).
size mismatch for classification_headers.0.bias: copying a param with shape torch.Size([66]) from checkpoint, the shape in current model is torch.Size([60]).
size mismatch for classification_headers.1.weight: copying a param with shape torch.Size([66, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([60, 1024, 3, 3]).
size mismatch for classification_headers.1.bias: copying a param with shape torch.Size([66]) from checkpoint, the shape in current model is torch.Size([60]).
size mismatch for classification_headers.2.weight: copying a param with shape torch.Size([66, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([60, 512, 3, 3]).
size mismatch for classification_headers.2.bias: copying a param with shape torch.Size([66]) from checkpoint, the shape in current model is torch.Size([60]).
size mismatch for classification_headers.3.weight: copying a param with shape torch.Size([66, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([60, 256, 3, 3]).
size mismatch for classification_headers.3.bias: copying a param with shape torch.Size([66]) from checkpoint, the shape in current model is torch.Size([60]).
size mismatch for classification_headers.4.weight: copying a param with shape torch.Size([66, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([60, 256, 3, 3]).
size mismatch for classification_headers.4.bias: copying a param with shape torch.Size([66]) from checkpoint, the shape in current model is torch.Size([60]).
size mismatch for classification_headers.5.weight: copying a param with shape torch.Size([66, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([60, 256, 3, 3]).
size mismatch for classification_headers.5.bias: copying a param with shape torch.Size([66]) from checkpoint, the shape in current model is torch.Size([60]).

dusty_nv · January 6, 2022, 8:19pm

Hi @AnamikaPaul, you may see this error when the number of classes in the labels file doesn’t match the number of classes that the model was trained with.

Can you use the --labels=models/drugs/labels.txt argument when running onnx_export.py?

AnamikaPaul · January 6, 2022, 8:22pm

Thanks for the quick response. I tried that, but it still shows the same error.

dusty_nv · January 6, 2022, 9:08pm

Can you paste the contents of your models/drugs/labels.txt file here?

It should have BACKGROUND as the first class. This background class gets added by train_ssd.py when you train the model. Hence if your original labels.txt from your dataset has 10 classes, then the version of labels.txt that gets saved with your model by train_ssd.py should have 11 classes.
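
The shapes in the traceback fit this explanation: 66 = 6 x 11 and 60 = 6 x 10, so the checkpoint appears to have been trained with 11 classes (10 plus BACKGROUND) while the network was built for 10. Below is a minimal Python sketch of that check. It assumes each classification header outputs 6 channels per class; that factor is inferred from the log ("num classes: 10" alongside a 60-channel shape) and is not confirmed in this thread. The helper names and the `drug_N` labels are hypothetical.

```python
# Assumption: each classification header outputs ANCHORS_PER_LOCATION * num_classes
# channels. The value 6 is inferred from the log (num classes: 10 -> 60 channels).
ANCHORS_PER_LOCATION = 6


def classes_from_header_channels(out_channels, anchors=ANCHORS_PER_LOCATION):
    """Recover the class count implied by a classification header's output channels."""
    if out_channels % anchors != 0:
        raise ValueError(f"{out_channels} is not a multiple of {anchors}")
    return out_channels // anchors


def parse_labels(text):
    """Return the non-empty, stripped lines of a labels.txt-style string."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_labels(labels, checkpoint_channels):
    """Describe whether a labels list matches the checkpoint's class count."""
    expected = classes_from_header_channels(checkpoint_channels)
    if len(labels) == expected:
        return "ok"
    if len(labels) + 1 == expected and (not labels or labels[0] != "BACKGROUND"):
        return "missing BACKGROUND: use the labels.txt saved in the model directory"
    return f"mismatch: {len(labels)} labels, checkpoint expects {expected}"


# Hypothetical 10-class dataset labels file (no BACKGROUND entry).
dataset_labels = parse_labels("\n".join(f"drug_{i}" for i in range(10)))
# The version saved with the model has BACKGROUND prepended.
model_labels = ["BACKGROUND"] + dataset_labels

print(classes_from_header_channels(66))  # checkpoint shape from the traceback
print(check_labels(dataset_labels, 66))
print(check_labels(model_labels, 66))
```

Under that assumption, the 10-line dataset file is one class short of what the checkpoint expects, while the 11-line file saved in the model directory matches.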

AnamikaPaul · January 6, 2022, 11:15pm

I was able to convert my model to ONNX, but when I ran the next command I got another error. Yes, I have BACKGROUND as the first class in models/drugs/labels.txt.
root@anamika-desktop:/jetson-inference/python/training/detection/ssd# detectnet --model=models/drugs/ssd-mobilenet.onnx labels=models/drugs/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes /dev/video0
> listDir('/usr/local/bin/labels=models/drugs/labels.txt') - found no matches
> [image] imageLoader -- failed to find 'labels=models/drugs/labels.txt'
> detectnet: failed to create input stream

AnamikaPaul · January 6, 2022, 11:22pm

I forgot to put -- before labels, sorry about that.
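
The log above shows what the missing dashes caused: `labels=models/drugs/labels.txt` was treated as the input stream rather than as an option. The sketch below reproduces the same effect with Python's `argparse`. It is only an illustration; the parser and its `input`/`output` argument names are hypothetical and are not detectnet's actual argument handling.

```python
import argparse

# Hypothetical parser that mimics the general shape of the command line above;
# it is not detectnet's real argument parser.
parser = argparse.ArgumentParser(prog="detectnet-sketch")
parser.add_argument("--model")
parser.add_argument("--labels")
parser.add_argument("input", nargs="?", default="")
parser.add_argument("output", nargs="?", default="")

# Without the leading dashes, "labels=..." is just another positional argument.
wrong = parser.parse_args(
    ["--model=models/drugs/ssd-mobilenet.onnx", "labels=models/drugs/labels.txt", "/dev/video0"]
)
# With the dashes, it is parsed as the --labels option and the camera is the input.
right = parser.parse_args(
    ["--model=models/drugs/ssd-mobilenet.onnx", "--labels=models/drugs/labels.txt", "/dev/video0"]
)

print(wrong.labels, wrong.input, wrong.output)
print(right.labels, right.input, right.output)
```

In the first call the labels path lands in the input slot and the camera device is pushed to the output slot, which mirrors the "failed to find 'labels=models/drugs/labels.txt'" message.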

AnamikaPaul · January 6, 2022, 11:40pm

Hi dusty, thanks for your help. Now the camera doesn't start, and it shows the error below:
[TRT] Could not register plugin creator - ::FlattenConcat_TRT version 1

AnamikaPaul · January 7, 2022, 3:26pm

It took some time, but it worked at last. Thanks for the help.

dusty_nv · January 11, 2022, 7:05pm

Hi @AnamikaPaul, sorry for the delay - glad to hear that you got it working!