TAO 5.0 Classification (PyTorch) deploy error

How about running /usr/src/tensorrt/bin/trtexec --onnx=/model/classification_model_export.onnx?

The same error occurred

Could you please pull nvcr.io/nvidia/deepstream:6.3-samples and retry? Thanks.
Its TensorRT version is 8.5.3.
A similar issue can be found in SSA validation FAIL tensor concat_19_0'.1__mye152's uses list size does not match the actual use in the program:1 vs. 0 · Issue #2009 · NVIDIA/TensorRT · GitHub.
So, please check whether a newer version of TensorRT helps.

Indeed, that is the issue. Do I have to install DeepStream 6.3, or just TensorRT 8.5.3? I would also like to know why I need to use version 8.5.3: the model was generated locally with version 8.5.2 on my PC.

Thanks for your help!

Do you mean you did not use the TAO docker to generate the ONNX file?
Please run the command below and generate the ONNX file inside the container:
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt /bin/bash

The TRT version is 8.5.3 in this docker.

I have successfully converted the ONNX file to an engine file in the docker. You're right, this really is a version issue. I want this model to run natively instead of in Docker; do I need to reinstall DeepStream 6.3, or just TensorRT 8.5.3?

You can reinstall TensorRT to verify.

I reinstalled TensorRT 8.5.3 and successfully converted the ONNX file into an engine file, and the program also ran successfully. However, I ran it on 12 dog videos and all of them were classified as cats. When I took screenshots from the videos and ran inference on them directly in the notebook, 9 of them were correctly recognized as dogs. Could the problem arise during model conversion?
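One thing worth ruling out when video results differ from notebook results is a preprocessing mismatch: nvinfer normalizes each pixel as y = net-scale-factor * (x - offset), and if those config values do not match the normalization used during training, the engine sees different inputs than the notebook did. A self-contained sketch with illustrative ImageNet-style values (these numbers are assumptions for the example, not taken from your config):

```python
import numpy as np

# Illustrative values only: torchvision-style ImageNet normalization,
# where net-scale-factor ~ 1/57.63 and the offsets are per-channel means.
# Your actual config may use different values.
NET_SCALE_FACTOR = 0.017507
OFFSETS = np.array([123.675, 116.28, 103.53], dtype=np.float32)

def nvinfer_preprocess(frame_rgb: np.ndarray) -> np.ndarray:
    """Per-pixel transform applied by nvinfer:
    y = net-scale-factor * (x - offset)."""
    return NET_SCALE_FACTOR * (frame_rgb.astype(np.float32) - OFFSETS)

# A mid-gray pixel should land near zero after a matching normalization;
# a large systematic shift here would point to a preprocessing mismatch.
pixel = np.full((1, 1, 3), 120.0, dtype=np.float32)
out = nvinfer_preprocess(pixel)
print(out)
```

Comparing the tensor your notebook feeds the model against this transform is a quick way to confirm whether conversion or preprocessing is to blame.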

There is a line output-blob-names=predictions/Softmax in the configuration file that causes an error, so I commented it out. Does this have any impact?

Can you share the full log?

Sorry, I tried adding that line back and ran it again; there were no errors this time. But all the dog videos are still detected as cats.

lab@lab:/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-class$ deepstream-app -c ds_classification_as_primary_gie.txt 
Warning: 'input-dims' parameter has been deprecated. Use 'infer-dims' instead.
0:00:01.982158091 138372 0x5636b0c3bd30 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1909> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.2/sources/apps/sample_apps/deepstream-class/classification_model_export1.engine
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input_1         3x224x224       
1   OUTPUT kFLOAT probs           2               

ERROR: [TRT]: 3: Cannot find binding of given name: predictions/Softmax
0:00:02.046017579 138372 0x5636b0c3bd30 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1876> [UID = 1]: Could not find output layer 'predictions/Softmax' in engine
0:00:02.046027741 138372 0x5636b0c3bd30 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2012> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.2/sources/apps/sample_apps/deepstream-class/classification_model_export1.engine
0:00:02.047820975 138372 0x5636b0c3bd30 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.2/sources/apps/sample_apps/deepstream-class/config_as_primary_gie.txt sucessfully

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.

** INFO: <bus_callback:239>: Pipeline ready

** INFO: <bus_callback:225>: Pipeline running


**PERF:  FPS 0 (Avg)	
**PERF:  57.38 (57.12)	
nvstreammux: Successfully handled EOS for source_id=0
** INFO: <bus_callback:262>: Received EOS. Exiting ...

Quitting
App run successful

Do you mean the log above?
Please change the line to output-blob-names=probs.
Also, please comment out model-engine-file=/path/to/model.engine and uncomment the ONNX file line, so that DeepStream generates the TensorRT engine file itself and runs inference.
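With those suggestions applied, the relevant lines of the nvinfer config would look roughly like this (the paths are placeholders, and the surrounding keys depend on your existing file):

```
[property]
# Build the engine from the ONNX model instead of loading a prebuilt one
onnx-file=/path/to/classification_model_export.onnx
#model-engine-file=/path/to/model.engine
# Match the output layer name the engine actually reports
output-blob-names=probs
```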

I made the modifications as you suggested, but it still does not classify them correctly.

lab@lab:/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-class$ deepstream-app -c ds_classification_as_primary_gie2.txt 
Warning: 'input-dims' parameter has been deprecated. Use 'infer-dims' instead.
0:00:00.095894851 141999 0x557523b9d930 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
0:01:39.985988221 141999 0x557523b9d930 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-6.2/sources/apps/sample_apps/deepstream-class/classification_model_export.onnx_b1_gpu0_fp32.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input_1         3x224x224       
1   OUTPUT kFLOAT probs           2               

0:01:40.198581748 141999 0x557523b9d930 WARN                 nvinfer gstnvinfer.cpp:1037:gst_nvinfer_start:<primary_gie> warning: NvInfer asynchronous mode is applicable for secondaryclassifiers only. Turning off asynchronous mode
0:01:40.198764293 141999 0x557523b9d930 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.2/sources/apps/sample_apps/deepstream-class/config_as_primary_gie2.txt sucessfully

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.


**PERF:  FPS 0 (Avg)	
**PERF:  0.00 (0.00)	
** INFO: <bus_callback:239>: Pipeline ready

WARNING from primary_gie: NvInfer asynchronous mode is applicable for secondaryclassifiers only. Turning off asynchronous mode
Debug info: gstnvinfer.cpp(1037): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
** INFO: <bus_callback:225>: Pipeline running

**PERF:  58.00 (57.83)	
nvstreammux: Successfully handled EOS for source_id=0
** INFO: <bus_callback:262>: Received EOS. Exiting ...

Quitting

Please make two more modifications and retry.
1) For your test video, please convert it to an .avi file. Refer to Issue with image classification tutorial and testing with deepstream-app - #24 by Morganh
2) Set scaling-filter=5. Refer to Issue with image classification tutorial and testing with deepstream-app - #32 by Morganh
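For reference, scaling-filter is set in the [property] group of the nvinfer config; the value selects the interpolation mode used for the pre-inference resize (the comment below is my reading of the gst-nvinfer property table, not a line from this thread):

```
[property]
# Interpolation filter used when scaling frames to the network input size
scaling-filter=5
```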

I used ffmpeg to convert the video format, but all the videos still cannot be classified correctly.

Did you also set scaling-filter=5?

Yes, I also set that.

If possible, could you please share the latest ONNX file, the config file, and the test video?
I will try to reproduce this on my side.

Thank you very much. The following are the files I am using.
onnx file:
classification_model_export.onnx (28.2 MB)
config file:
ds_classification_as_primary_gie.txt (3.0 KB)
config_as_primary_gie.txt (802 Bytes)
labels file:
labels.txt (10 Bytes)

I don't know why the video upload keeps waiting. I'll try uploading it again.
test video:

Thanks. Will check further.

I generated a TensorRT engine with the tao-deploy docker below and then ran inference. The results are also incorrect.
I also ran inference against the dataset (Dropbox - cats_dogs_dataset_reorg.zip - Simplify your life) mentioned in the notebook to confirm that the workflow itself is correct.

$ docker run --runtime=nvidia -it --rm -v /home/morganh:/home/morganh nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy /bin/bash

classification_pyt gen_trt_engine -e /home/morganh/demo_2.0/classification_standalone_infer_forum_265020/export.yaml gen_trt_engine.onnx_file=classification_model_export_forum.onnx gen_trt_engine.trt_engine=classification_model_export_forum.engine results_dir=/home/morganh/demo_2.0/classification_standalone_infer_forum_265020/result

classification_pyt inference -e /home/morganh/demo_2.0/classification_standalone_infer_forum_265020/test.yaml inference.trt_engine=/home/morganh/demo_2.0/classification_standalone_infer_forum_265020/classification_model_export_forum.engine results_dir=/home/morganh/demo_2.0/classification_standalone_infer_forum_265020/result

You can also confirm my result. It seems the model is overfitting. You need to add more training images that are similar to the ones you shared, or you can try some other images.