TX2 "INT8 not supported by platform. Trying FP16 mode"

Hello Morganh,
I am starting a new post, originally coming from here: https://devtalk.nvidia.com/default/topic/1064467/deepstream-sdk/resnet10-quot-primary-detector-quot-/post/5404002/#5404002

I did as you said and converted the engine with tlt-converter.

Running:
$ /usr/src/tensorrt/bin/trtexec --int8 --loadEngine= --calib= --batch=1 --iterations=20 --output=output_cov/Sigmoid,output_bbox/BiasAdd --useSpinWait

I get an average execution time of 20 ms. It doesn't sound very fast (-> 50 fps on 1 stream, 10 fps on 5 streams).
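
(Back-of-the-envelope: 1000 ms / 20 ms ≈ 50 inferences per second, so roughly 50 fps on a single stream, and with 5 streams sharing the same engine that drops to about 50 / 5 = 10 fps per stream, ignoring batching and pipeline overhead.)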

When I run the same engine with DeepStream, I get the following errors.

Creating LL OSD context new
0:00:01.349564544  9511   0x55a4c146c0 WARN                 nvinfer gstnvinfer.cpp:515:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:useEngineFile(): Failed to read from model engine file
0:00:01.349669440  9511   0x55a4c146c0 INFO                 nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
0:00:01.349941088  9511   0x55a4c146c0 WARN                 nvinfer gstnvinfer.cpp:515:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:generateTRTModel(): INT8 not supported by platform. Trying FP16 mode.
0:00:01.349987200  9511   0x55a4c146c0 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:generateTRTModel(): No model files specified
0:00:01.350042720  9511   0x55a4c146c0 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:initialize(): Failed to create engine from model files

Why does it say “INT8 not supported by platform. Trying FP16 mode.”?

The config of the primary GIE is the following:

[property]
    gpu-id=0
    net-scale-factor=0.0039215697906911373
    int8-calib-file=/detectnet_v2_resnet_10/calibration.bin
    labelfile-path=/detectnet_v2_resnet_10/classes.txt
    model-engine-file=engine/resnet10_int8.engine
    tlt-model-key=blablabla
    batch-size=5
    uff-input-blob-name=input_1
    uff-input-dims=3;608;608;0
    process-mode=1
    model-color-format=0
    network-mode=1
    num-detected-classes=2
    interval=0
    gie-unique-id=1
    output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
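
(I assume the "No model files specified" error comes from nvinfer falling back to building an engine from model files after it fails to read model-engine-file, and the config above does not point to any model it could build from. Something along these lines would presumably be needed for that fallback, with the path being just a guess:

    # hypothetical path to the exported .etlt, decoded with tlt-model-key
    tlt-encoded-model=/detectnet_v2_resnet_10/resnet10_detector.etlt

But my main question is still the INT8 one.)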

Hi,

Which device are you using?
Please note that not all GPUs support INT8 operation.

The GPU architecture needs to be 7.x, or a P4.
Please check this page for the detailed information:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#hardware-precision-matrix
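
If you want to double-check directly on the board, and assuming the TensorRT Python bindings are installed, a quick (untested) way to query it is:

$ python3 -c "import tensorrt as trt; b = trt.Builder(trt.Logger()); print('fast INT8:', b.platform_has_fast_int8, 'fast FP16:', b.platform_has_fast_fp16)"

On devices without INT8 support this should print fast INT8: False.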

Thanks.

I use a Jetson TX2. But why does trtexec work fine?

$ /usr/src/tensorrt/bin/trtexec --int8 --loadEngine=resnet10_int8.engine --calib= --batch=1 --iterations=20 --output=output_cov/Sigmoid,output_bbox/BiasAdd --useSpinWait

Furthermore, the benchmark primary detector also loads an INT8 engine (“resnet10.caffemodel_b30_int8.engine”).

I am confused.

Hi rog07o4z,
Regarding the performance of the generated TRT engine, could you please share some more detailed info?

  1. Which Jetson platform: Nano, Xavier, or another one?
  2. What is the prune ratio? You can check it in the pruning log.
  3. What is the size of the pruned model?
  4. Can you share your full “tlt-converter” command? I want to check your batch size and TensorRT data type.

For the error when running the TRT engine with ds,
can you paste the full log?

  1. Jetson TX2

  2. Which pruning log??

  3. ResNet10: After pruning there are 19368 weights left.

  4. ./tlt-converter resnet10_detector.etlt -e resnet10_int8.engine -k MYKEY -c resnet10_calibration.bin -o output_cov/Sigmoid,output_bbox/BiasAdd -d 3,608,608 -b 8 -m 4 -t int8 -i nchw (an FP16 variant of this call is sketched after the logs below)

  5. ds full log:

./deepstream-test5-app -c test5_config_file_src_infer_custom_detectnet_resnet10.txt 

(deepstream-test5-app:10645): GLib-GObject-WARNING **: 10:04:36.161: g_object_set_is_valid_property: object class 'avenc_mpeg4' has no property named 'iframeinterval'

(deepstream-test5-app:10645): GLib-GObject-WARNING **: 10:04:36.161: g_object_set_is_valid_property: object class 'avenc_mpeg4' has no property named 'bufapi-version'
Creating LL OSD context new
0:00:01.361678272 10645   0x55c0de16c0 WARN                 nvinfer gstnvinfer.cpp:515:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:useEngineFile(): Failed to read from model engine file
0:00:01.361774144 10645   0x55c0de16c0 INFO                 nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
0:00:01.362070752 10645   0x55c0de16c0 WARN                 nvinfer gstnvinfer.cpp:515:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:generateTRTModel(): INT8 not supported by platform. Trying FP16 mode.
0:00:01.362138176 10645   0x55c0de16c0 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:generateTRTModel(): No model files specified
0:00:01.362192672 10645   0x55c0de16c0 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:initialize(): Failed to create engine from model files
0:00:01.362252320 10645   0x55c0de16c0 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<primary_gie_classifier> error: Failed to create NvDsInferContext instance
0:00:01.362286688 10645   0x55c0de16c0 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<primary_gie_classifier> error: Config file path: detectnet_v2_resnet_10.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED

can't set pipeline to playing state.
Quitting
ERROR from primary_gie_classifier: Failed to create NvDsInferContext instance
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(692): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie_classifier:
Config file path: detectnet_v2_resnet_10.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
App run failed
  1. /usr/src/tensorrt/bin/trtexec --int8 --loadEngine=resnet10_int8.engine --calib=resnet10_calibration.bin --batch=1 --iterations=20 --output=output_cov/Sigmoid,output_bbox/BiasAdd --useSpinWait

LOG:

[I] int8
[I] loadEngine: resnet10_int8.engine
[I] calib: resnet10_calibration.bin
[I] batch: 1
[I] iterations: 20
[I] output: output_cov/Sigmoid,output_bbox/BiasAdd
[I] useSpinWait
[I] resnet10_int8.engine has been successfully loaded.
[I] Average over 10 runs is 191.889 ms (host walltime is 192.029 ms, 99% percentile time is 192.058).
[I] Average over 10 runs is 191.845 ms (host walltime is 191.889 ms, 99% percentile time is 192.084).
[I] Average over 10 runs is 191.94 ms (host walltime is 191.987 ms, 99% percentile time is 192.217).
[I] Average over 10 runs is 191.886 ms (host walltime is 191.933 ms, 99% percentile time is 192.066).
[I] Average over 10 runs is 191.846 ms (host walltime is 191.889 ms, 99% percentile time is 191.963).
[I] Average over 10 runs is 44.4582 ms (host walltime is 44.4947 ms, 99% percentile time is 191.925).
[I] Average over 10 runs is 19.5758 ms (host walltime is 19.6063 ms, 99% percentile time is 19.612).
[I] Average over 10 runs is 19.5645 ms (host walltime is 19.597 ms, 99% percentile time is 19.58).
[I] Average over 10 runs is 19.57 ms (host walltime is 19.6024 ms, 99% percentile time is 19.5988).
[I] Average over 10 runs is 19.5688 ms (host walltime is 19.6004 ms, 99% percentile time is 19.5856).
[I] Average over 10 runs is 19.5784 ms (host walltime is 19.6093 ms, 99% percentile time is 19.6333).
[I] Average over 10 runs is 19.5812 ms (host walltime is 19.6125 ms, 99% percentile time is 19.6102).
[I] Average over 10 runs is 19.5711 ms (host walltime is 19.602 ms, 99% percentile time is 19.6002).
[I] Average over 10 runs is 19.5767 ms (host walltime is 19.6081 ms, 99% percentile time is 19.6218).
[I] Average over 10 runs is 19.5617 ms (host walltime is 19.5921 ms, 99% percentile time is 19.5978).
[I] Average over 10 runs is 19.6818 ms (host walltime is 19.7295 ms, 99% percentile time is 20.1788).
[I] Average over 10 runs is 19.585 ms (host walltime is 19.6164 ms, 99% percentile time is 19.6313).
[I] Average over 10 runs is 19.5785 ms (host walltime is 19.6101 ms, 99% percentile time is 19.6181).
[I] Average over 10 runs is 19.5814 ms (host walltime is 19.6127 ms, 99% percentile time is 19.6403).
[I] Average over 10 runs is 19.5799 ms (host walltime is 19.6115 ms, 99% percentile time is 19.6614).
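
Given the “INT8 not supported by platform” warning, I suppose the engine should really be built in FP16 on the TX2. A variant of the tlt-converter call from point 4 above, with the calibration file dropped and the data type switched, would presumably be (not tried yet):

./tlt-converter resnet10_detector.etlt -e resnet10_fp16.engine -k MYKEY -o output_cov/Sigmoid,output_bbox/BiasAdd -d 3,608,608 -b 8 -m 4 -t fp16 -i nchw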

Please refer to that post for more information.
https://devtalk.nvidia.com/default/topic/1064467/deepstream-sdk/resnet10-quot-primary-detector-quot-/post/5404002/#5404002

Thanks

Thanks rog07o4z. The pruning log is written where you ran the “tlt-prune” command.
Also, please check the size (how many MB) of the pruned tlt model.
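
For example, assuming the prune output was saved to a file such as prune.log (the names here are just placeholders), something like this shows the ratio and the size:

$ grep "Pruning ratio" prune.log
$ ls -lh $USER_EXPERIMENT_DIR/experiment_dir_pruned/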

Pruning ratio (pruned model / original model): 1.0
Size of the pruned model:
total 19368
-rw-r--r-- 1 root root 19829544 Nov 22 09:23 resnet10_nopool_bn_detectnet_v2_pruned.tlt

Hi rog07o4z,
It seems that your trained model was not actually pruned, because the pruning ratio is 1.0.
What is your “-pth” value in the tlt-prune command? And did you run re-training against the pruned model?
The resnet10_nopool_bn_detectnet_v2_pruned.tlt you mentioned is a pruned tlt model which has not been retrained.

If you have retrained, could you also paste the size of the retrained model (i.e., resnet18_detector_pruned.tlt by default) and the size of the exported etlt model (resnet18_detector.etlt)?

I'm asking because a pruned model will get higher performance than an unpruned model.
To make the process clearer: unpruned model → prune → retrain → retrained model → exported etlt model

Hi,
Yes I understand the concept of pruning and I also retrained the model before applying it in the ds pipeline.

The prune command is:
!tlt-prune -pm $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet10_detector.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_pruned/ \
-eq union \
-pth 0.0000052 \
-k $KEY

Size of the retrained model:
total 19368
-rw-r--r-- 1 root root 19829328 Nov 22 10:31 resnet10_detector_pruned.tlt

Size of the untrained model:
total 39268
-rw------- 1 root root 253 Nov 22 10:31 license.txt
-rw------- 1 root root 40205392 Nov 22 10:31 resnet10.hdf5

Since the number of weights decreased from 39268 to 19368, I expected that the pruning had worked fine.

Hi rog07o4z,
The resnet10.hdf5 is the pre-trained model; it is not related to the prune ratio. (Also note that the “total 39268” and “total 19368” figures are just block counts printed by ls, not weight counts; the relevant numbers are the file sizes, roughly 40 MB versus 19 MB.)
Your prune ratio is 1.0. That means the trained model has not actually been pruned.

Now I can see your whole process:

  1. You trained on your own data and got an unpruned model (size unknown).
  2. After pruning (prune ratio is 1.0), you got a 19 MB pruned model, resnet10_nopool_bn_detectnet_v2_pruned.tlt.
  3. After retraining, you got a 19 MB retrained model, resnet10_detector_pruned.tlt.
  4. Then you exported it as resnet10_detector.etlt (size unknown).

Regarding step 2, could you try to prune more aggressively? Refer to section 9 of the TLT documentation for the details.
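
For illustration, re-running the prune step with a larger threshold would look like the command below; -pth 0.01 is only an example starting value, and the right threshold depends on how much accuracy the retraining can recover.

!tlt-prune -pm $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet10_detector.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_pruned/ \
-eq union \
-pth 0.01 \
-k $KEY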

Hi Morganh,
Just as feedback: increasing the -pth parameter helped immensely to get a pruning ratio < 1.
Thanks. Now I know where to tweak.