How to get 157 FPS with PeopleNet on Jetson Xavier NX?

I am trying to run the official PeopleNet demo with DeepStream on a Jetson Xavier NX.
I am using resnet34_peoplenet_pruned.etlt, but I only get 73-74 FPS in DeepStream, which is far from the official 157 FPS.


I am wondering how I can reach that number on the Xavier NX. The logs are below:

$ deepstream-app -c deepstream_app_source1_peoplenet.txt
** WARN: <parse_gie:1104>: Unknown key 'network-model' for group [primary-gie]
Opening in BLOCKING MODE
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
gstnvtracker: Optional NvMOT_RemoveStreams not implemented
gstnvtracker: Batch processing is OFF
ERROR: Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models/../../models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt_b1_gpu0_int8.engine open error
0:00:01.542823799  6087     0x2e663750 WARN                 nvinfer gstnvinfer.cpp:599:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1566> [UID = 1]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models/../../models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt_b1_gpu0_int8.engine failed
0:00:01.542974007  6087     0x2e663750 WARN                 nvinfer gstnvinfer.cpp:599:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1673> [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models/../../models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt_b1_gpu0_int8.engine failed, try rebuild
0:00:01.543017943  6087     0x2e663750 INFO                 nvinfer gstnvinfer.cpp:602:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1591> [UID = 1]: Trying to create engine from model files
INFO: [TRT]:
INFO: [TRT]: --------------- Layers running on DLA:
INFO: [TRT]:
INFO: [TRT]: --------------- Layers running on GPU:
INFO: [TRT]: conv1/convolution + activation_1/Relu6, block_1a_conv_1/convolution + activation_2/Relu6, block_1a_conv_2/convolution, block_1a_conv_shortcut/convolution + add_1/add + activation_3/Relu6, block_1b_conv_1/convolution + activation_4/Relu6, block_1b_conv_2/convolution, block_1b_conv_shortcut/convolution + add_2/add + activation_5/Relu6, block_1c_conv_1/convolution + activation_6/Relu6, block_1c_conv_2/convolution, block_1c_conv_shortcut/convolution + add_3/add + activation_7/Relu6, block_2a_conv_1/convolution + activation_8/Relu6, block_2a_conv_2/convolution, block_2a_conv_shortcut/convolution + add_4/add + activation_9/Relu6, block_2b_conv_1/convolution + activation_10/Relu6, block_2b_conv_2/convolution, block_2b_conv_shortcut/convolution + add_5/add + activation_11/Relu6, block_2c_conv_1/convolution + activation_12/Relu6, block_2c_conv_2/convolution, block_2c_conv_shortcut/convolution + add_6/add + activation_13/Relu6, block_2d_conv_1/convolution + activation_14/Relu6, block_2d_conv_2/convolution, block_2d_conv_shortcut/convolution + add_7/add + activation_15/Relu6, block_3a_conv_1/convolution + activation_16/Relu6, block_3a_conv_2/convolution, block_3a_conv_shortcut/convolution + add_8/add + activation_17/Relu6, block_3b_conv_1/convolution + activation_18/Relu6, block_3b_conv_2/convolution, block_3b_conv_shortcut/convolution + add_9/add + activation_19/Relu6, block_3c_conv_1/convolution + activation_20/Relu6, block_3c_conv_2/convolution, block_3c_conv_shortcut/convolution + add_10/add + activation_21/Relu6, block_3d_conv_1/convolution + activation_22/Relu6, block_3d_conv_2/convolution, block_3d_conv_shortcut/convolution + add_11/add + activation_23/Relu6, block_3e_conv_1/convolution + activation_24/Relu6, block_3e_conv_2/convolution, block_3e_conv_shortcut/convolution + add_12/add + activation_25/Relu6, block_3f_conv_1/convolution + activation_26/Relu6, block_3f_conv_2/convolution, block_3f_conv_shortcut/convolution + add_13/add + activation_27/Relu6, 
block_4a_conv_1/convolution + activation_28/Relu6, block_4a_conv_2/convolution, block_4a_conv_shortcut/convolution + add_14/add + activation_29/Relu6, block_4b_conv_1/convolution + activation_30/Relu6, block_4b_conv_2/convolution, block_4b_conv_shortcut/convolution + add_15/add + activation_31/Relu6, block_4c_conv_1/convolution + activation_32/Relu6, block_4c_conv_2/convolution, block_4c_conv_shortcut/convolution + add_16/add + activation_33/Relu6, output_cov/convolution, output_cov/Sigmoid, output_bbox/convolution,
INFO: [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
INFO: [TRT]: Detected 1 inputs and 2 output network tensors.
0:00:39.445810391  6087     0x2e663750 INFO                 nvinfer gstnvinfer.cpp:602:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1624> [UID = 1]: serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-5.0/samples/models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine successfully
INFO: [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_1         3x544x960
1   OUTPUT kFLOAT output_bbox/BiasAdd 12x34x60
2   OUTPUT kFLOAT output_cov/Sigmoid 3x34x60

0:00:39.471404936  6087     0x2e663750 INFO                 nvinfer gstnvinfer_impl.cpp:311:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models/config_infer_primary_peoplenet.txt sucessfully

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.


**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:181>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 279
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 279
** INFO: <bus_callback:167>: Pipeline running

NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
KLT Tracker Init
H264: Profile = 66, Level = 0
**PERF: 48.54 (47.90)
**PERF: 73.73 (64.36)
**PERF: 72.85 (67.70)
**PERF: 66.97 (67.49)
**PERF: 71.26 (68.25)
**PERF: 72.92 (69.10)
**PERF: 72.94 (69.69)
**PERF: 73.09 (70.12)
**PERF: 72.71 (70.45)
**PERF: 73.25 (70.71)
**PERF: 73.26 (70.92)
**PERF: 73.32 (71.19)
**PERF: 73.30 (71.33)
**PERF: 72.90 (71.45)
**PERF: 73.42 (71.55)
**PERF: 73.29 (71.71)
**PERF: 73.16 (71.78)
**PERF: 73.22 (71.85)
**PERF: 73.53 (71.96)

**PERF: FPS 0 (Avg)
**PERF: 73.05 (72.01)
** INFO: <bus_callback:204>: Received EOS. Exiting ...

Quitting
App run successful

I changed the parameters in deepstream_app_source1_peoplenet.txt and config_infer_primary_peoplenet.txt, intending to use INT8 for inference, but as the logs show, it loaded an FP16 engine that is not the one named in the configuration file. My configuration files are below.
config_infer_primary_peoplenet.txt (1.2 KB) deepstream_app_source1_peoplenet.txt (2.5 KB)

Hi,

Please update the config_infer_primary_peoplenet.txt and try it again.

[property]
...
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
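
Note that if an FP16 engine was already serialized to disk (as in the log above), nvinfer may keep loading it instead of rebuilding. A minimal sketch for forcing a rebuild, assuming the engine path printed in your log (adjust ENGINE_DIR to your setup):

```shell
# Cached-engine path taken from the log output above; adjust to your setup.
ENGINE_DIR=/opt/nvidia/deepstream/deepstream-5.0/samples/models/tlt_pretrained_models/peoplenet
# Remove the stale FP16 engine so the next run rebuilds it with network-mode=1 (INT8).
rm -f "$ENGINE_DIR"/resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine
```

On the next deepstream-app run, the startup log should report serializing an `..._int8.engine` instead of the FP16 one.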

Thanks.

I modified the parameter, and it worked. But I now get 130 FPS, which is still lower than the official 157 FPS.
Is there anything else I need to change?

Thanks

There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi,

Did you maximize the device performance first?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
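
Lower power modes cap the GPU clock, which can explain FPS below the published benchmark. To confirm the settings took effect, you can query the power mode and clock state; a sketch, guarded so it is safe to run on a non-Jetson machine (nvpmodel and jetson_clocks are Jetson-specific tools):

```shell
# Check whether the Jetson performance tools are present before querying them.
if command -v nvpmodel >/dev/null 2>&1; then
  nvpmodel -q               # reports the currently active power mode
  sudo jetson_clocks --show # reports current CPU/GPU/EMC clock settings
  STATUS=jetson
else
  STATUS=not-a-jetson       # Jetson-specific tools are unavailable on this host
fi
```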

Thanks.