PERF issues with DeepStream 6.2 + YOLOv8 on Jetson Xavier

Hello, folks!

I’m experiencing performance issues with YOLOv8 on DeepStream 6.2. I’m using the default yolov8s.pt weights (source: https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt) together with the cfg, wts, and labels.txt files generated from them, and performing inference on the sample_1080p_h264.mp4 video (path: /opt/nvidia/deepstream/deepstream/samples/streams/). The reported performance is around 25 FPS on a 1920x1080 monitor running at 60 Hz. How can I increase this frame rate?
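For reference, I run the stock reference application against the config shown further below (the PERF lines in the output come from it):

deepstream-app -c deepstream_app_config.txt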

System Specifications:
Jetson Xavier NX
NVIDIA Volta GPU, 384 CUDA cores with 48 Tensor Cores
Ubuntu 20.04 LTS
CUDA 11.4.315
DeepStream 6.2
JetPack 5.1
PyTorch 1.12.0
Torchvision 0.13.0
TensorRT 8.5.2.2

Output:

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:239>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:225>: Pipeline running

**PERF: 34.75 (31.86)
**PERF: 25.09 (27.03)
**PERF: 24.75 (26.27)
**PERF: 25.57 (24.91)
nvstreammux: Successfully handled EOS for source_id=0
**PERF: 25.53 (25.97)
**PERF: 25.68 (25.57)
**PERF: 25.52 (25.73)
**PERF: 25.61 (25.76)
**PERF: 25.63 (25.55)
**PERF: 25.59 (25.47)
**PERF: 25.52 (25.41)
** INFO: <bus_callback:262>: Received EOS. Exiting …

File deepstream_app_config.txt:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
type=3
uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
num-sources=1
num-extra-surfaces=24
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
type=2
sync=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=5
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
live-source=0
buffer-pool-size=1000
batch-size=1000
batched-push-timeout=100000
width=1280
height=720
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
batch-size=1
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV8.txt

[tests]
file-loop=0

Thank you!

Please also share your PGIE config file, thanks.

Okay, here is the PGIE config file:

File config_infer_primary_yoloV8.txt:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=yolov8s.cfg
model-file=yolov8s.wts
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=10 #=1 default
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=2
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300

Why did you set the nvstreammux batch size to 1000? See: Frequently Asked Questions — DeepStream 6.3 Release documentation

batched-push-timeout should be 1/framerate (the property is specified in microseconds, so roughly 33333 for a 30 fps source)
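For a single 30 fps file source, a corrected [streammux] section could look like the sketch below (buffer-pool-size=4 matches the plugin default, and 33333 assumes a 30 fps input; adjust to your source):

[streammux]
gpu-id=0
live-source=0
# one stream -> batch of 1
batch-size=1
# plugin default pool size; 1000 only wastes memory
buffer-pool-size=4
# ~1/30 fps expressed in microseconds
batched-push-timeout=33333
width=1280
height=720
enable-padding=0
nvbuf-memory-type=0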

Please measure the performance of the “model_b1_gpu0_fp32.engine” model with the “trtexec” tool.
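For example (on JetPack, trtexec is typically installed under /usr/src/tensorrt/bin; adjust the path and the engine location as needed):

/usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine

The throughput and GPU compute time summary it prints will show whether the ~25 FPS ceiling comes from the model itself or from the rest of the pipeline.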