I use DeepStream for inference and analytics, and as the number of streams being analyzed increases, the number of frames analyzed per second per stream decreases. However, according to nvitop, GPU utilization has not increased.
When it analyzes 30 1080p RTMP sources, each source runs at about 6 fps, and as I increased the number of streams the per-stream frame rate dropped further. Yet, according to nvitop, the GPU is far from reaching its limit.
This is how I calculate the frame rate, using frame_meta.frame_num:
def _tiler_sink_pad_buffer_probe(self, pad, info, u_data):
    # Requires: import time, datetime, pyds; from gi.repository import Gst
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer")
        return Gst.PadProbeReturn.OK
    # Retrieve batch metadata from the gst_buffer.
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer).
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            # l_frame.data needs a cast to pyds.NvDsFrameMeta, done with
            # pyds.NvDsFrameMeta.cast(). The cast keeps ownership of the
            # underlying memory in the C code, so the Python garbage
            # collector will leave it alone.
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        frame_number = frame_meta.frame_num
        source_id = frame_meta.source_id
        l_obj = frame_meta.obj_meta_list
        if source_id != 0:
            boxes = []
            track_boxes = []
            txt_data = []
            while l_obj is not None:
                try:
                    # Casting l_obj.data to pyds.NvDsObjectMeta
                    obj_meta = pyds.glist_get_nvds_object_meta(l_obj.data)
                except StopIteration:
                    break
                coords = obj_meta.detector_bbox_info.org_bbox_coords
                # [left, top, width, height, class_id, confidence]
                boxes.append([float(coords.left), float(coords.top),
                              float(coords.width), float(coords.height),
                              int(obj_meta.class_id), float(obj_meta.confidence)])
                # [x1, y1, x2, y2, confidence, class_id]
                track_boxes.append([float(coords.left), float(coords.top),
                                    float(coords.left) + float(coords.width),
                                    float(coords.top) + float(coords.height),
                                    float(obj_meta.confidence), obj_meta.class_id])
                try:
                    l_obj = l_obj.next
                except StopIteration:
                    break
        if source_id != 0:
            caltime = time.time() - self.log_time_dict['time']
            self.alloc_task_dic[self.stream_data[source_id]["task_id"]] = frame_number
            if caltime >= 30:
                # Per-task FPS = frames processed in the window / window length (s).
                cal_dict = {task_id: (frame_num - self.log_time_dict.get(task_id, 0)) / caltime
                            for task_id, frame_num in self.alloc_task_dic.items()}
                cal_dict = {task_id: round(fps, 2) for task_id, fps in cal_dict.items()}
                self.log_time_dict["time"] = time.time()
                self.log_time_dict.update(self.alloc_task_dic)
                with open("./log/{}_stream.txt".format(datetime.datetime.now().strftime("%Y-%m-%d")), "a") as f:
                    f.write(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") + "---" + str(cal_dict) + "\n")
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
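For context, a probe like this is typically attached to the tiler's sink pad. A minimal sketch of the attachment, assuming tiler refers to the pipeline's nvmultistreamtiler element:

# Attach the buffer probe to the tiler's sink pad so every batched
# buffer entering the tiler passes through _tiler_sink_pad_buffer_probe.
tiler_sink_pad = tiler.get_static_pad("sink")
if tiler_sink_pad:
    tiler_sink_pad.add_probe(Gst.PadProbeType.BUFFER,
                             self._tiler_sink_pad_buffer_probe, 0)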
0:00:17.572518661 428 0x5bffbbf45c10 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2195> [UID = 1]: Use deserialized engine model: /home/runone/program/folder/model/yolov8s_exp85_736_11.engine
0:00:17.574248314 428 0x5bffbbf45c10 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/runone/deepstream-implatform/deepstream-common/config_infer_primary_yoloV8.txt sucessfully
** ERROR: <main:733>: Could not open X Display
Quitting
[NvMultiObjectTracker] De-initialized
App run failed
I ran the following commands on the host machine:
root@r:~# xhost +
xhost: unable to open display "localhost:14.0"
root@r~# export DISPLAY=:0
root@r~# xhost +
Authorization required, but no authorization protocol specified
xhost: unable to open display ":0"
After running these commands I restarted the container, but the output did not change.
This machine is not connected to a monitor.
Do I need to set anything else? Could you give me some guidance? Thank you.
In addition to the GPU load, you also need to check the codec load. You can run nvidia-smi dmon to check the encoder and decoder utilization.
With our demo config file, you can try setting the sink to fakesink to test the performance.
We also provide a method for measuring latency; you can refer to it to get more accurate data.
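On the latency measurement: a minimal sketch of how such a probe could look in a Python pipeline, assuming a recent pyds build that exposes nvds_measure_buffer_latency and that the application is launched with NVDS_ENABLE_LATENCY_MEASUREMENT=1 (plus NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 for per-element numbers). This follows the usage in the DeepStream Python sample apps, where the binding itself prints the per-frame latency:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def latency_measurement_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Measures the latency of all frames in the current batch; the binding
    # prints the per-source latency and returns the number of sources in
    # the batch (0 if latency measurement is not enabled).
    num_sources_in_batch = pyds.nvds_measure_buffer_latency(hash(gst_buffer))
    if num_sources_in_batch == 0:
        print("Latency measurement not enabled or no sources in batch")
    return Gst.PadProbeReturn.OK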
==============NVSMI LOG==============
Timestamp : Mon Nov 25 11:39:49 2024
Driver Version : 535.183.06
CUDA Version : 12.2
Attached GPUs : 4
GPU 00000000:18:00.0
Product Name : NVIDIA L2
Product Brand : NVIDIA
Product Architecture : Ada Lovelace
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Disabled
Addressing Mode : None
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1325023062052
GPU UUID : GPU-8ed2b180-1c2b-e814-95a7-a590cfc8ccdc
Minor Number : 0
VBIOS Version : 95.04.6C.00.01
MultiGPU Board : No
Board ID : 0x1800
Board Part Number : 900-2G193-0030-000
GPU Part Number : 27B6-890-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G193.0220.00.01
OEM Object : 2.1
ECC Object : 6.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : N/A
Latest Duration : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : No
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x18
Device : 0x00
Domain : 0x0000
Device Id : 0x27B610DE
Bus Id : 00000000:18:00.0
Sub System Id : 0x193310DE
GPU Link Info
PCIe Generation
Max : 4
Current : 4
Device Current : 4
Device Max : 4
Host Max : 4
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 59000 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
Sparse Operation Mode : N/A
FB Memory Usage
Total : 23034 MiB
Reserved : 334 MiB
Used : 9882 MiB
Free : 12816 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 56 MiB
Free : 32712 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 25 %
Memory : 20 %
Encoder : 0 %
Decoder : 17 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable Parity : 0
SRAM Uncorrectable SEC-DED : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable Parity : 0
SRAM Uncorrectable SEC-DED : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
SRAM Threshold Exceeded : No
Aggregate Uncorrectable SRAM Sources
SRAM L2 : 0
SRAM SM : 0
SRAM Microcontroller : 0
SRAM PCIE : 0
SRAM Other : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 96 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 69 C
GPU T.Limit Temp : 17 C
GPU Shutdown T.Limit Temp : -5 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating T.Limit Temp : N/A
GPU Power Readings
Power Draw : 45.35 W
Current Power Limit : 72.00 W
Requested Power Limit : 72.00 W
Default Power Limit : 72.00 W
Min Power Limit : 40.00 W
Max Power Limit : 72.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 2025 MHz
SM : 2025 MHz
Memory : 6250 MHz
Video : 1770 MHz
Applications Clocks
Graphics : 2040 MHz
Memory : 6251 MHz
Default Applications Clocks
Graphics : 2040 MHz
Memory : 6251 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2040 MHz
SM : 2040 MHz
Memory : 6251 MHz
Video : 1770 MHz
Max Customer Boost Clocks
Graphics : 2040 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 875.000 mV
Fabric
State : N/A
Status : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 55752
Type : C
Name : python3
Used GPU Memory : 1222 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 63437
Type : C
Name : python3
Used GPU Memory : 1080 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 71254
Type : C
Name : python3
Used GPU Memory : 982 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 78917
Type : C
Name : python3
Used GPU Memory : 984 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 86567
Type : C
Name : python3
Used GPU Memory : 984 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 94320
Type : C
Name : python3
Used GPU Memory : 1078 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 95616
Type : C
Name : /usr/bin/python3
Used GPU Memory : 3458 MiB
Since you are using deepstream-app for the performance test, please follow the Performance section of the DeepStream documentation for how to configure it.
Okay, but even after setting the sink type to 1 as described in the documentation, I still get the following error:
0:00:17.566151338 2403801 0x654dd90baa10 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.4/samples/configs/deepstream-app/config_infer_primary.txt sucessfully
** ERROR: <main:733>: Could not open X Display
Quitting
[NvMultiObjectTracker] De-initialized
App run failed
source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt (5.7 KB)
The specific configuration file is attached above.
But I still don’t understand why the frame rate in my own pipeline did not meet expectations; none of the GPU metrics seem anywhere near their limits.
DeepStream is running in Docker.
Before running the example, I had already executed the following on the host machine:
root@r:~# xhost +
xhost: unable to open display "localhost:14.0"
root@r~# export DISPLAY=:0
root@r~# xhost +
Authorization required, but no authorization protocol specified
xhost: unable to open display ":0"
From the config file you attached, you didn’t change the sink to fakesink.
Please set type=1 for [sink0] and enable=0 for [sink1] and [sink2].
I have tried this on my side without a physical monitor, and it works normally when fakesink is used.
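For reference, a minimal sketch of the relevant sink groups in the deepstream-app config (other keys left as in the attached file; sync=0 is an assumption borrowed from the sample performance configs to avoid clock-based throttling):

[sink0]
enable=1
# type=1 selects fakesink, so no X display is required
type=1
sync=0

[sink1]
enable=0

[sink2]
enable=0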
Does the warning above indicate a problem?
This utilization looks similar to a T4’s.
If I use my model, the result is as follows:
WARNING from sink_sub_bin_sink1: A lot of buffers are being dropped.
Debug info: ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/GstFakeSink:sink_sub_bin_sink1:
There may be a timestamping problem, or this computer is too slow.
WARNING from sink_sub_bin_sink1: A lot of buffers are being dropped.
Debug info: ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/GstFakeSink:sink_sub_bin_sink1:
There may be a timestamping problem, or this computer is too slow.
WARNING from sink_sub_bin_sink1: A lot of buffers are being dropped.
Debug info: ../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/GstFakeSink:sink_sub_bin_sink1:
There may be a timestamping problem, or this computer is too slow.
**PERF: 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.98) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (18.03) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.97) 17.79 (17.95) 17.79 (17.97) 17.79 (17.97)