Tesla P4, Heat and Performance Issue

I asked a similar question in DeepStream Libraries/Other Libraries: https://devtalk.nvidia.com/default/topic/1044465/other-libraries/deepstream-3-0-on-tesla-p4-performance-issue/post/5299095/#5299095

I am using a Tesla P4 with TensorRT 5.* and DeepStream 3.0.
I am running the sample app with this command:

deepstream-app -c configs/deepstream-app/source30_720p_dec_infer-resnet_tiled_display_int8.txt

Initially it runs fast, but it slows down over time. In nvidia-smi I can see Volatile GPU-Util climb to 100% and the temperature frequently rise to 90C and above.

In the configuration I have [sink0] with type=2 and sync=1.

I see the following message on the console:

There may be a timestamping problem, or this computer is too slow.
WARNING from sink_sub_bin_sink1: A lot of buffers are being dropped.
Debug info: gstbasesink.c(2854): gst_base_sink_is_too_late (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/GstEglGlesSink:sink_sub_bin_sink1:

Eventually the system hangs and I can't perform any action.

I also tried another sample (tracker) with 4 streams; screenshots are attached. In less than 1 minute GPU-Util and temperature rose and the warning appeared.
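To pin down how quickly utilization and temperature climb, it can help to log them at a fixed interval instead of watching nvidia-smi by hand. Below is a minimal sketch, assuming nvidia-smi is on the PATH and the GPU is index 0. The function names and the 5-second interval are illustrative choices, not anything from DeepStream.

```python
import subprocess
import time

# Fields requested from nvidia-smi; clocks.sm is included because a falling
# SM clock under load is a common sign of thermal throttling.
QUERY = "timestamp,temperature.gpu,utilization.gpu,clocks.sm"


def _to_int(field):
    """Return the field as an int, or None for values such as [N/A]."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_row(line):
    """Parse one CSV row produced with --format=csv,noheader,nounits."""
    ts, temp, util, sm = line.split(",")
    return {
        "timestamp": ts.strip(),
        "temp_c": _to_int(temp),
        "util_pct": _to_int(util),
        "sm_mhz": _to_int(sm),
    }


def sample(gpu_index=0):
    """Query the GPU once and return the parsed reading."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_row(out.strip().splitlines()[0])


def log_samples(interval_s=5.0, count=120):
    """Print `count` readings, one every `interval_s` seconds."""
    for _ in range(count):
        print(sample())
        time.sleep(interval_s)
```

Calling log_samples() in a second terminal while deepstream-app runs gives a timeline. If the SM clock drops as the temperature approaches 90C, the slowdown is probably thermal throttling and not a pipeline problem.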

Hi,
How many channels are you running, how long does it take for the performance to drop, and what is the stream resolution?
I am using the default stream from the config: 720p, 30 channels. After running for about 7 minutes, GPU
utilization is around 85~96%, the temperature is up to 73C, and the system has not hung. Can you elaborate on how to reproduce the issue you hit?

Hello amycao, what kind of cooling is in place for your rig? Is it inside a temperature-controlled server room?

Yes, it is in a temperature-controlled server room, set to around 18~22C. You need to customize your
cooling system according to your needs.

Thank you, that must be it.

I have solved the heating issue by fitting a custom cooling fan to the GPU.

Here are the performance metrics:

25 streams (25 FPS, 720p) give 22.xx FPS
30 streams (25 FPS, 720p) give 17.xx FPS
40 streams (25 FPS, 720p) give 13.xx FPS
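Multiplying these out shows what the numbers imply about total throughput. This is a rough sketch: the fractional part of each FPS figure is elided in the post, so only the whole-number part is used and the totals are slight underestimates. It also assumes total throughput stays roughly constant as streams are added, which is what the three measurements suggest.

```python
# Nominal frame rate of each input stream, as stated in the post.
SOURCE_FPS = 25

# (number of streams, whole-number part of the measured per-stream FPS).
RUNS = [(25, 22), (30, 17), (40, 13)]


def aggregate_fps(streams, per_stream_fps):
    """Total frames per second processed across all streams."""
    return streams * per_stream_fps


def realtime_streams(total_fps, source_fps=SOURCE_FPS):
    """How many streams that total could serve at the full source rate."""
    return total_fps // source_fps


for streams, fps in RUNS:
    total = aggregate_fps(streams, fps)
    print(f"{streams} streams x {fps} FPS = {total} FPS total, "
          f"about {realtime_streams(total)} streams at {SOURCE_FPS} FPS")

# 25 streams x 22 FPS = 550 FPS total, about 22 streams at 25 FPS
# 30 streams x 17 FPS = 510 FPS total, about 20 streams at 25 FPS
# 40 streams x 13 FPS = 520 FPS total, about 20 streams at 25 FPS
```

The total stays in the 510 to 550 FPS range however many streams are added, so this pipeline on this card looks saturated at around 20 to 22 real-time 720p streams. None of the three tested configurations reaches the full 25 FPS per stream.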

I am still seeing this warning when sync=1:

WARNING from sink_sub_bin_sink1: A lot of buffers are being dropped.
Debug info: gstbasesink.c(2854): gst_base_sink_is_too_late (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/GstEglGlesSink:sink_sub_bin_sink1:
There may be a timestamping problem, or this computer is too slow.
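One way to check whether the warning is tied to clock synchronization is to run the same config with sync=0 on the sinks. Below is a small sketch that rewrites the sync key in every [sinkN] group of a config file's text. The helper name is made up for illustration, and the claim that sync=0 renders as fast as possible without dropping late buffers is my reading of the DeepStream sink settings, so treat it as an assumption. It would only silence the warning, not remove the overload behind it.

```python
import re


def set_sink_sync(config_text, value):
    """Return config_text with sync=<value> in every [sinkN] group.

    Lines outside sink groups, and all other keys, are left untouched.
    """
    out = []
    in_sink = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Entering a new group; remember whether it is a sink group.
            in_sink = re.fullmatch(r"\[sink\d+\]", stripped) is not None
        elif in_sink and re.match(r"sync\s*=", stripped):
            line = f"sync={value}"
        out.append(line)
    return "\n".join(out) + "\n"


example = "[sink0]\nenable=1\ntype=2\nsync=1\n[osd]\nenable=1\n"
print(set_sink_sync(example, 0))
```

To apply it to a real file, read the config into a string, pass it through set_sink_sync, and write the result to a copy, so the original sample config stays intact.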

Hi,
When GPU or CPU usage is around 100%, frames will be dropped, as in the log you posted.
If you see any stutter or playback is not smooth, you need to reduce the number of sources.

I replaced the sample video file sample_720p.mp4 with another mp4 file (also 720p); the new video is about 12 minutes long.

Then I ran deepstream-app on the new video file with 2 sources, and the application crashed about 3 minutes later. Below is the error output:


cudaMemcpyAsync failed: unrecognized error code(32608)
CUDA runtime error 30 at line 1166 in file gstnvvidconv.cpp
CUDA runtime error 29 at line 1166 in file gstnvvidconv.cpp
free(): corrupted unsorted chunks *** Unable to set device in nvtracker_convert_buffer Line 186


I wanted to run nvidia-smi to get the GPU status, but that failed as well :) Below is the error output from nvidia-smi:


Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU


I had to physically reboot my Ubuntu machine. Does anybody know why this happens?

Solved: it was caused by the GPU having no fan, so the temperature grew too high.
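Until proper cooling is in place, a small watchdog can stop the application before the card gets hot enough to drop off the bus. This is only a sketch: the 85C limit, the three-reading rule, and the function names are arbitrary assumptions, and it needs nvidia-smi on the PATH.

```python
import subprocess
import time


def read_temp_c(gpu_index=0):
    """Read the current GPU temperature in degrees C via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip().splitlines()[0])


def over_limit(recent_temps, limit_c, needed=3):
    """True once the last `needed` readings are all at or above limit_c."""
    if len(recent_temps) < needed:
        return False
    return all(t >= limit_c for t in recent_temps[-needed:])


def run_guarded(cmd, limit_c=85, poll_s=2.0, read_temp=read_temp_c):
    """Run cmd and terminate it if the GPU stays at or above limit_c."""
    proc = subprocess.Popen(cmd)
    temps = []
    while proc.poll() is None:
        temps.append(read_temp())
        if over_limit(temps, limit_c):
            proc.terminate()
            try:
                proc.wait(timeout=30)
            except subprocess.TimeoutExpired:
                proc.kill()  # the process ignored SIGTERM
                proc.wait()
            break
        time.sleep(poll_s)
    return proc.returncode
```

For example, run_guarded(["deepstream-app", "-c", "configs/deepstream-app/source30_720p_dec_infer-resnet_tiled_display_int8.txt"]) would launch the sample from the first post under the guard. Requiring several consecutive readings keeps a single spike from killing the run.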