Tesla P4, Heat and Performance Issue

tahir · November 26, 2018, 12:51pm

I asked a similar question in DeepStream Libraries/Other Libraries: https://devtalk.nvidia.com/default/topic/1044465/other-libraries/deepstream-3-0-on-tesla-p4-performance-issue/post/5299095/#5299095

I am using a Tesla P4 with TensorRT 5.* and DeepStream 3.0.
I am running the sample app with this command:

deepstream-app -c configs/deepstream-app/source30_720p_dec_infer-resnet_tiled_display_int8.txt

Initially it runs fast, but over time it slows down. In nvidia-smi I can see Volatile GPU-Util rise to 100% and the temperature frequently reach 90C and above.

In the configuration I have [sink0] with type=2, sync=1.

I can see the following message on the console:

There may be a timestamping problem, or this computer is too slow.
WARNING from sink_sub_bin_sink1: A lot of buffers are being dropped.
Debug info: gstbasesink.c(2854): gst_base_sink_is_too_late (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/GstEglGlesSink:sink_sub_bin_sink1:

Finally the system hangs and I can't perform any action.

I also tried another sample (tracker) with 4 streams; screenshots are attached. In less than 1 minute GPU-Util and temperature rose and the warning appeared.
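To correlate the slowdown with temperature, the nvidia-smi readings mentioned above can be logged while deepstream-app runs. The following is a minimal sketch, not part of the original post: it assumes nvidia-smi is on the PATH, and the helper names (`parse_sample`, `read_gpu`, `is_hot`, `log_samples`) and the 85 C warning limit are hypothetical choices for illustration.

```python
import subprocess
import time

# Standard nvidia-smi query fields: GPU temperature (C), GPU utilization (%),
# and current SM clock (MHz).
QUERY_FIELDS = "temperature.gpu,utilization.gpu,clocks.sm"


def parse_sample(line):
    """Parse one CSV line such as '90, 100, 885' into a dict of ints.

    Raises ValueError if a field is not numeric (for example when the
    driver reports a field as unsupported).
    """
    temp_c, util_pct, sm_mhz = (int(field.strip()) for field in line.split(","))
    return {"temp_c": temp_c, "util_pct": util_pct, "sm_mhz": sm_mhz}


def read_gpu(index=0):
    """Query one GPU through nvidia-smi and return the parsed sample."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={index}",
            f"--query-gpu={QUERY_FIELDS}",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_sample(result.stdout.strip().splitlines()[0])


def is_hot(sample, limit_c=85):
    """Return True when the sample is at or above the assumed warning limit."""
    return sample["temp_c"] >= limit_c


def log_samples(count, interval_s=5.0, index=0):
    """Print `count` samples, `interval_s` seconds apart, flagging hot ones."""
    for _ in range(count):
        sample = read_gpu(index)
        flag = " HOT" if is_hot(sample) else ""
        print(f"{sample['temp_c']}C {sample['util_pct']}% {sample['sm_mhz']}MHz{flag}")
        time.sleep(interval_s)
```

Calling `log_samples(60, 5.0)` in a second terminal would record about five minutes of readings. A falling SM clock while the temperature climbs would be consistent with thermal throttling, though this sketch does not confirm the cause.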

Amycao · November 27, 2018, 6:40am

Hi,
How many channels are you running, and how long does it take for the performance to drop? Also, what is the stream resolution?
I am using the default stream from the config: 720p, 30 channels. After running for about 7 minutes, GPU utilization is around 85~96%, the temperature is up to 73C, and the system does not hang. Can you elaborate on how to reproduce the issues you hit?

Red-Draken · November 28, 2018, 4:54am

Hello amycao, I would like to know what kind of cooling is in place for your rig? Is it inside a temperature maintained server room?

Amycao · November 28, 2018, 5:17am

Yes, it is in a temperature-maintained server room, set to around 18~22C. You need to customize your cooling system according to your needs.

Red-Draken · November 28, 2018, 5:28am

Ty that must be it.

tahir · December 3, 2018, 10:07am

I have solved the heating issue by adding a custom cooling fan to the GPU.

Performance metrics:

25 streams (25 FPS, 720p): 22.xx FPS per stream
30 streams (25 FPS, 720p): 17.xx FPS per stream
40 streams (25 FPS, 720p): 13.xx FPS per stream
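One way to read these figures is to multiply the stream count by the per-stream FPS, which gives the total frames per second the pipeline delivers. The sketch below does that arithmetic; it assumes the reported FPS is per stream, and because the fractional digits were elided as ".xx", the integers used are lower bounds. The function names are hypothetical.

```python
# (stream count, per-stream FPS) from the figures above. The ".xx" digits
# are unknown, so these integers are lower bounds.
REPORTED = [(25, 22), (30, 17), (40, 13)]
SOURCE_FPS = 25  # frame rate of each input stream


def aggregate_fps(streams, per_stream_fps):
    """Total frames per second delivered across all streams."""
    return streams * per_stream_fps


def realtime_fraction(per_stream_fps, source_fps=SOURCE_FPS):
    """Fraction of the source frame rate that each stream achieves."""
    return per_stream_fps / source_fps


def summarize(rows=REPORTED):
    """Return (streams, aggregate FPS, realtime fraction) for each row."""
    return [
        (streams, aggregate_fps(streams, fps), realtime_fraction(fps))
        for streams, fps in rows
    ]


for streams, total, fraction in summarize():
    print(f"{streams} streams: >= {total} FPS total, {fraction:.0%} of real time")
```

The totals come out to at least 550, 510, and 520 FPS. Since they stay roughly flat as streams are added, the numbers suggest the pipeline is saturated somewhere above 500 FPS in this setup, and none of the three configurations keeps up with the 25 FPS sources.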

I am still seeing the warning when sync=1:

WARNING from sink_sub_bin_sink1: A lot of buffers are being dropped.
Debug info: gstbasesink.c(2854): gst_base_sink_is_too_late (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/GstEglGlesSink:sink_sub_bin_sink1:
There may be a timestamping problem, or this computer is too slow.

Amycao · December 10, 2018, 12:54pm

Hi,
When GPU or CPU usage is around 100%, frames will be dropped, as in the log you posted.
If you see any stutter or playback is not smooth, you need to reduce the number of sources.
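As a rough guide to how far to reduce, the source count can be estimated from a throughput budget. This is a sketch under stated assumptions, not an NVIDIA formula: it assumes total throughput stays roughly constant as sources are added, and it uses 510 FPS as the budget, taken from the 30-stream result earlier in the thread (30 streams at 17.xx FPS each). The function name and the `headroom` parameter are hypothetical.

```python
import math


def max_realtime_sources(aggregate_fps, source_fps, headroom=0.0):
    """Largest source count whose combined frame rate fits the budget.

    aggregate_fps: total frames per second the pipeline can sustain.
    source_fps:    frame rate of each input stream.
    headroom:      fraction of the budget to leave unused (0 <= headroom < 1).
    """
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = aggregate_fps * (1.0 - headroom)
    return math.floor(usable / source_fps)


# Budget assumed from the thread: 30 streams x 17 FPS = 510 FPS.
print(max_realtime_sources(510, 25))                # 20
print(max_realtime_sources(510, 25, headroom=0.1))  # 18
```

Under these assumptions, roughly 18 to 20 sources at 25 FPS would be a starting point on this card; the actual limit depends on the model, resolution, and cooling, so it should be confirmed by measurement.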

zhouzhi9 · March 4, 2019, 5:53am

I replaced the sample video file sample_720p.mp4 with another mp4 file (also 720p); the new video is about 12 minutes long.

Then I ran deepstream-app on the new video file with 2 sources, and the application crashed about 3 minutes later. Below is the error output:

cudaMemcpyAsync failed: unrecognized error code(32608)
CUDA runtime error 30 at line 1166 in file gstnvvidconv.cppCUDA runtime error 29 at line 1166 in file gstnvvidconv.cpp
free(): corrupted unsorted chunks *** Unable to set device in nvtracker_convert_buffer Line 186

I wanted to run nvidia-smi to get the GPU status, but that fails too :) Below is the error output from nvidia-smi:

Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU

I had to physically reboot my Ubuntu machine. Does anybody know why this happens?

zhouzhi9 · March 4, 2019, 10:16am

Solved: it was caused by the GPU having no fan, so the temperature grew too high.
