DeepStream GPU temperature issue (running samples)

My environment is DeepStream 1.0 with a Tesla P4 (driver 384.111, CUDA 8.0, TensorRT 2.1, cuDNN 6.0).

When I run the default samples, all three of them halt after about 5 minutes and the system reboots.
Up to that point the samples run normally.

However, the GPU temperature starts at 47C and climbs to 92~94C.
Then the errors occur, the system halts, and it reboots.

I modified the file provider to slow down the frame feeding:

  • Added a 200 ms sleep per frame (5 FPS feeding) and ran only 1 channel.

In this case the GPU heats up more slowly than before, but it still reaches 92~94C in the end.
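
For anyone who wants to reproduce the throttled feeding outside the DeepStream sample, here is a minimal sketch of the idea in Python. It is not the DeepStream file provider code; `paced`, its parameters, and the injectable `sleep`/`clock` arguments are hypothetical names used only for illustration. It paces by deadline (at most `fps` frames per second) rather than by a fixed sleep, which gives the same 200 ms spacing at 5 FPS when the consumer is fast.

```python
import time


def paced(frames, fps=5.0, sleep=time.sleep, clock=time.monotonic):
    """Yield items from `frames` no faster than `fps` items per second.

    `sleep` and `clock` are injectable so the pacing can be checked
    without actually waiting.
    """
    interval = 1.0 / fps          # 0.2 s between frames at 5 FPS
    next_due = clock()            # the first frame is released immediately
    for frame in frames:
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)          # wait out the rest of the interval
        yield frame
        # Schedule the next frame; if the consumer was slow, do not
        # try to catch up with a burst of frames.
        next_due = max(next_due + interval, clock())


if __name__ == "__main__":
    start = time.monotonic()
    for i in paced(range(5), fps=5.0):
        print(f"frame {i} at {time.monotonic() - start:.2f} s")
```

Throttling like this only lowers the load; as the result above shows, it does not help if the card cannot shed heat.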

I conclude that something in the Decoder or Inference module is heating the GPU,
and that nothing stops it when the GPU temperature gets too high. (Driver problem?)

I checked for memory leaks and GPU usage in nvidia-smi: GPU usage stays at about 5~10% the whole time, and there is no memory leak.
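
A check like this can also be scripted so that temperature is logged next to usage and memory. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output; the function names, the 90C limit, and the sample values in the usage note are illustrative assumptions, not measurements from this system.

```python
import subprocess

QUERY = "index,name,temperature.gpu,utilization.gpu,memory.used"


def _num(field):
    """Return the field as an int, or None for values like '[Not Supported]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_csv(text):
    """Parse output of nvidia-smi --query-gpu=... --format=csv,noheader,nounits."""
    rows = []
    for line in text.strip().splitlines():
        index, rest = line.split(",", 1)
        # Split the numeric columns from the right so a comma inside
        # the GPU name cannot shift them.
        name, temp, util, mem = rest.rsplit(",", 3)
        rows.append({
            "index": int(index),
            "name": name.strip(),
            "temp_c": _num(temp),
            "util_pct": _num(util),
            "mem_mib": _num(mem),
        })
    return rows


def read_gpus():
    """Query all GPUs once (requires nvidia-smi to be installed)."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_gpu_csv(out)


def over_limit(rows, limit_c=90):
    """Names of GPUs at or above limit_c (90 is an arbitrary example limit)."""
    return [r["name"] for r in rows if r["temp_c"] is not None and r["temp_c"] >= limit_c]
```

For example, `over_limit(parse_gpu_csv("0, Tesla P4, 93, 7, 512"))` flags the P4 even though utilization is only 7%, which is the pattern described here: low usage but very high temperature.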

The machine also has an NVIDIA Quadro K2000 GPU for the display port,
so I ran the samples on that device ID.
In that case GPU usage goes up to 90% for the DeepStream samples, but there is no reboot and it is stable; a long-run test passes.

What is the problem with my P4 device? (Does the P4 need a custom cooling fan?)

Log when the GPU temperature is at 93~94C and the system halts:

[DEBUG][17:36:48] Video[0] Decoding Performance: 4.94 frames/second || Total Frames: 1000
[DEBUG][17:36:48] Analysis Pipeline Performance: 4.89 frames/second || Total Frames: 1000
[DEBUG][17:37:09] Video[0] Decoding Performance: 4.99 frames/second || Total Frames: 1100
[DEBUG][17:37:29] Video[0] Decoding Performance: 4.89 frames/second || Total Frames: 1200
[DEBUG][17:37:29] Analysis Pipeline Performance: 4.94 frames/second || Total Frames: 1200
[DEBUG][17:37:49] Video[0] Decoding Performance: 4.99 frames/second || Total Frames: 1300
[DEBUG][17:38:09] Video[0] Decoding Performance: 4.93 frames/second || Total Frames: 1400
[DEBUG][17:38:09] Analysis Pipeline Performance: 4.96 frames/second || Total Frames: 1400
[DEBUG][17:38:30] Video[0] Decoding Performance: 4.94 frames/second || Total Frames: 1500
[DEBUG][17:38:50] Video[0] Decoding Performance: 4.98 frames/second || Total Frames: 1600
[DEBUG][17:38:50] Analysis Pipeline Performance: 4.96 frames/second || Total Frames: 1600
[DEBUG][17:39:10] Video[0] Decoding Performance: 4.89 frames/second || Total Frames: 1700
[DEBUG][17:39:30] Video[0] Decoding Performance: 4.99 frames/second || Total Frames: 1800
[DEBUG][17:39:30] Analysis Pipeline Performance: 4.94 frames/second || Total Frames: 1800
[DEBUG][17:39:51] Video[0] Decoding Performance: 4.79 frames/second || Total Frames: 1900
[ERROR][17:40:06] CUDA error 999 at line 220 in file src/nvDecLite.cpp
[ERROR][17:40:06] CUDA error 999 at line 225 in file src/nvDecLite.cpp
[ERROR][17:40:06] CUDA runtime error 30 at line 38 in file src/framePool.cpp
[ERROR][17:40:06] CUDA runtime error 30 at line 114 in file src/frameAnalysis.cpp
[ERROR][17:40:06] CUDA runtime error 30 at line 142 in file src/frameAnalysis.cpp
[ERROR][17:40:06] CUDA runtime error 30 at line 288 in file src/cudaImage.cu
[ERROR][17:40:06] CUDA runtime error 30 at line 295 in file src/cudaImage.cu
[ERROR][17:40:06] CUDA runtime error 30 at line 297 in file src/cudaImage.cu
[ERROR][17:40:06] CUDA runtime error 30 at line 189 in file src/nvInferLite.cpp
scaleWeights.cu (1025) - Cuda Error in NCHWToNCQHW4: 30
[ERROR][17:40:06] CUDA runtime error 30 at line 193 in file src/nvInferLite.cpp
[ERROR][17:40:06] CUDA runtime error 30 at line 156 in file src/core.h
[ERROR][17:40:06] CUDA runtime error 30 at line 156 in file src/core.h
[ERROR][17:40:06] CUDA runtime error 30 at line 209 in file src/frameAnalysis.cpp
[ERROR][17:40:07] CUDA error 999 at line 220 in file src/nvDecLite.cpp
[ERROR][17:40:07] CUDA error 999 at line 225 in file src/nvDecLite.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 38 in file src/framePool.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 114 in file src/frameAnalysis.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 142 in file src/frameAnalysis.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 288 in file src/cudaImage.cu
[ERROR][17:40:07] CUDA runtime error 30 at line 295 in file src/cudaImage.cu
[ERROR][17:40:07] CUDA runtime error 30 at line 297 in file src/cudaImage.cu
[ERROR][17:40:07] CUDA runtime error 30 at line 189 in file src/nvInferLite.cpp
scaleWeights.cu (1025) - Cuda Error in NCHWToNCQHW4: 30
[ERROR][17:40:07] CUDA runtime error 30 at line 193 in file src/nvInferLite.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 156 in file src/core.h
[ERROR][17:40:07] CUDA runtime error 30 at line 156 in file src/core.h
[ERROR][17:40:07] CUDA runtime error 30 at line 209 in file src/frameAnalysis.cpp
[ERROR][17:40:07] CUDA error 999 at line 220 in file src/nvDecLite.cpp
[ERROR][17:40:07] CUDA error 999 at line 225 in file src/nvDecLite.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 38 in file src/framePool.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 114 in file src/frameAnalysis.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 142 in file src/frameAnalysis.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 288 in file src/cudaImage.cu
[ERROR][17:40:07] CUDA runtime error 30 at line 295 in file src/cudaImage.cu
[ERROR][17:40:07] CUDA runtime error 30 at line 297 in file src/cudaImage.cu
[ERROR][17:40:07] CUDA runtime error 30 at line 189 in file src/nvInferLite.cpp
scaleWeights.cu (1025) - Cuda Error in NCHWToNCQHW4: 30
[ERROR][17:40:07] CUDA runtime error 30 at line 193 in file src/nvInferLite.cpp
[ERROR][17:40:07] CUDA runtime error 30 at line 156 in file src/core.h
[ERROR][17:40:07] CUDA runtime error 30 at line 156 in file src/core.h
[ERROR][17:40:07] CUDA runtime error 30 at line 209 in file src/frameAnalysis.cpp
[ERROR][17:40:18] CUDA error 999 at line 220 in file src/nvDecLite.cpp
[ERROR][17:40:18] CUDA error 999 at line 225 in file src/nvDecLite.cpp
[ERROR][17:40:18] CUDA runtime error 30 at line 38 in file src/framePool.cpp
[ERROR][17:40:18] CUDA runtime error 30 at line 114 in file src/frameAnalysis.cpp
[ERROR][17:40:18] CUDA runtime error 30 at line 142 in file src/frameAnalysis.cpp
[ERROR][17:40:18] CUDA runtime error 30 at line 288 in file src/cudaImage.cu
[ERROR][17:40:18] CUDA runtime error 30 at line 295 in file src/cudaImage.cu
[ERROR][17:40:18] CUDA runtime error 30 at line 297 in file src/cudaImage.cu
[ERROR][17:40:18] CUDA runtime error 30 at line 189 in file src/nvInferLite.cpp
scaleWeights.cu (1025) - Cuda Error in NCHWToNCQHW4: 30
[ERROR][17:40:18] CUDA runtime error 30 at line 193 in file src/nvInferLite.cpp
[ERROR][17:40:18] CUDA runtime error 30 at line 156 in file src/core.h
[ERROR][17:40:18] CUDA runtime error 30 at line 156 in file src/core.h
[ERROR][17:40:18] CUDA runtime error 30 at line 209 in file src/frameAnalysis.cpp
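
A note on reading the log above: in CUDA 8.0, runtime error 30 is `cudaErrorUnknown` and driver error 999 is `CUDA_ERROR_UNKNOWN`, so the codes themselves do not name a cause; they only show that every CUDA call started failing at the same moment. The sketch below tallies such lines. It assumes that lines saying "CUDA error" (without "runtime") come from the driver API, which the log does not state; `summarize` and `KNOWN` are hypothetical names.

```python
import re
from collections import Counter

# Code names under CUDA 8.0 numbering.
KNOWN = {
    ("runtime", 30): "cudaErrorUnknown",
    ("driver", 999): "CUDA_ERROR_UNKNOWN",
}

# Matches lines such as:
# [ERROR][17:40:06] CUDA runtime error 30 at line 38 in file src/framePool.cpp
ERR = re.compile(
    r"\[ERROR\]\[(\d\d:\d\d:\d\d)\] CUDA (runtime )?error (\d+) "
    r"at line (\d+) in file (\S+)"
)


def summarize(log_text):
    """Return (timestamp of first error, Counter keyed by (api, code))."""
    counts = Counter()
    first_seen = None
    for line in log_text.splitlines():
        m = ERR.search(line)
        if not m:
            continue  # DEBUG lines and other output are skipped
        timestamp, runtime, code = m.group(1), m.group(2), int(m.group(3))
        # Assumption: no "runtime" in the message means a driver API error.
        api = "runtime" if runtime else "driver"
        counts[(api, code)] += 1
        if first_seen is None:
            first_seen = timestamp
    return first_seen, counts


if __name__ == "__main__":
    import sys
    first, counts = summarize(sys.stdin.read())
    print("first error at", first)
    for (api, code), n in sorted(counts.items()):
        print(api, code, KNOWN.get((api, code), "?"), "x", n)
```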

Thank you.

Hi,

There are some automatic protection mechanisms to prevent the GPU from overheating.
Please keep the GPUs in a well-ventilated environment to prevent the error.

Thanks.

Yes, you are right.

I checked the desktop cooling fan speed and the room temperature.
That was the cause of the problem.

Because the P4 has no fan, the external environment matters more.

Thank you for the help!