GPU is lost using Tesla P4, DeepStream 3.0, TensorRT 5.0

Hi,
I replaced the sample video file sample_720p.mp4 with another MP4 file (also 720p); the new video is about 12 minutes long.

Then I ran deepstream-app with the new video file (1 source element, with num-sources set to 2 in the config file). The application crashes about 3 minutes later. Below is the error output:


cudaMemcpyAsync failed: unrecognized error code(32608)
CUDA runtime error 30 at line 1166 in file gstnvvidconv.cpp
CUDA runtime error 29 at line 1166 in file gstnvvidconv.cpp
free(): corrupted unsorted chunks *** Unable to set device in nvtracker_convert_buffer Line 186


I tried to run nvidia-smi to get the GPU status, but that fails too :) Below is the error output from nvidia-smi:


Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU


I have to physically reboot my Ubuntu machine. Does anybody know why this happens? Thanks.

I can reproduce the problem repeatedly; it is a solid repro.

I found the root cause…

My P4 GPU is installed in a PC and has no fan of its own, so the temperature climbs to 90°C very quickly. After I added a fan to the P4 card, everything works fine.
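
For anyone who wants to check whether heat is the cause before the GPU drops off the bus, here is a rough sketch of a temperature watcher. It polls nvidia-smi with its `--query-gpu=temperature.gpu` option and prints a warning when a reading crosses a limit. The 80 degree limit, the poll interval, and all function names are my own assumptions for illustration, not values from NVIDIA; check the board specification for the real thermal limits.

```python
import subprocess
import time
from typing import List

# Hypothetical warning threshold in degrees Celsius (an assumption, not a spec value).
WARN_TEMP_C = 80


def parse_temperatures(output: str) -> List[int]:
    """Parse nvidia-smi CSV output: one temperature per line, one line per GPU.

    Lines that are not plain integers (for example "[N/A]") are skipped.
    """
    temps = []
    for line in output.splitlines():
        value = line.strip()
        if value.isdigit():
            temps.append(int(value))
    return temps


def read_gpu_temperatures() -> List[int]:
    """Ask nvidia-smi for the current temperature of every GPU.

    Raises subprocess.CalledProcessError if nvidia-smi exits with an error,
    which is what you should expect once the GPU is already lost.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)


def overheated(temps: List[int], limit: int = WARN_TEMP_C) -> List[int]:
    """Return the list positions of readings at or above the limit."""
    return [i for i, t in enumerate(temps) if t >= limit]


def monitor(poll_seconds: float = 5.0, iterations: int = 60) -> None:
    """Print the temperatures every poll_seconds, for a fixed number of polls."""
    for _ in range(iterations):
        temps = read_gpu_temperatures()
        hot = overheated(temps)
        status = "WARNING, too hot: %s" % hot if hot else "ok"
        print("temperatures (C): %s -> %s" % (temps, status))
        time.sleep(poll_seconds)
```

Calling `monitor()` in a second terminal while deepstream-app runs should show whether the temperature keeps rising until the crash.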

I've hit the same problem. How do I add a fan to the P4 card? Where can I buy one?