Temperature issue with Tesla T4 and 12 rtsp streams

• Hardware Platform (Jetson / GPU) GPU (Tesla T4)
• DeepStream Version 5.0
• TensorRT Version 7.2.1
• NVIDIA GPU Driver Version (valid for GPU only) 460.32
• Issue Type( questions, new requirements, bugs) Question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

Run deepstream app with yoloV4 model with 12 camera rtsp streams.

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I am using Tesla T4, Running Deepstream with YoloV4 (80 classes).
Camera streams - 12 rtsp (1080p @25FPS).
Batch-size = 12 for both streammux and primary-gie.
I want to check the load T4 can handle.
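For reference, the relevant parts of the deepstream-app config for this setup would look roughly like the sketch below. Only the batch-size of 12 and the 1080p resolution are from this post; the other keys (and the YoloV4 config-file name) are illustrative placeholders.

```ini
[streammux]
batch-size=12
width=1920
height=1080
batched-push-timeout=40000

[primary-gie]
enable=1
batch-size=12
interval=0
# hypothetical file name for the YoloV4 nvinfer config
config-file=config_infer_primary_yoloV4.txt
```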

Here are few observations-

  1. When I use “interval=0” in the [primary-gie] group, it does not run well: FPS drops to 2 and lots of frame glitches appear. I am not sure whether the model is too heavy and the volume of incoming frames causes a buffering issue, or something else.

  2. I updated to “interval=12” in the [primary-gie] group (skipping 12 batches between inferences), and then it ran well. Output FPS = 25, with some bbox trailing on detections (because of the batch skip).
    Decoder load is in the range of 35-40%.
    GPU load is in the range of 50-60%.

  3. After about 10 minutes of running, nvidia-smi reports temperature >= 85 degrees.
    Then, all of a sudden, GPU utilization jumps to 100%, the decoder drops to 0%, and glitches appear. After that the temperature suddenly drops below 85 degrees and everything runs fine for a few seconds. This cycle then repeats every 10-15 seconds.
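To see why interval=12 still trails on detections, here is a rough back-of-the-envelope calculation for the setup above (12 streams at 25 FPS). It assumes streammux forms one full 12-frame batch per frame period, so batches arrive at roughly the stream frame rate; the formula is just an estimate, not DeepStream's exact scheduling.

```python
# Rough inference-load arithmetic for 12 RTSP streams @ 25 FPS with interval=12.
# Assumption: streammux emits ~one full batch per frame period (25 batches/s).
streams = 12
fps_per_stream = 25
interval = 12  # [primary-gie] interval: batches skipped between inferences

batches_per_sec = fps_per_stream                       # ~25 batches/s into the pipeline
inferred_batches = batches_per_sec / (interval + 1)    # ~1.9 batches/s actually inferred
frames_inferred = inferred_batches * streams           # ~23 frames/s through the model
per_stream_rate = inferred_batches                     # ~1.9 detections/s per stream

print(round(inferred_batches, 1))   # 1.9
print(round(frames_inferred, 1))    # 23.1
```

A per-stream detection rate of roughly 2 Hz against 25 FPS video is consistent with the bounding-box trailing observed in point 2.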

I am not sure whether this is related to temperature. What is the optimum operating temperature for the T4? I have kept the system in a well-maintained, cool environment.

Hey, the clocks will be throttled if the temperature is too high; you can use the following command to check the GPU hardware info. Per your description, it seems your host machine's cooling capability is not enough for a T4 (the T4 is passively cooled and depends on chassis airflow). Can you check with your T4 vendor about the minimum host-machine requirements for running a T4?

$ nvidia-smi --format=csv -lms 20 --query-gpu=index,timestamp,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used,power.limit,fan.speed,compute_mode,clocks.current.graphics,clocks.current.sm,clocks.current.memory,clocks.current.video,gpu_operation_mode.current,clocks_throttle_reasons.active,power.draw,clocks.gr,temperature.gpu,pstate,clocks_throttle_reasons.hw_slowdown,clocks_throttle_reasons.gpu_idle,clocks_throttle_reasons.applications_clocks_setting,clocks_throttle_reasons.sw_power_cap,clocks_throttle_reasons.sync_boost -i $GPU_ID | tee log.csv
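Once log.csv has been captured, a small script like the hypothetical helper below can scan it for samples where the temperature crosses 85 degrees or the HW_SLOWDOWN throttle reason is active. It matches columns by header-name prefix, since nvidia-smi's CSV headers can carry unit suffixes; the 85-degree threshold is taken from the observation above, not from a spec.

```python
import csv

def find_throttle_events(path, temp_limit=85):
    """Return (timestamp, temperature, hw_slowdown) tuples for suspect samples
    in a CSV produced by the nvidia-smi logging command above."""
    events = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = [h.strip() for h in next(reader)]

        # nvidia-smi CSV headers may include unit suffixes, so match by prefix.
        def col(prefix):
            return next(i for i, h in enumerate(header) if h.startswith(prefix))

        ts_idx = col("timestamp")
        temp_idx = col("temperature.gpu")
        hw_idx = col("clocks_throttle_reasons.hw_slowdown")

        for row in reader:
            temp = int(row[temp_idx].strip())
            hw = row[hw_idx].strip()
            if temp >= temp_limit or hw == "Active":
                events.append((row[ts_idx].strip(), temp, hw))
    return events
```

Correlating the timestamps of these events with the moments the pipeline glitches would confirm (or rule out) thermal throttling as the cause.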