GPU Utilization/Performance throttled in PyDS

• Hardware Platform : GPU
• DeepStream Version : 5.0
• NVIDIA GPU Driver Version : 440.64

I’m trying the Python deepstream-ssd example (it uses Triton for inference) on the DeepStream 5 docker image. The container is spawned with the recommended config (shm, ulimit, etc.).

I added multi-RTSP support to it, but the GPU core usage (not memory utilization) doesn’t go above 40%, and eventually the app crashes because of memory when there are too many channels.
Sometimes utilization even drops a couple of percent on average when channels are increased.

I tried all the recommended settings in the troubleshooting section: increasing buffer-surface, setting the sink to sync=0, and giving every plugin gpu-id=0.

I also removed all the plugins after pgie to get clarity, and tried changing the model’s memory allocation in the Triton config file.

I also read Triton’s optimization guide and added dynamic batching and TensorRT acceleration. However, I wasn’t able to change the number of instances from 1 to 2, as the app would stop running.
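For readers following along, here is a rough sketch of the kind of model config fragment those changes correspond to, rendered from Python so the instance count and GPU list are easy to vary between runs. The function name `render_model_config_overrides` is hypothetical, and the field names (`instance_group`, `dynamic_batching`, `execution_accelerators`) are taken from Triton's model configuration schema as I understand it, not from the attached tar. Whether the Triton build inside the DeepStream 5.0 image accepts `count: 2` for this model is exactly the open question in this thread.

```python
def render_model_config_overrides(instance_count=2, gpu_ids=(0,), use_tensorrt=True):
    """Return a config.pbtxt fragment as text.

    Sketch only: the result is meant to be merged by hand into the model's
    existing config.pbtxt, not to replace it.
    """
    gpus = ", ".join(str(g) for g in gpu_ids)
    lines = [
        # Number of model instances, pinned to the listed GPUs.
        "instance_group [",
        "  {",
        f"    count: {instance_count}",
        "    kind: KIND_GPU",
        f"    gpus: [ {gpus} ]",
        "  }",
        "]",
        # Empty block enables dynamic batching with default settings.
        "dynamic_batching { }",
    ]
    if use_tensorrt:
        # Assumption: TensorRT acceleration is requested via the
        # execution_accelerators block, as in Triton's optimization guide.
        lines += [
            "optimization { execution_accelerators {",
            '  gpu_execution_accelerator : [ { name : "tensorrt" } ]',
            "} }",
        ]
    return "\n".join(lines) + "\n"
```

Calling `print(render_model_config_overrides(instance_count=1))` and then again with `instance_count=2` gives two fragments that differ only in the instance count, which makes it easier to confirm that this single setting is what stops the app.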

I have a multi-GPU setup. The other GPUs also automatically occupy ~600MB when the app starts, although there’s no utilization on them.
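Since core utilization and memory usage are easy to mix up here, a small logging helper can record both per GPU while the number of channels is varied. This is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports plain integers for every queried field; the function names and the dictionary keys are my own.

```python
import subprocess

# Fields passed to nvidia-smi's --query-gpu option.
QUERY_FIELDS = "index,utilization.gpu,memory.used"


def parse_gpu_stats(csv_text):
    """Parse 'index, util, mem' CSV lines (no header, no units) into dicts."""
    stats = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, util, mem = (field.strip() for field in line.split(","))
        stats.append({
            "gpu": int(index),
            "core_util_pct": int(util),   # GPU core utilization, in percent
            "mem_used_mib": int(mem),     # memory in use, in MiB
        })
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and return one dict per visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Calling `query_gpu_stats()` in a loop next to the pipeline (for example once per second) gives a per-GPU trace in which a GPU that holds memory but does no work shows a non-zero `mem_used_mib` with `core_util_pct` at 0.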

I’m attaching the two required tars: one is the code and the other is the Triton model with its config.

Command : python3 <no_of_copies_to_make_from_the_url>


@zhliunycm2 @DaneLLL @mchi @AastaLLL

Edit : clarified meaning of utilization

Please try changing tf_gpu_memory_fraction as suggested in another thread.

@zhliunycm2 I’ve tried that too. I let it occupy as much as 0.7 of my 15GB GPU, and utilization was still throttled to 40%.
Edit : By utilization I mean GPU core utilization, not memory utilization.
Also, the link is broken, I guess.

Can you try lowering tf_gpu_memory_fraction to 0.4? Memory could be the bottleneck here.

I’ve tried : 0.3, 0.4, 0.6, 0.7. No effect.
I’ve attached the minimal code too, in case you could try it.

@zhliunycm2 Also, there was an env variable mentioned in the troubleshooting guide that apparently enables latency measurement for all plugins. It didn’t show anything in the Python app, though.

@zhliunycm2 Any update/insight?

We are investigating. To confirm:

  • Are you able to run a single stream without problems?
  • Are you running docker with a single GPU? If not already, please try “docker run --gpus device=0”.
  • What’s the max number of streams you can run before running into problems? You mentioned that going from 1->2 caused the app to stop.
  • Are you seeing this with a decode->streammux->pgie only pipeline? You mentioned removing all plugins after pgie.
  • Do you see this behavior with the C version of the app (deepstream-app with Triton config)?

It would also help if you can post the error messages. Thanks!