Question (also related to another topic that @mchi had replied to, but rephrased):
How does one set up DeepStream to run on 1, 2, or 4 GPUs of a 4-GPU server? I would like to do that and benchmark our hardware serving inference (for the same model) using 1, 2, or 4 GPUs simultaneously.
I believe that one could (or should) use Triton to do this.
I would like to do this using the following command:
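A typical invocation, assuming the standard deepstream-app launcher and the source config named below, would be:

deepstream-app -c source1_primary_retinanet_resnet18.txt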
After pulling and running the DeepStream 6.1-Triton docker image (docker pull nvcr.io/nvidia/deepstream:6.1-triton), I set up the chosen model in the Triton model repository under samples. There is a config.pbtxt there, which I use to run 2 instances on each GPU:
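For the container run itself, the set of GPUs visible inside it can be limited at docker run time; a sketch, with volume mounts and display flags omitted:

docker run --gpus all -it --rm nvcr.io/nvidia/deepstream:6.1-triton
# or, for the 1- or 2-GPU benchmarks, expose only a subset of devices:
docker run --gpus '"device=0,1"' -it --rm nvcr.io/nvidia/deepstream:6.1-triton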
instance_group {
  count: 2        # two instances of the model per GPU
  kind: KIND_GPU  # with no gpus field, instances are created on every visible GPU
}
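As I understand it, instance_group also accepts a gpus field to pin instances to specific devices rather than all visible GPUs, e.g. for a 2-GPU run (device ids assumed):

instance_group {
  count: 2
  kind: KIND_GPU
  gpus: [ 0, 1 ]   # create instances only on GPU 0 and GPU 1
}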
Then there are two other config files that also refer to the GPU id:
source1_primary_retinanet_resnet18.txt
config_infer_primary_retinanet_resnet18.txt
The question is: how should those two config files be written so that they use 1, 2, or 4 GPU devices at the same time to process incoming requests from, say, 16 sources?
Could you please let me know how to do this within those two config files? What sections are needed in each?
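For reference, my current guess at where the GPU id shows up in each file (the values and excerpts below are assumptions, not a verified setup):

# source1_primary_retinanet_resnet18.txt (deepstream-app config, excerpt)
[streammux]
gpu-id=0

[primary-gie]
gpu-id=0
config-file=config_infer_primary_retinanet_resnet18.txt

# config_infer_primary_retinanet_resnet18.txt (nvinferserver pbtxt, excerpt)
infer_config {
  gpu_ids: [0]
  # backend, preprocess, etc. omitted
}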
Wow, thank you @mchi, this is very helpful, and I can hopefully use it as a basis to work from.
If I understand correctly, this Triton server could then serve requests for bodypose2d, utilizing both GPU 0 and GPU 3 as needed.
This is very close to what I would like to do with RetinaNet.
Let’s say that I run a Triton server for RetinaNet in the same way and then want to send it requests. There are two ideas I’d appreciate your clarification on:
The way I thought this would work was through the DeepStream-Triton integration: the command I listed above would effectively do that, using those two files:
source1_primary_retinanet_resnet18.txt
config_infer_primary_retinanet_resnet18.txt
If the Triton server offered the service on GPU 0 and GPU 3, would the DeepStream configuration direct requests to a GPU based on the gpu-id in the config file? If so, maybe I would have four pairs of configs, one to target each GPU with a load of requests, and could then run all four tasks at the same time to get the total loading?
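For concreteness, the parallel run I have in mind would look something like this (the per-GPU config file names are hypothetical):

# one source/infer config pair per GPU, all launched at once
deepstream-app -c source1_primary_retinanet_resnet18_gpu0.txt &
deepstream-app -c source1_primary_retinanet_resnet18_gpu1.txt &
deepstream-app -c source1_primary_retinanet_resnet18_gpu2.txt &
deepstream-app -c source1_primary_retinanet_resnet18_gpu3.txt &
wait   # wait for all four before reading off the total loading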
Or maybe, once the server is running, a Triton client (not invoked by the DeepStream command above) should send these requests?
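If the standalone-client route is the right one, I assume Triton's perf_analyzer tool could generate that load; a sketch, assuming the model name is retinanet and the server's default gRPC port:

perf_analyzer -m retinanet -u localhost:8001 -i grpc --concurrency-range 4:16:4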
Also, it sounds like the metamux under development would bring outputs from multiple GPUs together (and split the requests to begin with). Will that be part of the Triton server package, or is it going to be part of DeepStream?
Once again, I appreciate all of your help looking at these cases yesterday. Thanks!
Brandt
There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks