Multi Instance GPU (MIG) mode and Performance

hemant.hbti · August 1, 2022, 7:34pm

Hi,
I am experimenting with MIG mode on an A100 40 GB GPU.
GI profile: MIG 3g.20gb (profile ID 9)
Tested with a BERT-base model on TensorRT.

I am observing a considerable increase in latency.

Is an increase in latency expected in MIG mode?
Are there any suggestions/best practices for using MIG mode?

Robert_Crovella · August 1, 2022, 8:07pm

BERT is a model that could be complex enough to saturate the A100 (without MIG). If that is the case, then switching inference to a MIG instance that is roughly half of an A100 (3 of the 7 compute slices and 20 GB of the 40 GB of memory) could result in longer processing time and therefore longer latency.

No latency increase is expected simply from using MIG versus not using it. But if the MIG instance you select cannot process the inference request in the same amount of time, then latency will increase.

For example, I would expect very little latency difference between a single RN50 (ResNet-50, batch size 1) inference on a “full” A100 and on a MIG “instance” of an A100. But for other, more complex models there may be differences.
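The saturation argument above can be illustrated with a toy "wave" model: a kernel's thread blocks are scheduled onto the available SMs in waves, so a kernel small enough to fit in one wave on the smaller instance runs in about the same time on both, while a kernel that already fills the full GPU needs more waves on the instance. This is a simplified sketch, not how TensorRT or BERT actually behave: it ignores memory bandwidth, clocks, and occupancy. The SM counts (108 for the full A100, 42 for 3g.20gb, i.e. 3 slices of 14 SMs), the block counts, and the per-wave time are assumptions chosen for illustration.

```python
import math

# Assumed SM counts for an A100 40 GB: 108 SMs without MIG,
# 3 compute slices x 14 SMs = 42 SMs for a 3g.20gb MIG instance.
A100_FULL_SMS = 108
MIG_3G_20GB_SMS = 42


def kernel_waves(num_blocks: int, num_sms: int, blocks_per_sm: int = 1) -> int:
    """Waves needed to schedule num_blocks thread blocks when each SM
    can hold blocks_per_sm resident blocks at a time."""
    return math.ceil(num_blocks / (num_sms * blocks_per_sm))


def estimated_latency_ms(num_blocks: int, num_sms: int,
                         wave_time_ms: float, blocks_per_sm: int = 1) -> float:
    """Toy latency: number of waves times a fixed (hypothetical) per-wave cost."""
    return kernel_waves(num_blocks, num_sms, blocks_per_sm) * wave_time_ms


if __name__ == "__main__":
    wave_ms = 0.05  # hypothetical per-wave time, not a measured value
    cases = [("small kernel (fits in 42 SMs)", 40),
             ("large kernel (saturates the full A100)", 1080)]
    for label, blocks in cases:
        full = estimated_latency_ms(blocks, A100_FULL_SMS, wave_ms)
        mig = estimated_latency_ms(blocks, MIG_3G_20GB_SMS, wave_ms)
        print(f"{label}: full={full:.2f} ms, 3g.20gb={mig:.2f} ms, "
              f"ratio={mig / full:.2f}")
```

With these numbers the small kernel needs one wave on either device, while the large one needs 10 waves on the full GPU and 26 on the 3g.20gb instance, which is the kind of gap a GPU-saturating model would show.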

There is a MIG user guide available. Detailed TRT questions should be asked on the TRT forum.
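One basic practice when benchmarking is to make sure the inference process really runs on the intended instance. Assuming the instance was created with something like `nvidia-smi mig -cgi 9 -C`, `nvidia-smi -L` lists each MIG device with a `MIG-...` UUID, and setting `CUDA_VISIBLE_DEVICES` to that UUID before CUDA is initialized restricts the process to it. The sketch below parses that listing; the helper names and the sample UUIDs used to test it are hypothetical, and the exact `nvidia-smi -L` layout may vary by driver version.

```python
import os
import re

# Matches MIG device lines in `nvidia-smi -L` output, e.g.
#   "  MIG 3g.20gb     Device  0: (UUID: MIG-...)"
# Newer drivers print "MIG-<uuid>"; older ones used "MIG-GPU-<uuid>/<gi>/<ci>".
_MIG_LINE_RE = re.compile(
    r"^[ \t]*MIG[ \t]+(\S+)[ \t]+Device[ \t]+\d+:[ \t]+\(UUID:[ \t]*(MIG-[^)\s]+)\)",
    re.MULTILINE,
)


def list_mig_devices(nvidia_smi_l_output: str) -> list[tuple[str, str]]:
    """Return (profile_name, mig_uuid) pairs parsed from `nvidia-smi -L` text."""
    return _MIG_LINE_RE.findall(nvidia_smi_l_output)


def pin_to_mig_instance(mig_uuid: str) -> None:
    """Restrict CUDA in this process to one MIG instance.
    Must be called before any CUDA library (TensorRT, PyTorch, ...) initializes."""
    os.environ["CUDA_VISIBLE_DEVICES"] = mig_uuid
```

Comparing the same TensorRT engine on the full GPU and on the pinned instance then isolates the effect of the smaller compute slice from other setup differences.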

You may also wish to review this for best practices, which will require sign-up/log-in.

system · August 24, 2022, 3:01am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
