Architectural insights needed: Why is the MIG 3g.71gb instance consistently the "Efficiency Sweet Spot" on H200?

linzheng0428 · December 15, 2025, 7:37am

Interesting observation, and your cost metric makes sense for scheduling decisions.

From what we’ve seen in practice, the 3g behavior is often less about raw slice ratios
and more about where different bottlenecks start to saturate.

For many small-to-medium batch workloads, 3g tends to hit a “balanced” point:
memory bandwidth per SM stays high, L2 is still effective, and launch gaps remain small.
Going from 3g to 4g adds compute faster than it adds effective bandwidth or cache benefit
(on H200 both are 71 GB profiles), so efficiency drops even if peak throughput improves.
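
To make the 3g-to-4g step a bit more concrete, here is a rough sketch that compares each profile's share of memory to its share of compute. The slice counts are my assumption based on the nominal MIG layout (7 compute slices, 8 memory slices); actual SM counts per instance deviate somewhat, and this ignores everything except the raw ratio.

```python
from fractions import Fraction

# Assumed nominal MIG geometry: (compute slices out of 7, memory slices out of 8).
# Real per-instance SM counts differ a little from these fractions.
PROFILES = {
    "1g": (1, 1),
    "2g": (2, 2),
    "3g": (3, 4),
    "4g": (4, 4),
    "7g": (7, 8),
}

def memory_per_compute(profile):
    """Share of memory slices divided by share of compute slices."""
    compute_slices, memory_slices = PROFILES[profile]
    return Fraction(memory_slices, 8) / Fraction(compute_slices, 7)

ratios = {name: memory_per_compute(name) for name in PROFILES}
for name, ratio in ratios.items():
    print(f"{name}: {float(ratio):.3f}")
```

Under these assumptions 3g comes out highest (7/6), while 4g falls back to 7/8, the same as 1g and 2g. That is only a first-order picture, but it points the same way as the measurements.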

That also lines up with your offload case: once the working set spills beyond the 3g instance's memory,
the curve changes completely and the full GPU becomes the only sensible option.
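
A small sketch of that cliff, assuming capacities read straight off the H200 profile names (usable memory is lower in practice) and a hypothetical 10% headroom factor:

```python
# Assumption: nominal capacities taken from the profile names, smallest first.
PROFILE_MEMORY_GB = [
    ("1g.18gb", 18),
    ("2g.35gb", 35),
    ("3g.71gb", 71),
    ("4g.71gb", 71),
    ("7g.141gb", 141),
]

def smallest_fitting_profile(working_set_gb, headroom=0.9):
    """Return the first profile whose capacity (scaled by headroom) holds the
    working set, or None if nothing fits and offload is needed."""
    for name, capacity_gb in PROFILE_MEMORY_GB:
        if working_set_gb <= capacity_gb * headroom:
            return name
    return None

print(smallest_fitting_profile(60))   # fits in the 3g instance
print(smallest_fitting_profile(70))   # no longer fits in 3g or 4g
print(smallest_fitting_profile(130))  # does not fit anywhere
```

Because 4g carries the same memory as 3g, a working set that outgrows 3g skips 4g entirely and lands on the full GPU.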

Curious if you see any clear differences in memory pipe utilization or idle gaps
between 3g and 4g in Nsight.
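
If it helps with the idle-gap comparison, here is a minimal sketch that computes the idle fraction from kernel start/end timestamps, however you export them from the trace. The function name and the sample tuples are hypothetical placeholders, not measurements.

```python
def idle_fraction(kernels):
    """kernels: list of (start_ns, end_ns) tuples from one MIG instance.
    Returns the fraction of the traced span with no kernel running."""
    if not kernels:
        return 0.0
    intervals = sorted(kernels)
    span_start = intervals[0][0]
    span_end = max(end for _, end in intervals)
    busy = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start > cur_end:
            # Gap found: close the current busy interval and start a new one.
            busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or back-to-back kernels extend the busy interval.
            cur_end = max(cur_end, end)
    busy += cur_end - cur_start
    total = span_end - span_start
    return 1.0 - busy / total if total else 0.0

# Placeholder timestamps purely for illustration.
sample = [(0, 10), (15, 25), (20, 30), (40, 50)]
print(f"idle fraction: {idle_fraction(sample):.2f}")
```

Running this on the same workload under 3g and 4g would show whether the extra SMs in 4g are mostly waiting between launches.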
