Architectural insights needed: Why is the MIG 3g.71gb instance consistently the "Efficiency Sweet Spot" on H200?

Interesting observation, and your cost metric makes sense for scheduling decisions.

From what we’ve seen in practice, the 3g behavior is often less about raw slice ratios
and more about where different bottlenecks start to saturate.

For many small-to-medium batch workloads, 3g tends to hit a “balanced” point:
memory bandwidth per SM stays high, L2 is still effective, and launch gaps remain small.
As the profile names show, 3g.71gb and 4g.71gb expose the same 71 GB, so going from 3g to 4g
adds SMs without adding memory capacity and, as far as I understand the slicing, with little
extra effective bandwidth or cache. Efficiency per compute slice drops even if peak throughput improves.
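
To make the bandwidth-per-SM part concrete, here is a rough sketch that compares memory slices per compute slice across the H200 MIG profiles. The slice counts are my assumption, taken from the profile tables in NVIDIA's MIG user guide (7 compute slices and 8 memory slices per GPU), and it also assumes bandwidth and L2 scale with memory slices. It says nothing about launch gaps or real kernels, so treat it as a back-of-envelope check rather than a measurement.

```python
# (compute slices, memory slices) per MIG profile.
# Assumed values from NVIDIA's MIG user guide, not measured on hardware.
PROFILES = {
    "1g.18gb": (1, 1),
    "2g.35gb": (2, 2),
    "3g.71gb": (3, 4),
    "4g.71gb": (4, 4),
    "7g.141gb": (7, 8),
}


def memory_per_compute(profiles):
    """Return memory slices per compute slice for each profile."""
    return {name: mem / comp for name, (comp, mem) in profiles.items()}


ratios = memory_per_compute(PROFILES)

# Print profiles from the most to the least memory-rich per compute slice.
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:>9}: {ratio:.2f} memory slices per compute slice")
```

Under those assumptions 3g.71gb comes out on top and 4g.71gb falls back to the same ratio as 1g and 2g, which matches the drop in efficiency when stepping from 3g to 4g.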

That also lines up with your offload case: once the working set spills beyond the 71 GB
of a 3g instance, the curve changes completely and the full GPU becomes the only sensible option.

Curious if you see any clear differences in memory pipe utilization or idle gaps
between 3g and 4g in Nsight.
