Using NVIDIA A100’s Multi-Instance GPU to Run Multiple Workloads in Parallel on a Single GPU (redhat.com)

The article shows that when multiple GPU instances are used in parallel to run identical copies of the same workload, there is a small performance degradation compared to a single GPU instance running one copy of that workload. Since MIG physically isolates nearly all performance-relevant resources, such as SM cores, memory, L2 cache, and even the syspipes, why is there still a performance degradation in this linear weak-scaling test case?
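For context, here is a minimal sketch of the kind of weak-scaling measurement the article describes: run one copy of a workload on one MIG instance, then N copies on N instances, and compare wall-clock times. It assumes MIG mode is already enabled and the instances already created, that pinning a process to an instance via `CUDA_VISIBLE_DEVICES` with a MIG UUID works as documented, and that `./bench` is a placeholder for the actual benchmark binary (not the article's workload):

```python
import os
import subprocess
import time

# Hypothetical placeholder for the benchmark; the article's actual
# workload is not reproduced here.
WORKLOAD = ["./bench"]

def mig_uuids():
    """Parse `nvidia-smi -L` output for MIG device UUIDs, e.g. lines like
    '  MIG 1g.5gb Device 0: (UUID: MIG-xxxxxxxx-...)'."""
    out = subprocess.check_output(["nvidia-smi", "-L"], text=True)
    return [line.split("UUID: ")[1].rstrip(")")
            for line in out.splitlines() if "MIG-" in line]

def run_on(uuid):
    """Launch one copy of the workload pinned to a single MIG instance."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": uuid}
    return subprocess.Popen(WORKLOAD, env=env)

def timed(spawn):
    """Spawn a batch of processes and return total wall-clock time."""
    start = time.perf_counter()
    procs = spawn()
    for p in procs:
        p.wait()
    return time.perf_counter() - start

uuids = mig_uuids()

# Baseline: one instance running one copy of the workload.
t1 = timed(lambda: [run_on(uuids[0])])

# Weak scaling: N instances each running an identical copy in parallel.
tn = timed(lambda: [run_on(u) for u in uuids])

print(f"1 instance: {t1:.2f}s | {len(uuids)} instances: {tn:.2f}s")
# Under ideal linear weak scaling these two times would be equal;
# any gap between them is the degradation the question asks about.
```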