Hello everybody,
We have a server (Supermicro AS -2124GQ-NART) with four Tesla A100 GPUs, which we integrated into our OpenNebula environment.
We pass the GPUs to the VM via PCI passthrough as described here (PCI Passthrough — OpenNebula 5.12.12 documentation).
Unfortunately, the GPUs are extremely slow in one scenario: when one, two, or three of them are mapped to a single VM.
When the remaining GPUs are already in use, our VM takes a very long time to load data onto its GPU.
We tested this behaviour in several configurations; here is one of them:
Two VMs, each with two GPUs. On each VM we load a tensor of zeros onto a GPU with PyTorch:
cat test.py
import torch
torch.zeros(100).to('cuda:0')
On the first VM this takes about three seconds:
time python test.py
python test.py 2.12s user 1.02s system 99% cpu 3.145 total
strace -c python test.py
% time     seconds  usecs/call     calls    errors syscall
 62.97    0.112609          28      4092       516 ioctl
 17.84    0.031907           5      6099           brk
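For a finer-grained view than `time` gives, the test could be split into stages so that each one is timed separately. This is only a sketch and not part of our original test.py: the `timed` helper and the stage labels are made up for illustration, and it assumes that `torch.cuda.init()` triggers the same driver initialization that normally happens lazily on first use.

```python
import time


def timed(label, fn):
    """Call fn(), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


def copy_zeros(torch):
    """Same operation as test.py, plus a sync so the copy has really finished."""
    tensor = torch.zeros(100).to("cuda:0")
    torch.cuda.synchronize()
    return tensor


def main():
    try:
        import torch
    except ImportError:
        print("torch is not installed; nothing to measure")
        return

    # The availability check already talks to the driver, so time it too.
    available, _ = timed("torch.cuda.is_available", torch.cuda.is_available)
    if not available:
        print("no CUDA device visible; nothing to measure")
        return

    # Initialise PyTorch's CUDA state explicitly (normally done lazily).
    timed("torch.cuda.init", torch.cuda.init)

    # The host-to-device copy from test.py.
    timed("zeros(100) -> cuda:0", lambda: copy_zeros(torch))


if __name__ == "__main__":
    main()
```

If the slowdown shows up in the initialization stage rather than in the copy, that would match the ioctl-dominated strace summary.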
When we switch to the second VM, GPU initialization is extremely slow:
time python test.py
python test.py 2.33s user 543.02s system 68% cpu 13:13.97 total
strace -c python test.py
% time     seconds  usecs/call     calls    errors syscall
 99.98  541.038915      132219      4092       516 ioctl
  0.01    0.042287           7      6104           brk
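To put the two strace summaries side by side: the number of ioctl calls is identical (4092), only the time per call differs. The short calculation below uses nothing but the figures from the two summaries; the variable and function names are just for illustration.

```python
# Figures copied from the two `strace -c` summaries in this post.
IOCTL_CALLS = 4092
FAST_SECONDS = 0.112609    # first VM
SLOW_SECONDS = 541.038915  # second VM


def usecs_per_call(seconds, calls):
    """Average time per syscall in microseconds."""
    return seconds / calls * 1_000_000


fast = usecs_per_call(FAST_SECONDS, IOCTL_CALLS)
slow = usecs_per_call(SLOW_SECONDS, IOCTL_CALLS)
slowdown = SLOW_SECONDS / FAST_SECONDS

print(f"first VM:  {fast:.0f} usecs per ioctl")
print(f"second VM: {slow:.0f} usecs per ioctl")
print(f"slowdown:  about {slowdown:.0f}x")
```

So every ioctl on the second VM is slower by a factor of roughly 4800, which suggests the delay sits in the driver or passthrough path and not in additional work done by PyTorch.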
Did we miss something? We use the same setup on a different server, which has some older Tesla GPUs (V100 + K40), and it works just fine.
When we pass all four GPUs to one VM, there seems to be no problem.
Can anyone help?