I have 3 Tesla M10 GPUs on three Cisco UCS hosts. They are being used in one pool in Horizon 7 across 80+ separate VMs. I have been tasked with seeing how much the GPUs are being utilized. When I run nvidia-smi it shows everything at 100%, but I have a hard time believing that, considering many of the VMs are not currently running. Is there something I'm missing?
Where do you run nvidia-smi? Host or guest? Which profile are you using?
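If you're checking on the host, it can help to use nvidia-smi's CSV query interface (`--query-gpu=... --format=csv`) rather than eyeballing the default summary, since you can pull just the utilization numbers per physical GPU and log them over time. As a rough sketch of what processing that output might look like (the sample text below is hypothetical, not real output from your hosts):

```python
# Sketch: parse per-GPU utilization from nvidia-smi's CSV query output.
# Hypothetical sample standing in for the output of:
#   nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv,noheader,nounits
sample = """0, Tesla M10, 12
1, Tesla M10, 100
2, Tesla M10, 3
3, Tesla M10, 0"""

def parse_utilization(csv_text):
    """Return {gpu_index: utilization_percent} from CSV lines."""
    util = {}
    for line in csv_text.strip().splitlines():
        index, _name, pct = (field.strip() for field in line.split(","))
        util[int(index)] = int(pct)
    return util

print(parse_utilization(sample))
# {0: 12, 1: 100, 2: 3, 3: 0}
```

If every GPU really reports 100% even with idle VMs, that points at a reporting quirk rather than genuine load, which is why knowing whether you ran it on the host or in a guest matters.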
I don’t believe that all of your GPUs are running at 100% all the time…
Have you tried installing and running the nice free utility GPUProfiler?
As you're running VMware, if you look in the GRID Software Portal (where you downloaded the software from), under Product Information > 5.2 > … there's an "NVIDIA Virtual GPU Management Pack for vRealize Operations".
This will give you some great monitoring capabilities if you're running vROps. Otherwise, as Tobias mentions, GPUProfiler on GitHub is very good as well!
Have you figured this one out? I am having problems with my P40 cards doing the same thing.
Let me guess you are running PCoIP with Horizon?
This is a known issue from VMware that has been open for three years now. They just won't fix it, as they seem to no longer be investing in PCoIP.
Yes, we are using PCoIP. Sounds like that entry in those release notes you linked could be the issue.