K1 - K160Q high frame buffer utilization

Hi,

I’m currently setting up monitoring of a Citrix environment running on XenServer.
They are using K1 cards with a K160Q profile.

In our monitoring solution I see that the frame buffer utilization has been sitting at 99.4% for a long period of time.
Monitoring has been running for a week now, and it has been a flat line since the start.
I would have expected it to drop outside office hours (again presuming that this level is normal during work hours).

Is this percentage normal or a bit over the top? Could it mean that the chosen profile is not powerful enough and that they should move up to the next one? I'm trying to understand the reasons behind the number.

Furthermore, I see that the Xen tools are out of date, so perhaps that is an issue as well. I don't have much experience with XenServer, so I don't know what effect that has on performance.

Any thoughts are welcome.
frame.png

What are you using for monitoring? I would suggest testing some monitoring with nvidia-smi:

http://developer.download.nvidia.com/compute/cuda/6_0/rel/gdk/nvidia-smi.331.38.pdf

This would at least give you something to compare against to validate your readings. I would suspect that you’re not maxing out the frame buffer all the time.
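
As a rough sketch of that cross-check, something like the following could be run wherever nvidia-smi is available. It assumes your driver's nvidia-smi supports the `--query-gpu` option with the `memory.used` and `memory.total` fields; the helper name `parse_fb_usage` is made up for illustration, and the numbers in the usage note are invented sample values, not readings from this environment.

```python
import shutil
import subprocess

# Assumption: this nvidia-smi build supports --query-gpu and these field names.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_fb_usage(csv_text):
    """Map GPU index -> frame buffer used (%) from nvidia-smi CSV output."""
    usage = {}
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip lines that are not "index, used, total"
        try:
            index, used, total = int(fields[0]), float(fields[1]), float(fields[2])
        except ValueError:
            continue  # e.g. a field reported as not supported
        if total > 0:
            usage[index] = round(100.0 * used / total, 1)
    return usage


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on this machine")
    else:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
        if result.returncode == 0:
            print(parse_fb_usage(result.stdout))
        else:
            print("nvidia-smi failed:", result.stderr.strip())
```

For example, `parse_fb_usage("0, 4072, 4096\n1, 2048, 4096\n")` returns `{0: 99.4, 1: 50.0}`. If the figure from nvidia-smi matches what the monitoring tool shows, the tool is at least reading the same counter.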

If you’re monitoring at the hypervisor (e.g. in XenCenter) then that’s what I’d expect to see as the FB is allocated as a fixed block at VM start.

If you’re looking for usage by the VM, then you should measure inside the VM.
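
To illustrate why a host-side reading can stay flat near 100%, here is a small back-of-the-envelope sketch. The sizes are assumptions on my part (4096 MB of frame buffer per physical GPU on a K1, 2048 MB per K160Q vGPU), so check them against the vGPU documentation for your driver release; the function name is hypothetical.

```python
# Assumed sizes, not taken from this thread: verify against your vGPU docs.
K1_GPU_FB_MB = 4096   # frame buffer per physical GPU on a K1 (assumption)
K160Q_FB_MB = 2048    # frame buffer reserved per K160Q vGPU (assumption)


def allocated_percent(gpu_fb_mb, profile_fb_mb, vgpus_running):
    """Share of a physical GPU's frame buffer reserved by running vGPUs."""
    reserved = vgpus_running * profile_fb_mb
    if reserved > gpu_fb_mb:
        raise ValueError("more vGPUs than the physical GPU can hold")
    return 100.0 * reserved / gpu_fb_mb


if __name__ == "__main__":
    for running in range(3):
        pct = allocated_percent(K1_GPU_FB_MB, K160Q_FB_MB, running)
        print(f"{running} vGPU(s) running: {pct:.1f}% of the frame buffer reserved")
```

Under those assumptions, two running K160Q VMs reserve the whole physical GPU regardless of what the desktops are actually doing, which would fit a steady line that does not drop outside office hours as long as the VMs stay powered on.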

We’re using eG Innovations for monitoring, and we monitor the host.
The reason I ask is that performance is excellent, so I was surprised by the metrics we saw.

We don’t monitor with an agent in the desktop so we get the metrics from the host only.
Will look at validating the numbers

Thanks for the thoughts
Greetings
Rob

Hi Rob,

There’s some useful info on FB monitoring in this article: http://nvidia.custhelp.com/app/answers/detail/a_id/4108/kw/framebuffer

Remember that this is physical FB, and the VM can cope with applications that demand more through a kind of paging. Too much of this slows things down, but there’s a margin for comfort, which means that even if the apps are demanding a bit more FB than is available, nothing bad will happen… You could also find that other resources (e.g. CPU) are exhausted before the GPU ones, and/or that changing the profile leaves your CPU capacity underloaded… If the user experience is good, that’s a pretty good indication you haven’t overprovisioned too much :-D

Rachel

This is why you see what you see,

You need to monitor the host’s usage inside the host.