Horizon - Video playback performance

Hello,

I have performance problems with video playback in my VDI machines. I’m using Blast Extreme accelerated by an NVIDIA P40 GPU. Playback always stutters a little. I usually get a better result from my Linux VDI pool, which isn’t GPU accelerated. If I check with YouTube, the problem seems to be related to encoding, since I don’t see dropped frames reported by the video player.

What is your experience in fullscreen video playback with the P series?

Cristiano

Hi,

I’m having no issues and get super smooth playback with the P40, even with 4K and YouTube. Which browser are you using?
If you were using a Maxwell board, I would assume the issue was related to decoding, but Pascal already supports VP9 hardware decoding (which YouTube uses), so that cannot be the issue.
What OS and resolution are you using?

Regards

Simon

Hello Simon,

I’m using Windows 10 64-bit 1709 with 3 CPUs and 5 GB of RAM; storage is all flash. The profile is just a P40-1Q since I’m going for density. I’ve tried Edge, Firefox and Chrome, with the same issue. Decoding the video from YouTube shouldn’t be the issue, as I don’t see dropped frames. The issue is not just YouTube; every video app is a little bit jerky. I also tried the NVIDIA lab, which is available with registration, and playback is smooth there. I also have an SR open with VMware, so I’m working on multiple fronts.

regards

Cristiano

Hi Cristiano,

Could you please change the scheduler from the default (equal share) to best effort for your deployment and test once again? I’m curious to hear if it makes any difference.

http://docs.nvidia.com/grid/5.0/grid-vgpu-user-guide/index.html#changing-vgpu-scheduling-policy-all-gpus

regards

Simon

Hello Simon,

When I upgrade to 5.2 I will try that, since it requires a restart. Right now it is difficult for me to find a window for downtime.

Regards

Cristiano

Hello Simon,

I didn’t switch the scheduler mode, but GPU-wise it seems OK. These are the statistics:

GPU  Session  Process  Codec     H     V  Average      Average
Idx       Id       Id   Type   Res   Res      FPS  Latency(us)

1      11     996   H.264    1710    1008      92        4469
1      11     996   H.264    1710    1008      77        5875
1      11     996   H.264    1710    1008      82        4556
1      11     996   H.264    1710    1008      73        4962
1      11     996   H.264    1710    1008      71        4394
1      11     996   H.264    1710    1008      80        4421
1      11     996   H.264    1710    1008      82        4129
1      11     996   H.264    1710    1008      93        3806
1      11     996   H.264    1710    1008      83        4308
1      11     996   H.264    1710    1008      76        4450

So it seems that encoding is fine. Isn’t 80 to 90 FPS a little too much?
Decoding should also be fine, as YouTube doesn’t report dropped frames.
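
To put the figures above in context, here is a small Python sketch that averages the pasted samples and compares the encode latency with the time available per frame at 60 fps. The rows are copied from the post; the helper name `parse_rows` and the 60 fps target are assumptions made only for this illustration.

```python
from statistics import mean

# Rows copied from the post above. Columns: GPU idx, session id, process id,
# codec, H res, V res, average FPS, average latency (us).
SAMPLE = """\
1 11 996 H.264 1710 1008 92 4469
1 11 996 H.264 1710 1008 77 5875
1 11 996 H.264 1710 1008 82 4556
1 11 996 H.264 1710 1008 73 4962
1 11 996 H.264 1710 1008 71 4394
1 11 996 H.264 1710 1008 80 4421
1 11 996 H.264 1710 1008 82 4129
1 11 996 H.264 1710 1008 93 3806
1 11 996 H.264 1710 1008 83 4308
1 11 996 H.264 1710 1008 76 4450
"""


def parse_rows(text):
    """Return (fps, latency_us) pairs, skipping headers and blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 8 and parts[0].isdigit():
            rows.append((int(parts[6]), int(parts[7])))
    return rows


rows = parse_rows(SAMPLE)
avg_fps = mean(fps for fps, _ in rows)
avg_latency_us = mean(lat for _, lat in rows)
frame_budget_us = 1_000_000 / 60  # assumed 60 fps target

print(f"average FPS: {avg_fps:.1f}")
print(f"average encode latency: {avg_latency_us:.0f} us")
print(f"frame budget at 60 fps: {frame_budget_us:.0f} us")
```

For these samples the average is about 81 FPS with roughly 4.5 ms of encode latency, which is well inside the 16.7 ms available per frame at 60 fps.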

Regards

Cristiano

Hi Cristiano,

It depends on the scheduler. As the default scheduler has no FRL (frame rate limiter), it renders more than 60 fps.
You can switch to best effort to get FRL=60fps, which is more than sufficient. And I agree that the GPU cannot be the issue in your case, as we render more than enough frames :)
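
As a purely conceptual illustration of what a frame rate limiter does, the following Python sketch simulates frame pacing with and without a cap. It is not NVIDIA's implementation; the function names and the 5 ms render cost are hypothetical values chosen for the example.

```python
def frame_finish_times(render_ms, max_fps=None):
    """Simulate when each frame finishes (in ms).

    With max_fps set, a frame may not start sooner than 1000/max_fps ms
    after the previous frame started. Without it, frames run back to back.
    """
    min_interval = 1000.0 / max_fps if max_fps else 0.0
    finish_times = []
    clock = 0.0
    next_start_allowed = 0.0
    for cost in render_ms:
        start = max(clock, next_start_allowed)
        clock = start + cost
        next_start_allowed = start + min_interval
        finish_times.append(clock)
    return finish_times


def effective_fps(finish_times):
    """Average frames per second over the whole simulated run."""
    return 1000.0 * len(finish_times) / finish_times[-1]


costs = [5.0] * 120  # hypothetical: every frame takes 5 ms to render
uncapped = effective_fps(frame_finish_times(costs))
capped = effective_fps(frame_finish_times(costs, max_fps=60))

print(f"without limiter: {uncapped:.1f} fps")
print(f"with 60 fps limiter: {capped:.1f} fps")
```

Without a limiter the simulated workload runs at around 200 fps; with the cap it settles at roughly 60 fps, and the GPU time that is no longer spent on surplus frames is what a limiter frees up for other VMs.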

Regards

Simon

I have had the same problem for many years (gracefully ignored by NVIDIA). It is not a problem of the video decoder (NVDEC) or the 3D renderer (DX/OpenGL); the problem is in the interaction between the compositing window manager (DWM.exe, with a compositor such as Aero or newer) and the public NVIDIA Capture SDK. It simply does not trigger the “new frame” event in the public NVIDIA Capture SDK (functions like NvFBCToSysGrabFrame()), and the rendered frame is lost.
I do not know which version of the public or NDA NVIDIA Capture SDK is used in your Blast Extreme version. I do not have any usable version of the public NVIDIA Capture SDK because K1/K2/K520/K340 are not supported any more. See the FPS problems (also in video) described here: https://gridforums.nvidia.com/default/topic/1149/#4247

Try using the GRID 5.1 drivers instead.

I saw better results in the latest GRID release when I set the parameter “sw_vsync_enabled=1” for the virtual machine (see https://gridforums.nvidia.com/default/topic/258/). Try it …

I’m experiencing the same problem that Cristiano mentioned before.

We have Tesla P40 GPUs in a Horizon 7.4.0 environment on ESXi 6.5U1 hosts (HPE DL380 G10).
I’ve already switched the scheduler to best effort, but did not see a difference.
I am using two Full HD monitors and the Blast Extreme protocol.
The vGPU profile I am using is "grid_p40-2q".

This is the output I get from the hypervisor shell when I run "nvidia-smi vgpu -es".
It was generated while playing a YouTube video in Firefox.
During the first "7 fps" lines the video had not started yet; it started later, when the FPS counters rose.
It seems like the GPU is rendering far too many frames, but I thought it was limited by the best effort scheduler…

GPU     vGPU  Session  Process  Codec     H     V  Average      Average
Idx       Id       Id       Id   Type   Res   Res      FPS  Latency(us)

0     230381      34       2264   H.264    1920    1080       7         333
0     230381      35       2264   H.264    1920    1080       7         443
0     230381      34       2264   H.264    1920    1080       7         333
0     230381      35       2264   H.264    1920    1080       7         339
0     230381      34       2264   H.264    1920    1080       7         332
0     230381      35       2264   H.264    1920    1080       7         348
0     230381      34       2264   H.264    1920    1080      15         958
0     230381      35       2264   H.264    1920    1080       8         838
0     230381      34       2264   H.264    1920    1080     126         735
0     230381      35       2264   H.264    1920    1080      43        1253
0     230381      34       2264   H.264    1920    1080    1188         646
0     230381      35       2264   H.264    1920    1080      30         516
0     230381      34       2264   H.264    1920    1080     122         706
0     230381      35       2264   H.264    1920    1080      27         475
0     230381      34       2264   H.264    1920    1080     108        1160
0     230381      35       2264   H.264    1920    1080      12         473
0     230381      34       2264   H.264    1920    1080      78        1234
0     230381      35       2264   H.264    1920    1080       7         279
0     230381      34       2264   H.264    1920    1080      58        1167
0     230381      35       2264   H.264    1920    1080       7         229

Any advice on how to fix this?
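
The hypervisor output above interleaves two encoder sessions, which makes it hard to read at a glance. The Python sketch below groups the pasted samples by session id and counts how many exceed the 60 fps limit mentioned earlier in the thread. The rows are copied from the post; the helper name `fps_by_session` is hypothetical.

```python
from collections import defaultdict

FRL_FPS = 60  # best-effort frame rate limit mentioned earlier in the thread

# Rows copied from the post. Columns: GPU idx, vGPU id, session id,
# process id, codec, H res, V res, average FPS, average latency (us).
SAMPLE = """\
0 230381 34 2264 H.264 1920 1080 7 333
0 230381 35 2264 H.264 1920 1080 7 443
0 230381 34 2264 H.264 1920 1080 7 333
0 230381 35 2264 H.264 1920 1080 7 339
0 230381 34 2264 H.264 1920 1080 7 332
0 230381 35 2264 H.264 1920 1080 7 348
0 230381 34 2264 H.264 1920 1080 15 958
0 230381 35 2264 H.264 1920 1080 8 838
0 230381 34 2264 H.264 1920 1080 126 735
0 230381 35 2264 H.264 1920 1080 43 1253
0 230381 34 2264 H.264 1920 1080 1188 646
0 230381 35 2264 H.264 1920 1080 30 516
0 230381 34 2264 H.264 1920 1080 122 706
0 230381 35 2264 H.264 1920 1080 27 475
0 230381 34 2264 H.264 1920 1080 108 1160
0 230381 35 2264 H.264 1920 1080 12 473
0 230381 34 2264 H.264 1920 1080 78 1234
0 230381 35 2264 H.264 1920 1080 7 279
0 230381 34 2264 H.264 1920 1080 58 1167
0 230381 35 2264 H.264 1920 1080 7 229
"""


def fps_by_session(text):
    """Map session id -> list of average-FPS samples, in the order seen."""
    sessions = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 9 and parts[0].isdigit():
            sessions[int(parts[2])].append(int(parts[7]))
    return dict(sessions)


summary = fps_by_session(SAMPLE)
for session, fps in sorted(summary.items()):
    over = sum(1 for f in fps if f > FRL_FPS)
    print(f"session {session}: max {max(fps)} FPS, "
          f"{over}/{len(fps)} samples above {FRL_FPS}")
```

In these samples, session 34 exceeds 60 FPS in five of ten readings (peaking at 1188), while session 35 never goes above 43. With two monitors attached, these are presumably one encoder session per display, although the output itself does not say so.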

Hi prinz,

Could you run "nvidia-smi encodersessions" within the VM?

For me it works as expected with P40-2Q and best effort when playing a Full HD YouTube video in full screen:

GPU  Session  Process  Codec     H     V  Average      Average
Idx       Id       Id   Type   Res   Res      FPS  Latency(us)

0       6       1516   H.264    1920    1200      59        2664
0       6       1516   H.264    1920    1200      37        2762
0       6       1516   H.264    1920    1200      56        2965
0       6       1516   H.264    1920    1200      58        2887
0       6       1516   H.264    1920    1200      60        2851
0       6       1516   H.264    1920    1200      59        3063
0       6       1516   H.264    1920    1200      60        2993
0       6       1516   H.264    1920    1200      60        3104

Regards

Simon

I wonder about the effectiveness of the 1Q profile for full 4K video. I’ve been looking at the difference between P6-1Q, P6-2Q and P6-4Q profiles and video performance. 4Q is super smooth, but with 2Q there is roughly a 25% drop in performance and with a 1Q profile a further 25% drop from there.

I have default scheduler settings and have put this down to simple frame buffer capacity but something doesn’t sit right with me in that the software encoding engine (Blast in this case) is still doing the same number of pixels (the same video is being used) so why should the frame buffer make such a difference?
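
The point about the pixel count can be made concrete with a little arithmetic. The Python sketch below assumes 32-bit colour and 60 fps (both assumptions, not stated in the post) and compares one uncompressed 4K frame with the frame buffer of each profile.

```python
WIDTH, HEIGHT = 3840, 2160  # 4K UHD
BYTES_PER_PIXEL = 4         # assumption: 32-bit colour
FPS = 60                    # assumption: capture/playback rate

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
pixels_per_second = WIDTH * HEIGHT * FPS

# Frame buffer per profile in MiB (1Q = 1 GB, 2Q = 2 GB, 4Q = 4 GB).
PROFILES_MIB = {"1Q": 1024, "2Q": 2048, "4Q": 4096}

# Fraction of each profile's frame buffer taken by one uncompressed frame.
shares = {
    name: bytes_per_frame / (mib * 1024 * 1024)
    for name, mib in PROFILES_MIB.items()
}

print(f"one frame: {bytes_per_frame / 2**20:.1f} MiB")
print(f"pixels to encode per second: {pixels_per_second:,}")
for name, share in shares.items():
    print(f"{name}: one frame is {share:.2%} of the frame buffer")
```

The number of pixels to encode per second is the same for every profile, and a single uncompressed frame is only about 3% of even the 1Q frame buffer. This does not explain where the drop comes from; it only quantifies why raw frame size alone seems an unlikely cause.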

Hello Simon,

I ran the command "nvidia-smi encodersessions" within the VM while playing a 1080p YouTube video and got this output:

GPU  Session  Process  Codec     H     V  Average      Average
Idx       Id       Id   Type   Res   Res      FPS  Latency(us)

0        9        1056   H.264    1920    1080       0           0
0       10       1056   H.264    1920    1080       31          840
0        9        1056   H.264    1920    1080       0           0
0       10       1056   H.264    1920    1080       35          768
0        9        1056   H.264    1920    1080       0           0
0       10       1056   H.264    1920    1080       31          805
0        9        1056   H.264    1920    1080       696         1232
0       10       1056   H.264    1920    1080       31          703
0        9        1056   H.264    1920    1080       36          741
0        9        1056   H.264    1920    1080       143         1190
0       10       1056   H.264    1920    1080       32          708
0        9        1056   H.264    1920    1080       722         1113
0       10       1056   H.264    1920    1080       32          750
0        9        1056   H.264    1920    1080       119         1137
0       10       1056   H.264    1920    1080       36          781
0        9        1056   H.264    1920    1080       145         1144
0       10       1056   H.264    1920    1080       38          811
0        9        1056   H.264    1920    1080       116         1170
0       10       1056   H.264    1920    1080       34          723

If I connect to the desktop pool with only one active monitor and watch the exact same video in full screen, I get this output:

GPU  Session  Process  Codec     H     V  Average      Average
Idx       Id       Id   Type   Res   Res      FPS  Latency(us)

0       7      13076   H.264    1920    1080     740        1339
0       7      13076   H.264    1920    1080     749        1324
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080     110         999
0       7      13076   H.264    1920    1080     130        1201
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080     121        1111
0       7      13076   H.264    1920    1080     256        3903
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0
0       7      13076   H.264    1920    1080       0           0

On the hypervisor it looks like this when I play the video with one active monitor:

GPU     vGPU  Session  Process  Codec     H     V  Average      Average
Idx       Id       Id       Id   Type   Res   Res      FPS  Latency(us)

0     855153      17      13076   H.264    1920    1080     115        1469
0     855153      17      13076   H.264    1920    1080     115        1372
0     855153      17      13076   H.264    1920    1080     144        6922
0     855153      17      13076   H.264    1920    1080     128        1252
0     855153      17      13076   H.264    1920    1080     120        1168
0     855153      17      13076   H.264    1920    1080     116        1657
0     855153      17      13076   H.264    1920    1080    1736         135
0     855153      17      13076   H.264    1920    1080     115        1240
0     855153      17      13076   H.264    1920    1080     122        1337
0     855153      17      13076   H.264    1920    1080     124        1346

If I run "nvidia-smi vgpu -q" on the hypervisor, I can see the line "Frame Rate Limit: 60FPS", but it does not limit anything, if I understand this correctly…

Regards,
Dominik

Hi Dominik,

I agree, this looks like you don’t have FRL in place. Could you please double-check that you’re running the Best Effort scheduler? Try the latest GRID 6.1 package and you should have Best Effort by default…

Regards

Simon

Hi Simon,

So I ran this command on the hypervisor to set the best effort scheduler and rebooted the hosts:
esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x00"

But I don’t see any difference. The frame rates are still far too high.
The question is at which layer the problem first occurs.
It could be the hypervisor, the VM config, the guest OS or the NVIDIA driver in the guest OS.
So I set the preference at the lowest layer, which should be fine.
I have read the installation manual multiple times and I can’t find any configuration errors on my side.
I have to correct myself: we are now on ESXi 6.5U2. Maybe there is a known bug or problem?
We are on the latest GRID 6.1 software (host driver version: 390.57).
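
For reference, the esxcli command above sets the RmPVMRL registry key to 0x00. The Python sketch below builds that command string for each scheduling policy. The value mapping (0x00 best effort, 0x01 equal share, 0x11 fixed share) is what I recall from the NVIDIA vGPU user guide and should be verified against the guide for your release; the helper name `esxcli_set_scheduler` is hypothetical.

```python
# RmPVMRL values per scheduling policy. Assumption: taken from memory of the
# NVIDIA vGPU user guide; verify against the guide for your GRID release.
RMPVMRL = {
    "best_effort": 0x00,
    "equal_share": 0x01,
    "fixed_share": 0x11,
}


def esxcli_set_scheduler(policy):
    """Return the esxcli command line that selects the given policy."""
    value = RMPVMRL[policy]
    return (
        "esxcli system module parameters set -m nvidia "
        f'-p "NVreg_RegistryDwords=RmPVMRL=0x{value:02x}"'
    )


for policy in RMPVMRL:
    print(f"{policy}: {esxcli_set_scheduler(policy)}")
```

The string generated for `best_effort` matches the command quoted in the post. As noted there, the host has to be rebooted before a change takes effect.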

UPDATE: I investigated further and noticed some very interesting benchmark behaviour, tested with the Unigine Valley benchmark. While the benchmark is running, the FPS counter in the application shows an almost constant 67 to 68 FPS, sometimes 70, while the hypervisor (nvidia-smi vgpu -es) reports this:

GPU     vGPU  Session  Process  Codec     H     V  Average      Average
Idx       Id       Id       Id   Type   Res   Res      FPS  Latency(us)

0     311288       7       2256   H.264    1920    1080     140         901
0     311288       8       2256   H.264    1920    1080       7         161
0     311288       7       2256   H.264    1920    1080     102         984
0     311288       8       2256   H.264    1920    1080       7         303
0     311288       7       2256   H.264    1920    1080      94         785
0     311288       8       2256   H.264    1920    1080       7         202
0     311288       7       2256   H.264    1920    1080     104         773
0     311288       8       2256   H.264    1920    1080       7         673
0     311288       7       2256   H.264    1920    1080     103         875
0     311288       8       2256   H.264    1920    1080       7         333
0     311288       7       2256   H.264    1920    1080     107         823
0     311288       8       2256   H.264    1920    1080       7         154
0     311288       7       2256   H.264    1920    1080     125         792
0     311288       8       2256   H.264    1920    1080       7         421
0     311288       7       2256   H.264    1920    1080     125         790
0     311288       8       2256   H.264    1920    1080       7        1096
0     311288       7       2256   H.264    1920    1080      87         759
0     311288       8       2256   H.264    1920    1080    2010         152
0     311288       7       2256   H.264    1920    1080     109         835
0     311288       8       2256   H.264    1920    1080       7         184

The benchmark feels as if frames are being dropped heavily, but the FPS counter and the hypervisor both indicate that there are no frame drops at all. What could be causing this odd behaviour?

ANOTHER UPDATE: I imported the View GPOs and set the MaxFPS-Blast GPO to 60 fps. It feels much, much smoother now, but the frame rates are still pretty high and you can feel some sort of "lag" if you move windows quickly, or even in some video sequences. Still, it is no comparison to before (max 30 FPS).

So I think the main problem is found, and it is stupidly simple.
But for the fine-tuning and the problem with the high FPS, I would be very grateful for further help here.

Thank you for your helpful advice and effort!

Regards,
Dominik

What are the min QP and max QP settings in the Blast Extreme session?
And are you using UDP or TCP?

The QP settings are the default ones. We tried UDP and TCP, but there is no noticeable difference. I don’t think this problem is a Blast setting. My guess is that the frame rate limiter on the hypervisor is not working correctly. Blast displays only 60 FPS because of the GPO I set, but the hypervisor renders far more frames than that. This is very bad for the user experience and, of course, for user density, since one user could claim nearly all the performance of the physical GPU.

Hi Prinz,

I disagree. Let’s discuss this offline. Please send me a PM.

regards

Simon

I’m seeing the same results as others on this forum: stuttering playback with Horizon Blast (7.7) when H.264 encoding is enabled. We are using P40 cards with the 4Q profile and GRID driver 412.16. Was this ever resolved?