I have click-to-photon results for my Raspberry Pi thin-client platform, inspired by #GRIDDays (see the blogs at https://gridforums.nvidia.com/default/topic/734/general-discussion/-griddays/).
Attention: the presented solution IS NOT Citrix HDX Ready Pi, ThinOS, ThinLinX, TLXOS, RPiTC, VTware, NoMachine, BerryTerminal, RDP, VNC, PARSEC … It is raw H.264 streaming over RTP/UDP without an Xorg server (i.e. direct OpenMAX), plus the usbredir protocol over TCP/IP; see below.
My best click-to-photon result is 60 ms (95 ms with the Aero compositor enabled).
The VDI runs at 1280x1024@30, i.e. the encoder receives a frame every 33 ms (7); HDMI runs at 1280x1024@60, i.e. HDMI sends a vsynced frame every 16 ms (13). The worst-case double "vsync" wait time is 49 ms (= 33 + 16), so results of 60-109 ms are expected, and that is what I measured. I used an "HP LP1965" monitor with a measured input lag of about 5 ms (http://www.svethardware.cz/recenze-hp-lp1965-legenda-pokracuje/15859-3). The monitor has an embedded high-power USB hub that also powers the thin client (the Raspberry Pi and its USB peripherals; the Raspberry Pi is attached to the monitor stand) and the audio speakers.
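The expected range above is simple arithmetic, and a minimal Python sketch of it follows. It uses the rounded frame periods quoted in the post (33 ms and 16 ms); the function names are hypothetical and chosen only for illustration.

```python
# Frame periods as rounded in the post, in milliseconds.
ENCODER_PERIOD_MS = 33  # 1280x1024@30: the encoder receives a frame every ~33 ms
HDMI_PERIOD_MS = 16     # 1280x1024@60: HDMI sends a frame every ~16 ms
BEST_CASE_MS = 60       # best measured click-to-photon result


def worst_case_wait_ms(encoder_period_ms, hdmi_period_ms):
    """Worst case: the click just misses both the encoder tick and the HDMI vsync."""
    return encoder_period_ms + hdmi_period_ms


def expected_range_ms(best_ms, encoder_period_ms, hdmi_period_ms):
    """Return (best, worst) click-to-photon latency in milliseconds."""
    worst = best_ms + worst_case_wait_ms(encoder_period_ms, hdmi_period_ms)
    return best_ms, worst


if __name__ == "__main__":
    low, high = expected_range_ms(BEST_CASE_MS, ENCODER_PERIOD_MS, HDMI_PERIOD_MS)
    print(f"expected click-to-photon range: {low}-{high} ms")
```

With unrounded periods (1000/30 and 1000/60 ms) the worst-case wait comes out at about 50 ms instead of 49 ms, which does not change the picture.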
(1) I soldered wires directly to the left mouse button and attached them to my oscilloscope.
(2-4) The usbredir protocol is open source. It transparently connects any device to the remote VM (qemu emulates UHCI/EHCI USB 2).
(5) I used the PowerPoint black/white color button to change the slide background.
(6) There is NO additional VDI agent software installed in the DomU (except the vGPU driver). I am using and testing not only Windows but also Linux (CentOS, Debian and SteamOS) with a K1/K2 backend. VDI agent software adds no load on the DomU CPU or GPU (including NVENC)! There are no DomU software collisions (see some crashing stories) and no DomU dependencies or driver incompatibilities (see the NVIFR/NVFBC problems).
(7) The rendered framebuffer sits in a PCI memory region shared with Dom0. It is available not only from the accelerated k###q vGPU but also from the emulated VGA, which eases installation and maintenance of any OS (including Windows recovery states). There are many NVIDIA bugs that have been unresolved for more than a year (such as DX11 fullscreen, out-of-order frame delivery, mouse pointer …).
(8-10) The H.264 is encoded on Maxwell Gen1 (K2200), which is 3x faster than Kepler; the K2200 offers optimal price/performance (5x 1280x1024@40 generated ~33% NVENC load, per "nvidia-smi pmon"). A separate encoder domain would not be needed if NVIDIA disclosed an API for direct access to NVENC without CUDA (CUDA has not worked in Dom0 for more than 5 years). The multithreaded "stream" executable is about 28 kB on disk (VmRSS 25 MB + GPU 25 MB, and 240 MB of GPU memory per encoded H.264 stream), and no other processes are needed. Inter-domain sharing is based on xenstore.h, xen/evtchn.h, xen/gntalloc.h and xen/gntdev.h.
(11) The standard RTP/UDP protocol is used to transport the H.264 video stream (and the audio PCM stream in a separate channel) to minimize software overhead. The Linux HTB queuing discipline shapes traffic to the Raspberry Pi (it has only 100 Mbit Ethernet input) on a separate secured VLAN. I also tested an OpenVPN tunnel to secure the VDI channel (not included in the benchmark).
(12-13) The Raspberry Pi OpenMAX IL modules (with accelerated H.264) are used to decode and output video (and audio, if supported) to HDMI. The peak load on the Raspberry Pi 2 is under 30%. The multithreaded "thin" executable is about 110 kB on disk (VmSize 165kB), and no other processes are needed (i.e. no Xorg, no window manager … only ssh for remote monitoring).
(14) Finally, the photon is emitted to the phototransistor, which is connected to my oscilloscope.
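To illustrate why plain RTP/UDP in step (11) keeps software overhead low, here is a minimal Python sketch of the fixed 12-byte RTP header defined in RFC 3550. It is not the code of the "stream" executable. Payload type 96 is an assumed dynamic value, the function names are hypothetical, and the 90 kHz clock is the one RFC 6184 specifies for H.264.

```python
import struct

RTP_VERSION = 2
H264_CLOCK_HZ = 90_000  # RTP clock rate for H.264 video (RFC 6184)
PAYLOAD_TYPE = 96       # assumption: a dynamic payload type, not taken from the post


def pack_rtp_header(seq, timestamp, ssrc, marker=False, payload_type=PAYLOAD_TYPE):
    """Build the fixed 12-byte RTP header (no padding, no extension, no CSRC list)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",                # network byte order: 1+1+2+4+4 = 12 bytes
        byte0,
        byte1,
        seq & 0xFFFF,            # 16-bit sequence number, wraps around
        timestamp & 0xFFFFFFFF,  # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,       # 32-bit stream source identifier
    )


def timestamp_for_frame(frame_index, fps=30):
    """RTP timestamp of a frame at a constant frame rate on the 90 kHz clock."""
    return (frame_index * H264_CLOCK_HZ // fps) & 0xFFFFFFFF


if __name__ == "__main__":
    header = pack_rtp_header(seq=1, timestamp=timestamp_for_frame(1), ssrc=0x1234, marker=True)
    print(len(header), header.hex())
```

At 30 frames per second the timestamp advances by 3000 ticks per frame, and the marker bit is conventionally set on the last packet of each frame so the receiver knows when a frame is complete.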