Autodesk Revit and User density

Hi…
New to the forums and very pleased to see a wealth of information and knowledge to hand.

I have recently taken over the IT for a company that plans to use Autodesk Revit and Adobe CC as its primary business tools. The current desktop infrastructure is not sufficient, which leads to an interesting choice: VDI or a physical desktop infrastructure?

I have previously run a small-scale trial using XenApp with a single shared desktop and a medium-spec NVIDIA Quadro card, so I know in principle that the tech works.

How do I get the best performance out of a single host?
XenServer hypervisor on bare metal.
XenApp VM.
Host VM for the shared desktop: Windows 2012 with GPU passthrough.

What happens when I max out the performance of that single host?
Is XenApp able to load balance the same shared desktop across different hosts, including users' local profiles?

Also, can I find out who the Solutions Expert for the UK is, please?

Cheers
Colin.

Also, just to ask: I was at IP Expo last week and there was a lot of talk about hyperconvergence.
Nutanix, SimpliVity and so forth…

While it looks very nice for my server farm, which doesn't require any GPU acceleration, how does it fit in with something like VDI with GPU acceleration?

XenServer

Due to changes in the OS, the experimental support for multi-GPU for XenApp is no longer a valid option, and Citrix is not continuing to develop it.

I would also suggest that, depending on your workloads, XenDesktop may be a better choice.

Me.

Hi Colin,

A bare metal XenApp server with a high-end graphics card will give you the best possible performance for those applications at a reasonable cost, with no hypervisor penalties involved.

Load balancing between servers is configured via a Citrix policy, profiles would be streamed from a shared fileshare, and the setup is easy to manage and maintain, as you know from your previous trials.

There are many benefits to virtualizing the hosts, though, that may compensate for the performance hit, as well as different pros and cons for VDI, session hosts, SaaS solutions or even "remote-enabled" physical endpoints.

Good luck!

Hi Chaps

Cheers for the feedback.
I suppose there is little choice but to get stuck in and see what gives.

Tony, just for clarity, are you suggesting the XenApp hosts are best installed directly on bare metal?
As in, install 2012 and then prep it to be a shared server OS environment?

I suppose if I can VM the XenApp server and the share with the profiles (so they are included in my backup and DR regime), then the XenApp hosts can be considered vanilla installs which, if they break, could easily be re-imaged using something like FOG.

I have a bunch of older HP workstations which I can use for a POC for streaming profiles etc.

It also looks like I may be able to free up a Dell R720 for a short-term test of user densities.

Interesting…

It occurs to me that the use of a GRID GPU isn't going to help in this instance?
And something like the M6000 would be better suited?

Hi,

That’s good to hear!

I wouldn't want to make any recommendations, as I know very little of your business, users and other circumstances. But yes, a normal Windows Server installation on bare metal with a high-end GPU and a XenApp VDA on top can give very high performance and user density.

There are still some things to look out for, of course. The R720 is pretty straightforward; HP Gen9 servers require disabling the embedded video card. Revit licenses need to be network types. Check ISV support for Server OS for your other applications, sort out remote management utilities, and do performance tuning in the BIOS, the OS and the profile functionality. Use 3+ GHz CPUs (do not neglect this part, especially not for Revit). And so on; I wouldn't be able to fit it all in a forum post, but that's the kind of assistance that partners like the one I work for provide. There are also very good POC guides that NVIDIA has made available for vGPU on both vSphere and XenServer, as well as ones by Citrix for GPU passthrough, including for XenApp deployments.

Personally I am very interested in the GRID 2.0 M6 for a group of bare metal XenApp blades, as soon as the licensing details for GRID 2.0 are fully clarified. Neither the K1 nor the K2 fits that scenario very well; the K6000 or M6000 would be a better option for bare metal rack servers, as you point out. Note that dual K2s with a hypervisor, running 4 XenApp VMs per host, is also a very solid configuration for the applications you mention and not overly complex to maintain. The performance is not as good, but it can be more flexible and reliable in other ways. It really comes down to what the business requirements and priorities are for you (in terms of size, cost, reliability, flexibility and performance).

The biggest concern with something like a XenApp installation is going to be which graphics APIs the implementation supports (DX9, DX11, OpenGL, RemoteFX, etc.) versus what the apps need, so your platform of choice should carefully take into account what your users need to run. Note that Windows Server 2016, once it comes out, will offer many more options than are currently available on Windows Server 2012 R2.

XenApp on XenServer, which I can speak to directly from experience, works very well and scales nicely with a K2 in GPU passthrough mode on a Dell R720. Revit runs quite well with the K2 in passthrough mode, even remotely, so I am not sure what issues you are up against, Tony; maybe very complex models? I will note that our students generally don't work on projects as complex as those typical in industry, so that may be part of the difference. I should add that the reason for running under XenServer is that some of the resources are also used for hosting XenDesktop VDIs in addition to XenApp. As for bare metal, OpenGL has been supported on bare metal XenApp installations since, I believe, XenApp 6.5.

Alas, in many cases, the best option is to set up a test environment and use your direct experiences with it to help plan the scale-out for your expanded user base.

Hi Tobias,

XenApp supports OpenGL and DirectX fully, regardless of whether Windows Server is running virtualized with a passthrough GPU or on bare metal. Have you seen any notices saying otherwise? That would be interesting to know. CUDA and OpenCL sharing are limited to experimental status with XenApp (even though it seems stable as far as I've seen), but that isn't applicable to the applications mentioned here anyway. Neither is it available at all in vGPU with the K1/K2, or in the low/mid vGPU types with the M6/M60 (so far).

Windows Server 2016 adds support for OpenGL in RemoteFX, without using XenApp (HDX); so, native Microsoft RDS. It's still some distance away from being a vetted alternative for these applications (looking at all the other capabilities HDX offers besides OpenGL support) as I see it, but I'm looking forward to trying it out once released.

Citrix offers provisioning to physical machines using PVS (it can be a bit complex to deal with, though), and there is a variety of other tools out there for more or less efficient bare metal deployment. But yes, I agree: remote management, image management, backup and recovery are things to take into consideration that are made easier on a virtualized platform. If each server supports 25 concurrent users (a reasonable assumption for those apps on a good server) and the requirement is 100 users, though, it wouldn't be very complex… as always, it depends on what the requirements and expectations are.
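As a side note, the capacity arithmetic behind that last point can be sketched in a few lines. The 25-sessions-per-host figure is just the assumption from the paragraph above, not a benchmark, and the helper name is made up for illustration:

```python
import math

def hosts_needed(total_users, users_per_host, spares=0):
    """Minimum number of hosts for a target concurrent-user count,
    plus optional spare hosts for failover or maintenance windows."""
    return math.ceil(total_users / users_per_host) + spares

# 100 concurrent users at an assumed ~25 sessions per host:
print(hosts_needed(100, 25))     # 4
print(hosts_needed(100, 25, 1))  # 5 with an N+1 spare
```

The N+1 spare matters in practice: with only 4 hosts, losing one pushes the remaining three well past the tested density.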

Nutanix and Pivot3 both ship GPU enabled nodes, and are fully integrated. I work with both vendors.

The others may not ship GPU nodes specifically but pretty much all of them can work with GPU enabled servers. It’s just another workload, as long as the resources are available it will run and manage them.

[quote="tjkreidl"]
Alas, in many cases, the best option is to set up a test environment and use your direct experiences with it to help plan the scale-out for your expanded user base.[/quote]

Absolutely true.

I'm a huge fan of XenApp & RDSH, having been around since the WinFrame days and never quite escaping. One of the challenges, though, is that there's no control over the framebuffer used by each session.

One example I've seen recently is where an application that works wonderfully with a 1GB vGPU profile in XenDesktop doesn't scale linearly in XenApp. The reason is that in XenDesktop the application is constrained to the 1GB of graphics RAM, and so manages its utilisation accordingly. In a XenApp environment it sees considerably more (close to 1.6GB) and so consumes more to gain some performance advantage.

The second user comes along and also takes 1.6GB, which is fine. When the 3rd user arrives, they too take 1.6GB. Then the 4th user starts a session; they need at least 1GB of framebuffer for a good experience, but the other sessions don't release their allocations, they keep them, and user number 4 gets just 200MB of graphics memory and a poor experience.

To make matters worse, the excess paging of data that should be in graphics memory out to system memory has an impact elsewhere, affecting all the other sessions.
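The scenario above can be sketched as a toy first-come-first-served allocation model. The 5GB total below is purely an illustrative figure implied by the numbers in the post (3 × 1.6GB plus the 200MB remainder), not a real card spec:

```python
def greedy_sessions(total_gb, per_session_gb):
    """Each new session grabs per_session_gb of framebuffer if available,
    otherwise whatever is left; earlier sessions never release memory."""
    free = total_gb
    grants = []
    while free > 0:
        grant = min(per_session_gb, free)
        grants.append(round(grant, 2))
        free = round(free - grant, 2)
    return grants

# ~5 GB of framebuffer, greedy 1.6 GB sessions:
print(greedy_sessions(5.0, 1.6))   # [1.6, 1.6, 1.6, 0.2] -> user 4 gets 200MB
```

It's a deliberately crude model (real drivers page and share), but it shows why per-session framebuffer limits, as vGPU profiles provide, give more predictable density than an uncontrolled shared card.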

Even with a K6000 (12GB of memory) we could only get 6 sessions running on the host due to the memory challenges, whereas using K2s with XenDesktop and vGPU allowed us to hit 16, thanks to the better memory management.

In this case XenDesktop and vGPU gave much better density than XenApp because of the ability to control the Graphics memory and give consistent performance across the users.

So you really do need to test, and test under load with the models and applications the users will be working with. As Tobias has stated, he gets very good densities from a server with students who aren't working on anything complex, but increasing the complexity and the size of the models will lead to an increase in resource requirements.

Colin, I replied to your email to my colleague Mike Wang last night, so you’ve got my direct contact details and we can discuss this in more details if you’d like.

Jason, which application did you notice this with (Esri ArcGIS, perhaps)? Most Autodesk and Adobe products have been scaling wonderfully in XenApp in my experience, and in this case it's about Revit and Adobe CC (guessing Photoshop/InDesign/Acrobat). Have you had any issues like that with either of those?

It's very dependent on the amount of data being placed in the framebuffer. AutoCAD and Revit are rather frugal, which makes them good XenApp candidates, but there are applications we have to be wary of. Adobe Illustrator, for example, expects 2GB per user, so in theory a couple of sessions and the GPU is loaded; but if the utilisation isn't actually there, maybe XenApp is again a good choice…

Certainly it goes back to the old rule: just because one or two users work well, it doesn't mean that 10 or more will, and we should always test at scale. Being aware of just how much graphics memory each session requires helps to determine what to expect.

Jason,
A bit of clarification, please, about your statement above regarding memory allocation:

The second user comes along and also takes 1.6GB, which is fine. When the 3rd user arrives,
they too take 1.6GB. Then the 4th user starts a session; they need at least 1GB of framebuffer
for a good experience, but the other sessions don't release their usage, they keep it, and user
number 4 gets just 200MB of graphics memory and a poor experience.

In GPU passthrough mode, my understanding is that this works somewhat like dynamic memory allocation (memory ballooning) in a VM: the first user gets as much memory as needed, and so on with subsequent users until there is no more memory available. Does the memory allocation then shrink for each user, or does it become a question of multiplexing users so that they end up effectively being swapped in and out? I am under the impression it's the former. Just trying to get a better grip on how this works.

The second question is how much overhead there really is with XenApp running as a VM with a GPU vs. a bare metal installation. Clearly, there is a bit more overhead compared to vGPU, since for GPU passthrough the work has to be done by dom0 (hence it's a good idea to run XenApp instances on servers independent of any other heavy-duty VMs). But on powerful servers, I do not get the impression that even a VM vs. bare metal installation takes a huge performance hit. It would be interesting to see load comparisons of, say, 16 users with similar applications connecting via GPU passthrough vs. 16 vGPU-based sessions.

From experience it's the former: until the applications release their memory allocations they retain them, so swapping only occurs for the subsequent users who arrive once the memory is consumed.

It very much depends on how the applications manage their graphics memory, and whether they’re dynamic in releasing unused memory. Some are rather “greedy” and will grab as much as they can and hold onto it even when it’s not required.

Overhead is minimal, though you do have the additional load of running 16 OS instances vs 4 if using passthrough, or 1 on bare metal.
However, with the decision by Citrix to cease further development of the experimental multi-GPU support in XenApp, I can't recommend considering bare metal (unless using a single Quadro GPU).

http://support.citrix.com/article/CTX202148

Thank you, Jason, that was helpful. As to the swapping process, what happens to GPU memory that gets swapped out? Where is it swapped to, and how efficient is that process?
Regarding multiple GPUs per XenApp, yes, I am aware that this will no longer be supported, which is a shame. It means the best bet is probably to run multiple XenApp instances under XenServer on a physical host to keep from needing too many physical servers.

Best regards,

System memory, but if there's insufficient RAM, then it will page. How efficient that is depends on a number of factors. Ideally the traffic will not traverse QPI, so it's predominantly down to how efficient the PCIe/CPU/RAM path on the mainboard is. If it does cross QPI (less likely now that hypervisors have become better at handling VM placement with regard to NUMA), the overhead will be greater.

Of course if it starts to page, then we introduce a host of other issues…

It is indeed a shame; it had great potential, but it looks like the engineering required to keep it going was prohibitive.