Good progress so far, nice work getting it up and running :-)
Yes, within vCenter you need to set the host’s graphics type to “Shared Direct” for vGPU to be enabled; the other option (“Shared”) is for VMware’s Shared Virtual Graphics (vSGA), which you aren’t using in this instance. This is a “Per Host” setting, not “Per Cluster”, so you’ll need to set it on every VX Node that has a GPU installed.
Regarding vGPU Profiles, the number in the Profile name is the amount of Framebuffer (in GB) allocated to the VM: 1 = 1GB / 2 = 2GB / 4 = 4GB / 8 = 8GB / 16 = 16GB (so T4-4Q is a 4GB “Q” Profile). Other GPUs can go higher as they have more Framebuffer (24 / 32 / 40 / 48GB).
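To make the naming concrete, here’s a small Python sketch that splits a Profile name into its parts. It assumes names follow the <board>-<framebuffer GB><series> pattern used by the T4 Profiles (e.g. T4-16Q); some older boards also have sub-1GB Profiles, which this simple parser doesn’t account for, so treat it as illustrative only.

```python
import re

# Assumed naming pattern: <board>-<framebuffer in GB><series letter>, e.g. "T4-16Q".
PROFILE_RE = re.compile(r"^(?P<board>[A-Za-z0-9]+)-(?P<fb>\d+)(?P<series>[A-Z])$")


def parse_profile(name: str) -> tuple:
    """Split a vGPU Profile name into (board, framebuffer_gb, series)."""
    m = PROFILE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised Profile name: {name!r}")
    return m["board"], int(m["fb"]), m["series"]


for name in ["T4-1Q", "T4-4Q", "T4-16Q", "T4-16C"]:
    board, fb, series = parse_profile(name)
    print(f"{name}: {fb}GB Framebuffer, {series}-series Profile on a {board}")
```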
With vGPU, you hard-allocate the Framebuffer per VM, but the GPU’s processing / encoding / decoding cycles are shared between VMs. This means that if you want to run 4 VMs at the same time on a 16GB GPU, the maximum Profile size you can use is 4GB (4 x 4GB = 16GB). In case you aren’t aware, detailed documentation is available here: https://docs.nvidia.com/grid/
The vGPU Software User Guide lists all the available Profiles per GPU (as well as other information) and is worth a read: https://docs.nvidia.com/grid/10.0/grid-vgpu-user-guide/index.html
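Because the Framebuffer is hard-allocated, the sizing arithmetic is simply total Framebuffer divided by Profile size. A minimal Python sketch, assuming a single 16GB T4 and the 1 / 2 / 4 / 8 / 16GB Profiles above. Bear in mind this only gives you the density ceiling: the processing cycles are shared rather than reserved per VM, so it says nothing about performance.

```python
from typing import Optional, Sequence

# Assumptions: a single 16GB T4 and its whole-GB Profile sizes.
TOTAL_FB_GB = 16
PROFILE_SIZES_GB = (1, 2, 4, 8, 16)


def vms_per_gpu(profile_gb: int, total_gb: int = TOTAL_FB_GB) -> int:
    """How many VMs of one Profile size fit in the GPU's Framebuffer."""
    return total_gb // profile_gb


def max_profile_for(concurrent_vms: int,
                    total_gb: int = TOTAL_FB_GB,
                    sizes: Sequence[int] = PROFILE_SIZES_GB) -> Optional[int]:
    """Largest Profile size that still lets `concurrent_vms` VMs share one GPU.

    Returns None if even the smallest Profile is too big.
    """
    fitting = [s for s in sizes if s * concurrent_vms <= total_gb]
    return max(fitting) if fitting else None


for vms in (1, 2, 4, 8, 16, 20):
    size = max_profile_for(vms)
    result = f"{size}GB Profile" if size else "won't fit on one GPU"
    print(f"{vms} concurrent VMs: {result}")
```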
What I would do in this instance is start off with a well-spec’d VM: give it 8 vCPUs, 16GB of (System) RAM and a 16Q vGPU Profile (Quadro, not Compute), then run each of the applications independently (1 VM at a time, as it’s a 16GB Profile) to build up an understanding of how each application uses the hardware and what resources it needs. Using the “Q” Profile enables all of the GPU’s features and performance, and as you’ll only be testing 1 VM at a time, you can allocate all of the Framebuffer. Once the Application is working, you are happy with performance and you have tailored the resources (CPU / RAM / GPU Framebuffer), you can change the GPU’s Profile to a “C” and run the same tests to see what the differences are. You can then decide which vGPU Profile (C or Q) is appropriate for each Application.
Be aware, though: the website mentions that the Application can make use of multiple GPUs, which suggests it may be quite heavy, so you could end up requiring an entire T4 per VM purely for processing cycles. Your testing will give you the results, and you can then decide how to proceed.
To help you with resource monitoring, you can use a tool called "GPU Profiler": https://github.com/JeremyMain/GPUProfiler/releases
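If you’d also like a simple command-line log alongside GPU Profiler, nvidia-smi (installed with the NVIDIA driver inside the VM) can report some of the same counters. Below is a minimal Python sketch, assuming nvidia-smi is on the PATH and that the queried fields come back as plain numbers; it isn’t part of GPU Profiler, just an alternative way of grabbing the data.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY = "utilization.gpu,memory.used,memory.total"


@dataclass
class GpuSample:
    gpu_util_pct: int
    mem_used_mib: int
    mem_total_mib: int


def parse_line(line: str) -> GpuSample:
    """Parse one line of `--format=csv,noheader,nounits` output.

    Assumes all three fields are numeric (no "[N/A]" values).
    """
    util, used, total = (int(field.strip()) for field in line.split(","))
    return GpuSample(util, used, total)


def read_sample() -> List[GpuSample]:
    """Run nvidia-smi once and return one sample per visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_line(line) for line in out.splitlines() if line.strip()]
```

Call read_sample() in a loop with a time.sleep(1) between calls while the Application runs. For a quick look without any scripting, `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1` prints the same fields once a second.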
Once your testing is complete, you should have a complete Resource Profile for each of the Applications. Each Application may require different amounts of resources depending on what it’s doing, so don’t assume they will all be the same; this applies to CPU, RAM and GPU. GPU Profiler can monitor all of those at the same time and will create a graph you can save and refer to later on. However, if the Application is also multi-threaded on the CPU, use (Windows) Task Manager to see how the load is spread across each vCPU (Performance tab, right-click the CPU graph and choose Change graph to > Logical processors) and whether you need to scale up or down on the CPU side.
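As a rough illustration of turning the monitoring data into a Resource Profile, the Python sketch below reduces samples to average and peak values per resource and picks the smallest Profile whose Framebuffer covers the observed peak. The sample numbers and field names are hypothetical, and sizing to the peak without any headroom is an assumption you may want to revisit once you see real results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Sequence

PROFILE_SIZES_GB = (1, 2, 4, 8, 16)  # assumed: 16GB T4 Profiles


@dataclass
class Summary:
    average: float
    peak: float


def summarise(samples: Dict[str, List[float]]) -> Dict[str, Summary]:
    """Average and peak for each monitored resource (empty series are skipped)."""
    return {name: Summary(mean(vals), max(vals))
            for name, vals in samples.items() if vals}


def smallest_profile_covering(peak_fb_gb: float,
                              sizes: Sequence[int] = PROFILE_SIZES_GB) -> Optional[int]:
    """Smallest Profile whose Framebuffer is at least the observed peak."""
    candidates = [s for s in sizes if s >= peak_fb_gb]
    return min(candidates) if candidates else None


# Hypothetical samples from one Application's test run.
run = {
    "cpu_pct": [20, 65, 90, 40],
    "ram_gb": [6.0, 9.5, 11.0],
    "gpu_fb_gb": [3.0, 7.5, 6.0],
}
profile = summarise(run)
for name, s in profile.items():
    print(f"{name}: average {s.average:.1f}, peak {s.peak:.1f}")
print("Smallest Profile covering peak Framebuffer:",
      smallest_profile_covering(profile["gpu_fb_gb"].peak), "GB")
```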
Don’t forget to optimize Windows as well: https://flings.vmware.com/vmware-os-optimization-tool
Let me know how you get on …
(FYI - I’m UK based, so there’s a bit of a time difference between us ;-) )