CUDA on K2 passthrough

We have the following setup:

  • XenServer 7.1 on Dell R720
  • XenDesktop 7.7
  • GRID K2 card configured in passthrough, driver version 367.106
  • VM running Windows 10

The VM is accessed through Citrix using HDX 3D Pro.
We are trying to use the VDI for video editing with Premiere Pro and After Effects.
The GPU is recognized in Windows 10, and OpenCL and DirectX are working correctly.
Premiere Pro needs CUDA so that video rendering runs on the GPU instead of the CPU.
Is this supported on the NVIDIA GRID K2 card?

Hi

Yes, this will work. Passthrough is the only recommended option for CUDA on Kepler.

Regards

Ben

With the driver version I found on the NVIDIA website, CUDA is not available:
http://www.nvidia.co.uk/download/driverResults.aspx/119900/en-uk
Do I need to install or configure any other components to get this to work?

What are you using to validate whether CUDA is active or not?

I’m not aware of any driver change that removed CUDA from passthrough; it was available on earlier driver releases with earlier versions of XenServer.

I’m using GPU-Z to check the features of the GPU.
In Premiere Pro and After Effects the card is not recognized, and the rendering options are stuck on Software Only.

GPU-Z is not the best tool for that. There have been reports of it not detecting CUDA even when CUDA is active. Try CUDA-Z instead: http://cuda-z.sourceforge.net/
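
As a cross-check that does not rely on either tool, you can ask the driver directly. The snippet below is only a minimal sketch: it assumes Python 3 is installed in the VM and that the NVIDIA driver has placed its CUDA driver library (nvcuda.dll on Windows) on the normal search path. The function name probe_cuda is made up for illustration; cuInit and cuDeviceGetCount are the CUDA driver API entry points it calls.

```python
import ctypes
import sys


def probe_cuda():
    """Return (ok, detail): whether the CUDA driver API initialises and sees a device."""
    on_windows = sys.platform == "win32"
    # Assumption: the driver library is on the default search path.
    lib_name = "nvcuda.dll" if on_windows else "libcuda.so.1"
    try:
        # The driver API is declared stdcall on Windows, hence WinDLL there.
        loader = ctypes.WinDLL if on_windows else ctypes.CDLL
        cuda = loader(lib_name)
    except OSError as exc:
        return False, f"could not load {lib_name}: {exc}"

    # cuInit(0) must succeed before any other driver API call.
    rc = cuda.cuInit(0)
    if rc != 0:
        return False, f"cuInit failed with CUresult {rc}"

    count = ctypes.c_int(0)
    rc = cuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0:
        return False, f"cuDeviceGetCount failed with CUresult {rc}"
    if count.value == 0:
        return False, "driver loaded but no CUDA devices are visible"
    return True, f"{count.value} CUDA device(s) visible"


if __name__ == "__main__":
    ok, detail = probe_cuda()
    print("CUDA available:" if ok else "CUDA not available:", detail)
```

If this reports a device but Premiere Pro still shows Software Only, the driver side is probably fine and the issue is more likely in how the application detects the card.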

Report from CUDA-Z
CUDA-Z Report

Version: 0.10.251 32 bit http://cuda-z.sf.net/
OS Version: Windows AMD64 6.2.9200
Driver Version: 370.12
Driver Dll Version: 8.0 (6.14.13.7012)
Runtime Dll Version: 6.50

Core Information

Name: GRID K2
Compute Capability: 3.0
Clock Rate: 745 MHz
PCI Location: 0:0:6
Multiprocessors: 8 (1536 Cores)
Threads Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Stream Priorities: No

Memory Information

Total Global: 4096 MiB
Bus Width: 256 bits
Clock Rate: 2500 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 65536
Texture 2D Size: 65536 x 65536
Texture 3D Size: 4096 x 4096 x 4096
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: No
Async Engine: Yes, Bidirectional

Performance Information

Memory Copy
Host Pinned to Device: 10.1932 GiB/s
Host Pageable to Device: 4954.09 MiB/s
Device to Host Pinned: 9793.82 MiB/s
Device to Host Pageable: 4262.34 MiB/s
Device to Device: 56.0631 GiB/s
GPU Core Performance
Single-precision Float: 1870.37 Gflop/s
Double-precision Float: 95.2411 Gflop/s
64-bit Integer: 95.2396 Giop/s
32-bit Integer: 380.344 Giop/s
24-bit Integer: 380.291 Giop/s

Generated: Mon Sep 18 08:33:14 2017
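
As a rough sanity check on the report above, the core count and the single-precision result can be compared with what the hardware should deliver in theory. The sketch below copies its inputs from the report. The 192 cores per multiprocessor (compute capability 3.0) and the 2 floating-point operations per core per clock (one fused multiply-add) are standard Kepler figures rather than something CUDA-Z prints, and the variable names are only illustrative.

```python
# Values copied from the CUDA-Z report above.
multiprocessors = 8
reported_cores = 1536
clock_mhz = 745
measured_sp_gflops = 1870.37

# Assumptions (not printed by CUDA-Z): compute capability 3.0 has 192 CUDA
# cores per multiprocessor, and each core can retire one fused multiply-add
# (counted as 2 floating-point operations) per clock.
CORES_PER_SM_CC30 = 192
FLOPS_PER_CORE_PER_CLOCK = 2

expected_cores = multiprocessors * CORES_PER_SM_CC30
peak_sp_gflops = expected_cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz / 1000.0
efficiency = measured_sp_gflops / peak_sp_gflops

print(f"expected cores: {expected_cores} (report says {reported_cores})")
print(f"theoretical single-precision peak: {peak_sp_gflops:.2f} Gflop/s")
print(f"measured / peak: {efficiency:.1%}")
```

A measured figure at roughly 80% of the theoretical peak would suggest that CUDA kernels are running normally on the passthrough GPU.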