GPUmat not working with K20c GPU

Hello, I have a K20c GPU and want to run GPUmat under MATLAB. The released version of GPUmat does not support compute capability 3.5. I contacted the GPUmat developers, and they had me recompile the source code with 3.5 capability. However, even after a successful compilation, I still can't load the K20c. It shows “Unable to load the kernels in file \cudalib35.cubin.” Can anyone help me solve this issue?
Thank you very much!

Here is the nvidia-smi output:

Attached GPUs : 2
GPU 0000:42:00.0
    Product Name : Tesla K20c
    Display Mode : Disabled
    Persistence Mode : N/A
    Driver Model
        Current : TCC
        Pending : TCC
    Serial Number : 0325112010352
    GPU UUID : GPU-874bab7e-f57a-de67-93f3-6834cb832110
    VBIOS Version : 80.10.14.00.02
    Inforom Version
        Image Version : 2081.0204.00.07
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    PCI
        Bus : 0x42
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102210DE
        Bus Id : 0000:42:00.0
        Sub System Id : 0x098210DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 2
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : 30 %
    Performance State : P0
    Clocks Throttle Reasons
        Idle : Not Active
        User Defined Clocks : Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 4799 MB
        Used : 78 MB
        Free : 4721 MB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Temperature
        Gpu : 35 C
    Power Readings
        Power Management : Supported
        Power Draw : 47.26 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 225.00 W
    Clocks
        Graphics : 705 MHz
        SM : 705 MHz
        Memory : 2600 MHz
    Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Max Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Compute Processes
        Process ID : 2276
            Name : C:\Program Files\MATLAB\R2013a\bin\win64\MATLAB.exe
            Used GPU Memory : 63 MB


GPUmat info:

Starting GPU

  • GPUmat version: 0.280
  • Required CUDA version: 5.0
    There are 2 devices supporting CUDA
    CUDA Driver Version: 5.0
    CUDA Runtime Version: 5.0

Device 0: “Tesla K20c”
CUDA Capability Major revision number: 3
CUDA Capability Minor revision number: 5
Total amount of global memory: 738000896 bytes

Device 1: “Quadro 600”
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 1
Total amount of global memory: 1073741824 bytes

  • Your system has multiple GPUs installed
    → Please specify the GPU device number to use [0-1]: 0
    Error using GPUmanagerCreate
    Unable to recognize the GPU CUDA capability
    Unable to load the kernels in file C:\Users\t-phuang\Documents\MATLAB\GPUmatcompile\GPUmat\GPUmat\release\win64\cuda\cudalib35.cubin. Running system diagnostics.
    *** GPUmat system diagnostics
  • Running on → “win64”
  • Matlab ver. → “8.1.0.604 (R2013a)”
  • GPUmat version → 0.280
  • GPUmat build → 20-Apr-2013
  • GPUmat architecture → “win64”

*** ARCHITECTURE TEST
*** GPUmat architecture test → passed.

*** CUDA TEST
*** CUDA CUBLAS → installed (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\cublas64_*.dll).
*** CUDA CUFFT → installed (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\cufft64_*.dll).
*** CUDA CUDART → installed (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\cudart64_*.dll).

*** GPUmat device check
There are 2 devices supporting CUDA
CUDA Driver Version: 5.0
CUDA Runtime Version: 5.0

Device 0: “Tesla K20c”
CUDA Capability Major revision number: 3
CUDA Capability Minor revision number: 5
Total amount of global memory: 738000896 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.71 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Device 1: “Quadro 600”
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 1
Total amount of global memory: 1073741824 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.28 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Since GPUmat is seeing both your devices, I think this is an issue you’ll have to take up with their support.
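
If it helps when reporting this to them, one way to narrow it down is to try loading the recompiled cubin directly through the CUDA driver API, outside of GPUmat. Below is a minimal Python sketch using ctypes. It is only an illustration, not how GPUmat itself loads kernels: the helper names `describe` and `try_load_cubin` are made up for this example, the driver library is assumed to be reachable as nvcuda.dll (Windows) or libcuda.so.1 (Linux), and the table lists only a handful of the CUresult codes from cuda.h.

```python
import ctypes
import os
import sys

# A few CUresult values from cuda.h that are relevant when loading a module.
CU_RESULTS = {
    0: "CUDA_SUCCESS",
    200: "CUDA_ERROR_INVALID_IMAGE",
    209: "CUDA_ERROR_NO_BINARY_FOR_GPU",
    300: "CUDA_ERROR_INVALID_SOURCE",
    301: "CUDA_ERROR_FILE_NOT_FOUND",
}


def describe(code):
    """Return a readable name for a CUresult code (hypothetical helper)."""
    return CU_RESULTS.get(code, "CUresult %d (see cuda.h)" % code)


def try_load_cubin(path, device_ordinal=0):
    """Try to load a cubin on one device and report which step failed.

    Hypothetical helper; assumes the CUDA driver library is installed.
    """
    if sys.platform == "win32":
        cuda = ctypes.WinDLL("nvcuda.dll")
    else:
        cuda = ctypes.CDLL("libcuda.so.1")

    rc = cuda.cuInit(0)
    if rc != 0:
        return "cuInit: " + describe(rc)

    dev = ctypes.c_int()
    rc = cuda.cuDeviceGet(ctypes.byref(dev), device_ordinal)
    if rc != 0:
        return "cuDeviceGet: " + describe(rc)

    ctx = ctypes.c_void_p()
    rc = cuda.cuCtxCreate_v2(ctypes.byref(ctx), 0, dev)
    if rc != 0:
        return "cuCtxCreate: " + describe(rc)

    # This is the step that fails if the cubin does not match the device.
    mod = ctypes.c_void_p()
    rc = cuda.cuModuleLoad(ctypes.byref(mod), os.fsencode(path))
    cuda.cuCtxDestroy_v2(ctx)
    return "cuModuleLoad: " + describe(rc)
```

Calling `try_load_cubin(path_to_cudalib35_cubin, 0)` with the full path from the error message would then show whether the driver itself rejects the file on device 0. A result such as CUDA_ERROR_NO_BINARY_FOR_GPU would suggest the cubin was not actually built for compute capability 3.5, while CUDA_SUCCESS would point back at GPUmat's own capability check.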