GPUmat not working with K20c GPU

personhuang · April 22, 2013, 5:54am

Hello, I have a K20c GPU and want to run GPUmat under MATLAB. The released version of GPUmat does not support compute capability 3.5. I contacted the GPUmat developers, and they had me recompile the source code with 3.5 support. However, even after a successful compilation, I still cannot load the K20c. It shows “Unable to load the kernels in file \cudalib35.cubin.” Can anyone help me solve this issue?
Thank you very much!

Here is the nvidia-smi output:

Attached GPUs : 2
GPU 0000:42:00.0
    Product Name : Tesla K20c
    Display Mode : Disabled
    Persistence Mode : N/A
    Driver Model
        Current : TCC
        Pending : TCC
    Serial Number : 0325112010352
    GPU UUID : GPU-874bab7e-f57a-de67-93f3-6834cb832110
    VBIOS Version : 80.10.14.00.02
    Inforom Version
        Image Version : 2081.0204.00.07
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    PCI
        Bus : 0x42
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102210DE
        Bus Id : 0000:42:00.0
        Sub System Id : 0x098210DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 2
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : 30 %
    Performance State : P0
    Clocks Throttle Reasons
        Idle : Not Active
        User Defined Clocks : Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 4799 MB
        Used : 78 MB
        Free : 4721 MB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Temperature
        Gpu : 35 C
    Power Readings
        Power Management : Supported
        Power Draw : 47.26 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 225.00 W
    Clocks
        Graphics : 705 MHz
        SM : 705 MHz
        Memory : 2600 MHz
    Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Max Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Compute Processes
        Process ID : 2276
            Name : C:\Program Files\MATLAB\R2013a\bin\win64\MATLAB.exe
            Used GPU Memory : 63 MB

GPUmat info:

Starting GPU

GPUmat version: 0.280
Required CUDA version: 5.0
There are 2 devices supporting CUDA
CUDA Driver Version: 5.0
CUDA Runtime Version: 5.0

Device 0: “Tesla K20c”
CUDA Capability Major revision number: 3
CUDA Capability Minor revision number: 5
Total amount of global memory: 738000896 bytes

Device 1: “Quadro 600”
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 1
Total amount of global memory: 1073741824 bytes

Your system has multiple GPUs installed
→ Please specify the GPU device number to use [0-1]: 0
Error using GPUmanagerCreate
Unable to recognize the GPU CUDA capability
Unable to load the kernels in file C:\Users\t-phuang\Documents\MATLAB\GPUmatcompile\GPUmat\GPUmat\release\win64\cuda\cudalib35.cubin. Running system diagnostics.
*** GPUmat system diagnostics

Running on → “win64”
Matlab ver. → “8.1.0.604 (R2013a)”
GPUmat version → 0.280
GPUmat build → 20-Apr-2013
GPUmat architecture → “win64”

*** ARCHITECTURE TEST
*** GPUmat architecture test → passed.

*** CUDA TEST
*** CUDA CUBLAS → installed (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\cublas64_*.dll).
*** CUDA CUFFT → installed (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\cufft64_*.dll).
*** CUDA CUDART → installed (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\cudart64_*.dll).

*** GPUmat device check
There are 2 devices supporting CUDA
CUDA Driver Version: 5.0
CUDA Runtime Version: 5.0

Device 0: “Tesla K20c”
CUDA Capability Major revision number: 3
CUDA Capability Minor revision number: 5
Total amount of global memory: 738000896 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.71 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Device 1: “Quadro 600”
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 1
Total amount of global memory: 1073741824 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.28 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
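
A side note on the two outputs above: GPUmat reports 738000896 bytes of global memory for the K20c, while nvidia-smi reports a total of 4799 MB. The two figures agree if the byte count wrapped around once in a 32-bit unsigned integer. That is an assumption about how the value was stored, not something either log states, and it is probably unrelated to the cubin load failure. A minimal Python sketch of the arithmetic, with illustrative names:

```python
# Values copied from the logs above.
REPORTED_BYTES = 738000896   # GPUmat: "Total amount of global memory" for the K20c
NVSMI_TOTAL_MIB = 4799       # nvidia-smi: "Memory Usage / Total" (treated as MiB)

UINT32_RANGE = 2 ** 32       # number of values a 32-bit unsigned integer can hold


def wrap_uint32(n: int) -> int:
    """Return n as it would appear after truncation to 32 unsigned bits."""
    return n % UINT32_RANGE


def unwrap_once(n: int) -> int:
    """Undo a single 32-bit wraparound (assumption: exactly one wrap occurred)."""
    return n + UINT32_RANGE


candidate_bytes = unwrap_once(REPORTED_BYTES)
candidate_mib = candidate_bytes / 2 ** 20

print(candidate_bytes)   # candidate true size in bytes
print(candidate_mib)     # the same size in MiB, to compare with nvidia-smi
```

If the assumption holds, the unwrapped size rounds down to the nvidia-smi total, and wrapping it again reproduces the number GPUmat printed.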

vacaloca · April 23, 2013, 10:15pm

Since GPUmat is seeing both of your devices, I think this is an issue you’ll have to take up with their support.
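
One local check that may be worth doing before contacting support, offered as a rough sketch only: a cubin is an ELF file, so its header can be read to confirm that the recompiled `cudalib35.cubin` is not empty or truncated and that it was built with the same address size as the 64-bit MATLAB process. The function name `inspect_cubin_header` is hypothetical, `EM_CUDA = 190` is the machine value listed for NVIDIA CUDA in `elf.h`, and whether an address-size mismatch is the actual cause here is only a guess.

```python
import struct

EM_CUDA = 190  # e_machine value listed for NVIDIA CUDA in elf.h


def inspect_cubin_header(data: bytes) -> dict:
    """Inspect the first 20 bytes of a file and report basic ELF facts.

    Returns whether the data starts with the ELF magic, whether the
    machine field is EM_CUDA, and the ELF class (32 or 64).
    """
    if len(data) < 20 or data[:4] != b"\x7fELF":
        return {"is_elf": False, "is_cuda": False, "elf_class": None}

    # Byte 4 is EI_CLASS (1 = 32-bit, 2 = 64-bit); byte 5 is EI_DATA
    # (1 = little-endian, 2 = big-endian).
    elf_class = {1: 32, 2: 64}.get(data[4])
    byte_order = "<" if data[5] == 1 else ">"

    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    (machine,) = struct.unpack_from(byte_order + "H", data, 18)

    return {"is_elf": True, "is_cuda": machine == EM_CUDA, "elf_class": elf_class}
```

To use it, read the first 20 bytes of the cubin in binary mode (for example `open(path, "rb").read(20)`, with `path` pointing at the file named in the error message) and pass them in. A result with `is_elf` false, or an `elf_class` of 32 under a win64 MATLAB, would be something concrete to report to the GPUmat developers.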
