I have the following problem:
- On my tower computer, which has an NVIDIA GTX 690, I wrote a project using CUDA 8.0 with VS 2015 (fully working :) ).
- I tried to import it onto my (very old) laptop, which has the following configuration:
  - NVIDIA GT 330M, compute capability 1.2, compatible with CUDA 6.5
  - Visual Studio 2013 Express
  - NVIDIA driver 341.74
The project compiles just fine, but every time I launch a kernel (with any block/thread configuration) and then synchronize the device, the call returns error code 30, which corresponds to cudaErrorUnknown (an unknown internal error).
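For reference, the failing pattern can be reduced to a minimal sketch like the one below. The kernel `fillKernel`, the helper `check`, and the launch configuration are hypothetical stand-ins rather than the actual project code. The point is only to separate errors reported at launch time from errors reported at synchronization.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical trivial kernel, standing in for the project's real kernels.
__global__ void fillKernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Prints the numeric code and the runtime's description of a failed call.
static bool check(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return true;
    std::printf("%s failed: error %d (%s)\n", what, (int)err, cudaGetErrorString(err));
    return false;
}

int main()
{
    const int n = 4096;  // 4096 ints = 16 KB (buffer size assumed for illustration)
    int *dev = NULL;
    if (!check(cudaMalloc((void **)&dev, n * sizeof(int)), "cudaMalloc")) return 1;

    // 128 threads per block is an arbitrary choice below the 512-thread limit.
    fillKernel<<<(n + 127) / 128, 128>>>(dev, n);

    // Launch-time problems (e.g. an invalid configuration) are reported here.
    bool ok = check(cudaGetLastError(), "kernel launch");

    // Errors raised while the kernel runs surface at the next synchronizing call.
    ok = check(cudaDeviceSynchronize(), "cudaDeviceSynchronize") && ok;

    cudaFree(dev);
    return ok ? 0 : 1;
}
```

Checking `cudaGetLastError()` right after the launch, before synchronizing, shows whether the launch itself was rejected or whether the failure only appears once the kernel executes.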
Is my GPU so old that it cannot run even a simple kernel (e.g. with a single call to cudaMalloc for a 16 KB buffer)?
Or is this a mismatch between the latest NVIDIA driver version available for this GPU and the CUDA version?
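One way to look at the driver/runtime question is to ask the runtime which CUDA version the installed driver supports and which version the application was built against. This is a minimal sketch using `cudaDriverGetVersion` and `cudaRuntimeGetVersion`; the printed wording is my own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0;
    int runtimeVersion = 0;

    // Latest CUDA version supported by the installed driver (0 if no driver is found).
    cudaError_t err = cudaDriverGetVersion(&driverVersion);
    if (err != cudaSuccess) {
        std::printf("cudaDriverGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Version of the CUDA runtime the program is linked against.
    err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        std::printf("cudaRuntimeGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Versions are encoded as 1000 * major + 10 * minor, e.g. 6050 for CUDA 6.5.
    std::printf("Driver supports up to CUDA %d.%d\n",
                driverVersion / 1000, (driverVersion % 1000) / 10);
    std::printf("Runtime is CUDA %d.%d\n",
                runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    if (driverVersion < runtimeVersion) {
        std::printf("The driver is older than the runtime expects.\n");
    }
    return 0;
}
```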
In case it helps, here is the output of nvidia-smi:
+------------------------------------------------------+
| NVIDIA-SMI 341.74     Driver Version: 341.74         |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 330M    WDDM  | 0000:01:00.0     N/A |                  N/A |
| N/A   52C   P12    N/A /  N/A |    972MiB /   979MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+
When I query the device properties (cudaDeviceProp), this is a simplified version of what I get:
MapSMtoCores for SM 1.2 is undefined. Default to use 128 Cores/SM
Name : GeForce GT 330M
Total Global Memory : 1073741824
Total Constant Memory : 65536
Multi-Processor Count : 6
Compute Mode : 0
Concurrent Kernels : 0
Shared Memory Per Block: 16384
Registers Per Block : 16384
Max Threads Per Block : 512
Max Threads Dims : (512 | 512 | 64)
Max Grid Size : (65535 | 65535 | 1)
Is a Compute Mode of 0 worrying? And what about Concurrent Kernels being 0?
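For context, the two fields in question can be read with a sketch like the following, assuming the GPU is device 0 and a CUDA 6.5-era runtime, where `cudaDeviceProp` exposes `computeMode` and `concurrentKernels`. In the runtime API, the enum value `cudaComputeModeDefault` is 0, which matches the "Default" shown in the Compute M. column of nvidia-smi. `concurrentKernels` reports whether the device can execute multiple kernels at the same time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;

    // Device index 0 is an assumption; the laptop exposes a single GPU.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("Name               : %s\n", prop.name);
    std::printf("Compute Capability : %d.%d\n", prop.major, prop.minor);

    // cudaComputeModeDefault has the numeric value 0.
    std::printf("Compute Mode       : %d%s\n", prop.computeMode,
                prop.computeMode == cudaComputeModeDefault ? " (default)" : "");

    // 0 means the device does not report support for concurrent kernel execution.
    std::printf("Concurrent Kernels : %d\n", prop.concurrentKernels);
    std::printf("Max Threads/Block  : %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```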
Finally, it probably doesn’t make any difference, but both systems (tower and laptop) run 64-bit Windows 10, and the code is compiled in Visual Studio for 64-bit only.