CUDA 6.5: Device Synchronize returns 30

I have the following problem:

  • On my tower computer, which has an nVidia GTX 690, I wrote a project using CUDA 8.0 with VS 2015 (fully working :) )

  • I tried to port it to my (very old) laptop, which has the following configuration:

    • nVidia GT 330M, CUDA compute capability 1.2, compatible with CUDA 6.5
    • Visual Studio 2013 Express
    • nVidia Driver 341.74

As you can imagine, the project compiles just fine, but every time I execute a kernel (with any block/thread configuration) and then synchronize the device, the synchronize call returns error code 30. This corresponds to an unknown internal error.
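For reference, a minimal sketch of the kind of check that surfaces this error is shown below. The kernel `fillIndex` and the helper `cudaOk` are hypothetical names made up for illustration, not taken from the project in question; the sketch assumes a CUDA 6.5 toolchain, where `cudaErrorUnknown` has the value 30.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA call with its numeric code and message.
static bool cudaOk(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s failed: code %d (%s)\n",
                 what, static_cast<int>(err), cudaGetErrorString(err));
    return false;
}

// Trivial kernel (illustrative only): each thread writes its global index.
__global__ void fillIndex(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main()
{
    const int n = 4096;   // 4096 ints = 16 KB buffer
    int *dOut = NULL;

    if (!cudaOk(cudaMalloc(&dOut, n * sizeof(int)), "cudaMalloc")) return 1;

    // 256 threads per block stays under the 512-thread limit of this GPU.
    fillIndex<<<(n + 255) / 256, 256>>>(dOut, n);

    // Errors detected at launch time are reported by cudaGetLastError.
    bool ok = cudaOk(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel runs appear at the next synchronizing call.
    ok = cudaOk(cudaDeviceSynchronize(), "cudaDeviceSynchronize") && ok;

    cudaFree(dOut);
    std::puts(ok ? "kernel ran cleanly" : "kernel failed");
    return ok ? 0 : 1;
}
```

Checking `cudaGetLastError` right after the launch, separately from the synchronize call, helps tell a launch failure apart from a failure during execution.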

Is my GPU so old that it cannot run even a simple kernel (e.g. with a single call to cudaMalloc for a 16 KB buffer)?

Or is there a mismatch between the latest nVidia driver version available for this GPU and the CUDA version?

In case it helps, here’s the output of nvidia-smi:

| NVIDIA-SMI 341.74     Driver Version: 341.74         |
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GT 330M    WDDM  | 0000:01:00.0     N/A |                  N/A |
| N/A   52C   P12    N/A /  N/A |    972MiB /   979MiB |     N/A      Default |

| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|    0            Not Supported                                               |

When I query the device properties, this is a simplified version of what I get:

MapSMtoCores for SM 1.2 is undefined.  Default to use 128 Cores/SM
Name                   : GeForce GT 330M
Total Global Memory    : 1073741824
Total Constant Memory  : 65536
Multi-Processor Count  : 6
Compute Mode           : 0
Concurrent Kernels     : 0
Shared Memory Per Block: 16384
Registers   Per Block  : 16384
Max Threads Per Block  : 512
Max Threads Dims       : (512 | 512 | 64)
Max Grid Size          : (65535 | 65535 | 1)

Is a compute mode of 0 worrying? Or the 0 for concurrent kernels?

Finally, it probably doesn’t make any difference, but both systems (tower and laptop) run 64-bit Windows 10, and the code is compiled in Visual Studio for 64-bit only.

Suggestion: don’t provide a “simplified version”. Provide the full output from deviceQuery. You cut off important information, such as the reported CUDA driver version and CUDA runtime version.

If you install CUDA 6.5 on the old laptop, run CUDA sample codes like deviceQuery and vectorAdd, and don’t get the correct output, then there is no point in asking why your own code is not working: the CUDA install on that machine is broken.
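As a rough sketch of the information deviceQuery would show, the snippet below prints the driver and runtime versions alongside the fields asked about earlier. It is not the deviceQuery sample itself; `versionMajor` and `versionMinor` are hypothetical helpers, and device 0 is assumed to be the GPU of interest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000*major + 10*minor, so 6050 means 6.5.
static int versionMajor(int v) { return v / 1000; }
static int versionMinor(int v) { return (v % 1000) / 10; }

int main()
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaError_t err = cudaDriverGetVersion(&driverVersion);
    if (err == cudaSuccess) err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "version query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // The driver must support at least the runtime's CUDA version.
    std::printf("CUDA driver version : %d.%d\n",
                versionMajor(driverVersion), versionMinor(driverVersion));
    std::printf("CUDA runtime version: %d.%d\n",
                versionMajor(runtimeVersion), versionMinor(runtimeVersion));

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);   // assumption: device 0
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("Compute capability  : %d.%d\n", prop.major, prop.minor);
    // computeMode 0 is cudaComputeModeDefault, the normal setting.
    std::printf("Compute mode        : %d%s\n", prop.computeMode,
                prop.computeMode == cudaComputeModeDefault ? " (default)" : "");
    // 0 only means the device cannot overlap kernels; it is not an error.
    std::printf("Concurrent kernels  : %d\n", prop.concurrentKernels);
    return 0;
}
```

A driver version lower than the runtime version would point to the driver/toolkit mismatch raised in the question.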

Thanks for your help.

In the end I managed to get it working. Here is what I did:

  • Reinstall CUDA 6.5 (the CUDA samples didn’t work)

  • Manually download the CUDA Samples

  • Copy the samples’ project configuration and run in 32-bit rather than 64-bit (I’m surprised the samples didn’t work in 64-bit, though)

  • Remove any reference to double-precision routines, as these are not supported

  • Replace size_t with unsigned int when dealing with pointers

  • Remove any reference to cudaMallocHost, as apparently pinned host memory wasn’t supported before compute capability 1.3
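Some of these points can be checked at run time rather than by trial and error. The sketch below is only an illustration under stated assumptions: `supportsDouble` is a hypothetical helper encoding the rule that double precision needs compute capability 1.3 or higher, and device 0 is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: double precision requires compute capability >= 1.3.
static bool supportsDouble(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);   // assumption: device 0
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("Compute capability : %d.%d\n", prop.major, prop.minor);
    std::printf("Double precision   : %s\n",
                supportsDouble(prop.major, prop.minor) ? "yes" : "no");
    // Reports whether the device can map pinned host memory into its address space.
    std::printf("Can map host memory: %d\n", prop.canMapHostMemory);
    // In a 32-bit build both of these are 4 bytes; in a 64-bit build, 8.
    std::printf("sizeof(void*)      : %u bytes\n", (unsigned)sizeof(void *));
    std::printf("sizeof(size_t)     : %u bytes\n", (unsigned)sizeof(size_t));
    return 0;
}
```

On a compute capability 1.2 device such as the GT 330M, the double-precision line should print "no".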