32-bit CUDA version is insufficient for CUDART version

I’m using CUDA 3.0 on 64-bit Linux. I have installed driver version 195.36.15 and the latest toolkit, which has libcudart.so.3.0.14. This is a new install, so this is the only version on the computer. When I compile a 64-bit version of the program, it works correctly. However, when I add -m32 to the compiler flags (after cleaning, of course) to produce a 32-bit version, the program fails at run time with the error “CUDA version is insufficient for CUDART version”. ldd confirms that it is pulling libcudart from the correct location:
libcudart.so.3 => /usr/local/cuda/lib/libcudart.so.3 (0xf7ef7000)

Is this a known problem when using the 32-bit runtime library with the 64-bit driver? Any work-arounds?

Edit: I discovered the Linux 64-bit version 195.36.24 driver on the drivers page and upgraded to it. No change.

Are you aware that the cudart DLL is versioned now too?

I took the deviceQuery source, removed all references to the shr* commands, replaced them with standard commands, and recompiled as follows:

/usr/local/cuda/bin/nvcc -m64 -I/home/cluster/CUDA/3.0/sdk/sdk/C/common/inc -I/usr/local/cuda/include -o deviceQuery deviceQuery.cpp -L/usr/local/cuda/lib64 -lcudart

/usr/local/cuda/bin/nvcc -m32 -I/home/cluster/CUDA/3.0/sdk/sdk/C/common/inc -I/usr/local/cuda/include -o deviceQuery32 deviceQuery.cpp -L/usr/local/cuda/lib -lcudart

Here are the outputs:

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: “GeForce GTX 480”

CUDA Driver Version: 3.0

CUDA Runtime Version: 3.0

CUDA Capability Major revision number: 2

CUDA Capability Minor revision number: 0

Total amount of global memory: 1609760768 bytes

Number of multiprocessors: 15

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 32768

Warp size: 32

Maximum number of threads per block: 1024

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 2147483647 bytes

Texture alignment: 512 bytes

Clock rate: 1.40 GHz

Concurrent copy and execution: Yes

Run time limit on kernels: No

Integrated: No

Support host page-locked memory mapping: Yes

Compute mode: Default (multiple host threads can use this device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4200131, CUDA Runtime Version = 3.0, NumDevs = 1, Device = GeForce GTX 480

PASSED

./deviceQuery32 Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount FAILED CUDA Driver and Runtime version may be mismatched.

FAILED

I also tried using the 32-bit toolkit explicitly, with the same result. The modified deviceQuery source is attached.

I also compiled and ran this on another computer with a Tesla S1070 using CUDA 2.3 and driver 195.36.15. Here’s the output:

64-bit version:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4200171, CUDA Runtime Version = 2.30, NumDevs = 4, Device = Tesla T10 Processor, Device = Tesla T10 Processor

32-bit version:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 134518067, CUDA Runtime Version = 2.30, NumDevs = 4, Device = Tesla T10 Processor, Device = Tesla T10 Processor

Should the reported driver versions be different?
deviceQuery.cpp (6.78 KB)

There’s a typo in the original source in the sprintf statement for the driver version. Once that’s fixed, and looking only at the versions,

64-bit:

CUDA Driver Version = 3.0, CUDA Runtime Version = 3.0

32-bit:

CUDA Driver Version = 0.0, CUDA Runtime Version = 3.0

And that looks like the problem. When using 32-bit code with the 64-bit driver, cudaDriverGetVersion() returns version 0.0.
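For anyone following along, here is a minimal sketch of why a reported driver version of 0.0 would trigger the "insufficient" error. It assumes the usual CUDA encoding of 1000 * major + 10 * minor (so 3.0 is 3000 and 2.3 is 2030) and assumes the runtime simply rejects a driver whose version is lower than its own. The names decodeCudaVersion, formatCudaVersion and driverIsSufficient are hypothetical helpers for illustration, not part of the CUDA API or of deviceQuery. In a real program the two integers would come from cudaDriverGetVersion() and cudaRuntimeGetVersion().

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumption: CUDA encodes a version as 1000 * major + 10 * minor,
// e.g. 3000 for CUDA 3.0 and 2030 for CUDA 2.3.
struct CudaVersion {
    int majorPart;
    int minorPart;
};

// Hypothetical helper: split an encoded version into its two parts.
CudaVersion decodeCudaVersion(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical helper: render an encoded version as "major.minor".
// Each integer is matched by exactly one %d in the format string.
std::string formatCudaVersion(int encoded) {
    const CudaVersion v = decodeCudaVersion(encoded);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%d.%d", v.majorPart, v.minorPart);
    return std::string(buf);
}

// Hypothetical helper, assuming the runtime only accepts a driver
// that is at least as new as the runtime itself.
bool driverIsSufficient(int driverVersion, int runtimeVersion) {
    return driverVersion >= runtimeVersion;
}
```

With the values from the 32-bit run, driverIsSufficient(0, 3000) is false, which matches the failure in cudaGetDeviceCount, while the 64-bit case driverIsSufficient(3000, 3000) is true.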

On Windows, you have to ship cudart32_(version number).dll with your program, because in 3.0 cudart.dll is versioned. Could that be the issue?

No, libcudart.so.3 links to the correct version.

This is fixed in CUDA 3.1 beta. Thanks!