Problems when installing CUDA 4.0

Hi,

Sorry if someone has already asked about this problem. None of the solutions from other threads worked in my case.

Here are the packages I use:

cudatoolkit_4.0.11_linux_64_rhel5.5.run

gpucomputingsdk_4.0.11_linux.run

NVIDIA-Linux-x86_64-270.18.run

All of them installed successfully. Running ‘make’ in the SDK also completed without any error messages.

deviceQueryDrv works fine but deviceQuery doesn’t, and none of the other examples work either:

[fzhu@gpus release]$ ./deviceQueryDrv

CUDA Device Query (Driver API) statically linked version

There are 2 devices supporting CUDA

Device 0: “Tesla M2070”

CUDA Driver Version: 4.0

CUDA Capability Major/Minor version number: 2.0

Total amount of global memory: 5636554752 bytes

Multiprocessors x Cores/MP = Cores: 14 (MP) x 32 (Cores/MP) = 448 (Cores)

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 32768

Warp size: 32

Maximum number of threads per block: 1024

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535

Maximum memory pitch: 2147483647 bytes

Memory Bus Width: 384-bit

Memory Clock rate: 1566.00 Mhz

Texture alignment: 512 bytes

GPU Clock rate: 1.15 GHz

Concurrent copy and execution: Yes

# of Asynchronous Copy Engines: 2

Run time limit on kernels: No

Integrated: No

Support host page-locked memory mapping: Yes

Concurrent kernel execution: Yes

Device has ECC support enabled: Yes

Device is using TCC driver mode: No

Device 1: “Tesla M2070”

CUDA Driver Version: 4.0

CUDA Capability Major/Minor version number: 2.0

Total amount of global memory: 5636554752 bytes

Multiprocessors x Cores/MP = Cores: 14 (MP) x 32 (Cores/MP) = 448 (Cores)

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 32768

Warp size: 32

Maximum number of threads per block: 1024

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535

Maximum memory pitch: 2147483647 bytes

Memory Bus Width: 384-bit

Memory Clock rate: 1566.00 Mhz

Texture alignment: 512 bytes

GPU Clock rate: 1.15 GHz

Concurrent copy and execution: Yes

# of Asynchronous Copy Engines: 2

Run time limit on kernels: No

Integrated: No

Support host page-locked memory mapping: Yes

Concurrent kernel execution: Yes

Device has ECC support enabled: Yes

Device is using TCC driver mode: No

PASSED

[fzhu@gpus release]$ ./deviceQuery

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount FAILED CUDA Driver and Runtime version may be mismatched.

FAILED

Press <ENTER> to Quit…

Some other information:

[fzhu@gpus ~]$ /sbin/lsmod | grep nvidia

nvidia 10676552 0

i2c_core 57537 3 nvidia,i2c_ec,i2c_i801

[fzhu@gpus ~]$ ls -l /dev/ | grep nvidia

crw-rw-rw- 1 root root 195, 0 Mar 14 18:28 nvidia0

crw-rw-rw- 1 root root 195, 1 Mar 14 18:28 nvidia1

crw-rw-rw- 1 root root 195, 255 Mar 14 18:28 nvidiactl

Any help is welcome.

Fan

The driver you have doesn’t support CUDA 4.0. You will need to use the developer driver available where you downloaded the 4.0rc toolkit and sdk.
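For anyone who wants to see how this kind of mismatch is usually expressed in numbers, here is a minimal sketch. It assumes the integer encoding used by cudaDriverGetVersion and cudaRuntimeGetVersion (1000 * major + 10 * minor, so CUDA 4.0 is 4000). The helper names and the sample values are made up for illustration and are not read from any real machine.

```python
# Hypothetical helpers for illustration; not part of the CUDA toolkit.

def decode_cuda_version(value):
    """Split an integer such as 4000 into (major, minor) = (4, 0)."""
    return value // 1000, (value % 1000) // 10


def driver_supports_runtime(driver_version, runtime_version):
    """True when the CUDA version reported by the driver is at least
    the version the runtime was built for."""
    return driver_version >= runtime_version


if __name__ == "__main__":
    # Sample values only (assumed, not measured on real hardware).
    for driver, runtime in [(3020, 4000), (4000, 4000)]:
        d_major, d_minor = decode_cuda_version(driver)
        r_major, r_minor = decode_cuda_version(runtime)
        status = "ok" if driver_supports_runtime(driver, runtime) else "mismatch"
        print("driver %d.%d, runtime %d.%d: %s"
              % (d_major, d_minor, r_major, r_minor, status))
```

Note that deviceQueryDrv in the first post already reported a driver version of 4.0, so a numeric check like this would not by itself have flagged the problem in this thread. It is a necessary condition, not a sufficient one.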

The release notes say:

‘CUDA 4.0 Release requires version 270 or newer version of the linux NVIDIA Display Driver.’

Interestingly, the only driver available on the CUDA 4.0 RC webpage is from the 256 series, which I have tried as well, but the problem remains.

That isn’t correct. The “release” driver for the 4.0rc is 270.27. 270.18 does not have CUDA 4.0 support AFAIK.
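If it helps, here is a rough sketch of checking an installer's driver version against a minimum before installing it. It only parses the version out of a file name like the one in the first post. The function names are hypothetical, and the minimum is a parameter because the exact cutoff is an assumption: this thread mentions both 270.27 and 270.26.

```python
import re

# Hypothetical helpers for illustration; the minimum version is an
# assumption supplied by the caller, not an official NVIDIA value.

def parse_driver_version(name):
    """Extract a dotted version such as (270, 18) from a string like
    'NVIDIA-Linux-x86_64-270.18.run'."""
    match = re.search(r"(\d+(?:\.\d+)+)", name)
    if match is None:
        raise ValueError("no dotted version found in %r" % name)
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(name, minimum):
    """Compare the parsed version against a minimum tuple, field by field."""
    return parse_driver_version(name) >= tuple(minimum)


if __name__ == "__main__":
    installer = "NVIDIA-Linux-x86_64-270.18.run"
    # 270.26 is the version reported to work later in this thread.
    print(parse_driver_version(installer), meets_minimum(installer, (270, 26)))
```

Comparing tuples of integers avoids the trap of comparing version strings as text or as floats, where 270.9 would wrongly sort above 270.18.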

The latest version I can find is 270.26. Anyway, I will try it first.

Thanks, Avidday.

Fan

It works perfectly.

Many Thanks!!