Hi,
Sorry if someone has already asked about this problem. None of the solutions from other threads worked in my case.
Here are the packages I use:
cudatoolkit_4.0.11_linux_64_rhel5.5.run
gpucomputingsdk_4.0.11_linux.run
NVIDIA-Linux-x86_64-270.18.run
All of them installed successfully. The ‘make’ in the SDK also finished without any error message.
deviceQueryDrv works fine but deviceQuery doesn’t, and none of the other examples work either:
[fzhu@gpus release]$ ./deviceQueryDrv
CUDA Device Query (Driver API) statically linked version
There are 2 devices supporting CUDA
Device 0: “Tesla M2070”
CUDA Driver Version: 4.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 5636554752 bytes
Multiprocessors x Cores/MP = Cores: 14 (MP) x 32 (Cores/MP) = 448 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Memory Bus Width: 384-bit
Memory Clock rate: 1566.00 Mhz
Texture alignment: 512 bytes
GPU Clock rate: 1.15 GHz
Concurrent copy and execution: Yes
# of Asynchronous Copy Engines: 2
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Device has ECC support enabled: Yes
Device is using TCC driver mode: No
Device 1: “Tesla M2070”
CUDA Driver Version: 4.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 5636554752 bytes
Multiprocessors x Cores/MP = Cores: 14 (MP) x 32 (Cores/MP) = 448 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Memory Bus Width: 384-bit
Memory Clock rate: 1566.00 Mhz
Texture alignment: 512 bytes
GPU Clock rate: 1.15 GHz
Concurrent copy and execution: Yes
# of Asynchronous Copy Engines: 2
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Device has ECC support enabled: Yes
Device is using TCC driver mode: No
PASSED
[fzhu@gpus release]$ ./deviceQuery
./deviceQuery Starting…
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount FAILED CUDA Driver and Runtime version may be mismatched.
FAILED
Press <Enter> to Quit…
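The failure message above points at a possible driver/runtime version mismatch. One way to check this directly is to ask both libraries for their version numbers. The following is only a minimal sketch: it assumes that libcuda.so and libcudart.so can be found by the dynamic loader, and the helper names decode_cuda_version and query_version are made up for illustration. It relies on the cuDriverGetVersion and cudaRuntimeGetVersion calls, which report versions as 1000*major + 10*minor.

```python
import ctypes


def decode_cuda_version(value):
    """Split CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    return value // 1000, (value % 1000) // 10


def query_version(library_name, function_name):
    """Call a version function taking an int*; return None if it is unavailable."""
    try:
        lib = ctypes.CDLL(library_name)       # assumption: library is on the loader path
        func = getattr(lib, function_name)
    except (OSError, AttributeError):
        return None
    version = ctypes.c_int(0)
    status = func(ctypes.byref(version))      # both calls return 0 on success
    if status != 0:
        return None
    return decode_cuda_version(version.value)


if __name__ == "__main__":
    # Highest CUDA version the installed driver supports.
    driver = query_version("libcuda.so", "cuDriverGetVersion")
    # Version of the runtime library the loader actually picks up.
    runtime = query_version("libcudart.so", "cudaRuntimeGetVersion")
    print("driver supports CUDA:", driver)
    print("runtime version:", runtime)
    if driver and runtime and runtime > driver:
        print("runtime is newer than what the driver supports")
```

If either line prints None, the corresponding library could not be loaded or the call failed, which would itself be worth checking (for example via LD_LIBRARY_PATH).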
Some other information:
[fzhu@gpus ~]$ /sbin/lsmod | grep nvidia
nvidia 10676552 0
i2c_core 57537 3 nvidia,i2c_ec,i2c_i801
[fzhu@gpus ~]$ ls -l /dev/ | grep nvidia
crw-rw-rw- 1 root root 195, 0 Mar 14 18:28 nvidia0
crw-rw-rw- 1 root root 195, 1 Mar 14 18:28 nvidia1
crw-rw-rw- 1 root root 195, 255 Mar 14 18:28 nvidiactl
Any help is welcome.
Fan