I have just reinstalled my GPU workstation (4x Tesla C1060) with CentOS 6.5 and then installed CUDA 5.0 on it, but CUDA does not work.
Things I have done:
(1) Install: in text mode, logged in as root, I ran “sh cuda_5.0.35_linux_64_rhel6.x-1.run”. All three parts (driver, toolkit and samples) installed correctly. The file was downloaded from https://developer.nvidia.com/cuda-toolkit-archive
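Since the driver was installed from the runfile in text mode, one quick sanity check is whether the `nvidia` kernel module is actually loaded and the open-source `nouveau` driver is not (the NVIDIA driver generally cannot be used while `nouveau` is loaded). A minimal sketch, assuming the standard `/proc/modules` format where each line starts with the module name followed by a space; `module_loaded` is a hypothetical helper, not part of CUDA or the driver:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of CUDA): returns 1 if a module named
 * `name` is listed in `modules_path` (normally /proc/modules), 0 if it is
 * not, and -1 if the file cannot be opened. */
int module_loaded(const char *modules_path, const char *name)
{
    assert(modules_path != NULL && name != NULL);
    FILE *f = fopen(modules_path, "r");
    if (f == NULL)
        return -1;

    char line[512];
    size_t len = strlen(name);
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Match the whole first field, so "nvidia" does not match e.g. "nvidiafb". */
        if (strncmp(line, name, len) == 0 && line[len] == ' ') {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

On a working setup one would expect `module_loaded("/proc/modules", "nvidia")` to return 1 and `module_loaded("/proc/modules", "nouveau")` to return 0; that expectation is an assumption about this machine, not a confirmed diagnosis.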
(2) Test deviceQuery: logged in as root, I ran “make” in “/usr/local/cuda-5.0/samples/1_Utilities/deviceQuery”, then ran “./deviceQuery” and got:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 10
-> invalid device ordinal
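Here the runtime fails with error 10 ("invalid device ordinal") before it has counted a single GPU. One possible cause worth ruling out (an assumption, not a diagnosis) is missing device files: the driver is reached through `/dev/nvidiactl` and `/dev/nvidia0` to `/dev/nvidia3` for four GPUs, and on systems that boot without X these files are not always created at boot. A minimal sketch that counts which of them are absent; `count_missing_nvidia_nodes` is a hypothetical helper, and the directory is a parameter only so it can be tested:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of CUDA): counts how many of
 * <dev_dir>/nvidiactl and <dev_dir>/nvidia0 .. nvidia<expected_gpus-1>
 * do not exist, printing each missing path to stderr.
 * On a real system dev_dir would be "/dev". */
int count_missing_nvidia_nodes(const char *dev_dir, int expected_gpus)
{
    assert(dev_dir != NULL && expected_gpus >= 0);
    char path[256];
    struct stat st;
    int missing = 0;

    snprintf(path, sizeof path, "%s/nvidiactl", dev_dir);
    if (stat(path, &st) != 0) {
        fprintf(stderr, "missing: %s\n", path);
        missing++;
    }
    for (int i = 0; i < expected_gpus; i++) {
        snprintf(path, sizeof path, "%s/nvidia%d", dev_dir, i);
        if (stat(path, &st) != 0) {
            fprintf(stderr, "missing: %s\n", path);
            missing++;
        }
    }
    return missing;
}
```

For the four C1060s this would be called as `count_missing_nvidia_nodes("/dev", 4)`; a nonzero result would suggest the device files were never created.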
(3) Test a program: it compiled without problems, but running it gave:
device(s) detected on this machine: 0
1.000000*1.000000=61082585891995785255441237761970929664.000000
2.000000*2.000000=0.000000
3.000000*3.000000=0.000000
4.000000*4.000000=0.000000
5.000000*5.000000=288207915396127595249952358400.000000
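These values are most likely uninitialized host memory: the program never checks the return codes of `cudaMalloc` and `cudaMemcpy`, so when they fail (as they would with no usable device) `B` is never written and the loop prints whatever was already on the stack. A host-side comparison against the expected squares makes such failures obvious instead of printing plausible-looking numbers. A minimal sketch; `count_wrong_squares` is a hypothetical helper, not part of the original program:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original program): checks that
 * B[i] == A[i]*A[i] on the host, prints each wrong entry and returns how
 * many were wrong. Uses !(diff <= tol) so NaN results also count as wrong. */
int count_wrong_squares(const float *A, const float *B, int n)
{
    assert(A != NULL && B != NULL && n >= 0);
    int wrong = 0;
    for (int i = 0; i < n; i++) {
        float expected = A[i] * A[i];
        float diff = B[i] - expected;
        if (diff < 0.0f)
            diff = -diff;
        /* Relative tolerance plus a small absolute floor. */
        float tol = 1e-5f * expected + 1e-6f;
        if (!(diff <= tol)) {
            printf("element %d: got %f, expected %f\n", i, B[i], expected);
            wrong++;
        }
    }
    return wrong;
}
```

Called as `count_wrong_squares(A, B, N)` after the final `cudaMemcpy`, it would flag all five entries of the output above.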
I really have no idea what to do with this :(
Could anyone give me some advice?
Thanks~
PS: here is the program:
#include <cuda_runtime.h>
#include <stdio.h>

#define N 5

/* Each thread squares one element: b[tid] = a[tid] * a[tid]. */
__global__ void kernel(float *a, float *b)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N)
        b[tid] = a[tid] * a[tid];
}

int main(void)
{
    int i;
    int ndevice = 0;
    cudaGetDeviceCount(&ndevice);
    printf("device(s) detected on this machine: %d\n", ndevice);

    float A[N], B[N], *a, *b;
    /* Allocate device buffers and copy the inputs 1..N to the GPU. */
    cudaMalloc(&a, sizeof(float) * N);
    cudaMalloc(&b, sizeof(float) * N);
    for (i = 0; i < N; i++) A[i] = i + 1;
    cudaMemcpy(a, A, sizeof(float) * N, cudaMemcpyHostToDevice);

    /* One block of N threads, one thread per element. */
    kernel<<<1, N>>>(a, b);

    /* Copy the results back and print them. */
    cudaMemcpy(B, b, sizeof(float) * N, cudaMemcpyDeviceToHost);
    for (i = 0; i < N; i++) printf("%f*%f=%f\n", A[i], A[i], B[i]);

    cudaFree(a);
    cudaFree(b);
    return 0;
}