After a day of work I managed to install CUDA and the SDK on Ubuntu 9.04.
Everything works fine: I can compile and run all the SDK examples.
I then tried to write and compile an example found in this forum, but the result isn't what I expected.
In the code below the result should be the square of each element of the vector. However, whether or not I change
the line in the kernel where the calculation is done (the original line is commented out), the output is always the same.
Apparently cudaMemcpy isn't working.
I also tried another very simple example, a matrix addition, and the symptom is exactly the same.
Do you have any suggestions?
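#include <stdio.h>
#include <stdlib.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  //if (idx<N) a[idx] = a[idx] * a[idx];  // original calculation
  if (idx<N) a[idx] = 2.0f;               // test value written instead
}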
// main routine that executes on the host
int main(void)
{
  float *a_h, *a_d; // Pointers to host & device arrays
  const int N = 10; // Number of elements in arrays
  size_t size = N * sizeof(float);
  a_h = (float *)malloc(size); // Allocate array on host
  cudaMalloc((void **) &a_d, size); // Allocate array on device
  // Initialize host array and copy it to CUDA device
  for (int i=0; i<N; i++) a_h[i] = (float)i;
  cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
  // Do calculation on device:
  int block_size = 4;
  int n_blocks = N/block_size + (N%block_size == 0 ? 0:1); // round up
  square_array <<< n_blocks, block_size >>> (a_d, N);
  // Retrieve result from device and store it in host array
  cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost);
  // Print results
  for (int i=0; i<N; i++) printf("%d %f\n", i, a_h[i]);
  // Cleanup
  free(a_h);
  cudaFree(a_d);
  return 0;
}
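
Since the output does not change whatever the kernel writes, one plausible explanation is that some CUDA runtime call is failing silently, so the host array is printed unmodified. The code above never looks at any return status. A minimal sketch of the same program with status checks follows. The `CUDA_CHECK` macro is a hypothetical helper, not part of the CUDA API; `cudaGetDeviceCount`, `cudaGetLastError`, `cudaGetErrorString` and `cudaDeviceSynchronize` are runtime calls (toolkits older than CUDA 4.0 provide `cudaThreadSynchronize` in place of `cudaDeviceSynchronize`).

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): abort with a readable
// message if a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                        \
    do {                                                        \
        cudaError_t err_ = (call);                              \
        if (err_ != cudaSuccess) {                              \
            fprintf(stderr, "%s:%d: %s failed: %s\n",           \
                    __FILE__, __LINE__, #call,                  \
                    cudaGetErrorString(err_));                  \
            exit(EXIT_FAILURE);                                 \
        }                                                       \
    } while (0)

// Same kernel as in the post, with the original calculation enabled.
__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

int main(void)
{
    const int N = 10;
    size_t size = N * sizeof(float);
    float *a_h = (float *)malloc(size);
    float *a_d = NULL;
    if (a_h == NULL) return EXIT_FAILURE;

    // Confirm that the runtime can see a device at all.
    int device_count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&device_count));
    printf("CUDA devices found: %d\n", device_count);

    for (int i = 0; i < N; i++) a_h[i] = (float)i;

    CUDA_CHECK(cudaMalloc((void **)&a_d, size));
    CUDA_CHECK(cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice));

    int block_size = 4;
    int n_blocks = (N + block_size - 1) / block_size;  // round up
    square_array<<<n_blocks, block_size>>>(a_d, N);
    CUDA_CHECK(cudaGetLastError());        // reports launch failures
    CUDA_CHECK(cudaDeviceSynchronize());   // reports failures during execution

    CUDA_CHECK(cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost));
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);

    CUDA_CHECK(cudaFree(a_d));
    free(a_h);
    return 0;
}
```

If one of the checks fires, the message names the failing call and the error string reported by the runtime, which should show whether the problem is in the copy, the launch, or the setup (for example, no device visible to the runtime).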