Hi guys,
I am experiencing strange behaviour when trying to launch code on a Tesla C1060 and a Quadro FX3700. No matter which kernel I launch, the application returns the same data that was initialised on the host. In other words, the data is not affected by the kernel at all.
I am running Ubuntu 9.04 but also tested on 10.04 with the same result. I also tried the latest drivers as well as the previous release, but no luck. All my code executes fine in our laboratory, where I also have C1060s and Ubuntu 9.04.
Could this be a hardware issue with my PC? Perhaps something that could be changed in the BIOS?
As a simple example:
[codebox]#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h> // malloc, free
#include <cuda.h>
// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx<N) a[idx] = a[idx] * a[idx];
}
// main routine that executes on the host
int main(void)
{
float *a_h, *a_d; // Pointer to host & device arrays
const int N = 10; // Number of elements in arrays
size_t size = N * sizeof(float);
a_h = (float *)malloc(size); // Allocate array on host
cudaMalloc((void **) &a_d, size); // Allocate array on device
// Initialize host array and copy it to CUDA device
for (int i=0; i<N; i++) a_h[i] = (float)i;
cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
// Do calculation on device:
int block_size = 4;
int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
square_array <<< n_blocks, block_size >>> (a_d, N);
// Retrieve result from device and store it in host array
cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
// Print results
for (int i=0; i<N; i++) printf("%d %f\n", i, a_h[i]);
// Cleanup
free(a_h); cudaFree(a_d);
}[/codebox]
and the result is:
[codebox]0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000
[/codebox]
I checked for errors, but none were returned. I also tried to synchronise all threads after the kernel launch, but that made no difference.
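For reference, here is a minimal sketch of what checking every runtime call around the same kernel could look like. `CUDA_TRY` and `square_on_device` are hypothetical names introduced only for this illustration; the CUDA calls themselves are standard runtime API functions. It assumes a toolkit that provides `cudaDeviceSynchronize` (older toolkits used `cudaThreadSynchronize` for the same purpose).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): report the failing call and
// return its error code from the enclosing function.
#define CUDA_TRY(call)                                              \
    do {                                                            \
        cudaError_t e_ = (call);                                    \
        if (e_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(e_));                        \
            return e_;                                              \
        }                                                           \
    } while (0)

__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

// Hypothetical wrapper: squares a_h[0..N-1] in place via the device.
// Early returns skip cudaFree, which is acceptable for a diagnostic sketch.
cudaError_t square_on_device(float *a_h, int N)
{
    float *a_d = NULL;
    const size_t size = N * sizeof(float);
    CUDA_TRY(cudaMalloc((void **)&a_d, size));
    CUDA_TRY(cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice));

    const int block_size = 4;
    const int n_blocks = (N + block_size - 1) / block_size;
    square_array<<<n_blocks, block_size>>>(a_d, N);
    CUDA_TRY(cudaGetLastError());       // errors from the launch itself
    CUDA_TRY(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_TRY(cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost));
    CUDA_TRY(cudaFree(a_d));
    return cudaSuccess;
}

int main(void)
{
    const int N = 10;
    float a_h[N];
    for (int i = 0; i < N; i++) a_h[i] = (float)i;

    if (square_on_device(a_h, N) != cudaSuccess) return 1;
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);
    return 0;
}
```

The two checks after the launch are separate because a kernel launch returns no status itself: `cudaGetLastError` picks up a failed launch, while the synchronise call picks up failures that only appear once the kernel runs.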
This is just one example that does not return correct values. Strangely enough, the SDK compiled without problems and I can run most of the demos. However, some, like deviceQuery and a few others, do not show anything. In addition, particles, mandelbrot and a few others freeze my system.
Could you please point me in the right direction, because at the moment I am stuck :D
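Since deviceQuery prints nothing here, a stripped-down query may help narrow things down. The following is only a sketch using standard runtime calls; `report_devices` is a hypothetical name for illustration. It prints the driver and runtime versions and whatever devices the runtime can see, or the error string if enumeration fails.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints what the CUDA runtime can see.
// Returns the number of devices, or -1 if enumeration fails.
int report_devices(void)
{
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    // stays 0 if no driver is loaded
    cudaRuntimeGetVersion(&runtime);
    printf("driver version %d, runtime version %d\n", driver, runtime);

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }

    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        printf("device %d: %s, compute capability %d.%d, %lu MB\n",
               dev, prop.name, prop.major, prop.minor,
               (unsigned long)(prop.totalGlobalMem >> 20));
    }
    return count;
}

int main(void)
{
    return report_devices() < 0 ? 1 : 0;
}
```

If this reports an error or a driver version older than the runtime version, that would point at the driver installation rather than at the kernel code, although that is only one possible explanation.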
Thank you,
Martin Peniak
www.martinpeniak.com