I have written a GPU application that squares the numbers 0-9 on the GPU. I have run the CUDA program on two different machines, one with a Tesla C2075 and one with a GeForce 9400M, but I get completely different results.
Any help on this issue would be greatly appreciated.
Thanks,
Likely a precision issue. Since the 9400M doesn’t support double precision, perhaps you are just comparing its single-precision results with the Tesla’s double-precision results?
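One way to test the precision hypothesis for this particular program is to compare single- and double-precision squares of the same inputs on the host. The sketch below assumes IEEE 754 float and double on the host, and `float_squares_are_exact` is a hypothetical helper name. Because every integer up to 2^24 is exactly representable as a float, the squares of 0-9 (at most 81) are exact in both precisions, so single versus double precision should not change the printed results for inputs this small.

```cuda
#include <stdio.h>

// Hypothetical helper: returns 1 if squaring 0..n-1 in single precision
// gives exactly the same value as squaring in double precision, else 0.
int float_squares_are_exact(int n)
{
    for (int i = 0; i < n; i++) {
        float  sf = (float)i * (float)i;    // single-precision square
        double sd = (double)i * (double)i;  // double-precision square
        if ((double)sf != sd) {
            printf("mismatch at %d: %f vs %f\n", i, (double)sf, sd);
            return 0;
        }
    }
    return 1;
}

int main(void)
{
    // The question's program squares the values 0..9 (N = 10).
    printf("float squares exact for 0..9: %s\n",
           float_squares_are_exact(10) ? "yes" : "no");
    return 0;
}
```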
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main(void)
{
    float *a_h, *a_d;                  // Pointers to host & device arrays
    const int N = 10;                  // Number of elements in arrays
    size_t size = N * sizeof(float);
    a_h = (float *)malloc(size);       // Allocate array on host
    cudaMalloc((void **) &a_d, size);  // Allocate array on device
    // Initialize host array and copy it to CUDA device
    for (int i = 0; i < N; i++) a_h[i] = (float)i;
    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
    // Do calculation on device (block count rounded up so every element gets a thread)
    int block_size = 4;
    int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
    square_array <<< n_blocks, block_size >>> (a_d, N);
    // Retrieve result from device and store it in host array
    cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    // Print results
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);
    // Release host and device memory
    free(a_h);
    cudaFree(a_d);
    return 0;
}
The squaring application does not seem to be executing at all. I compile my code with nvcc application.cu -o application. I also ran a simple hello world application on the C2075 and got "Hello Hello" when it should, of course, print "Hello World".
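If the kernel never runs (for example because the launch fails), the host array ends up holding the original inputs 0..9 unchanged, which is an easy symptom to test for. A minimal host-side check, using a hypothetical helper `results_are_squares`, could be called as `results_are_squares(a_h, N)` after the final cudaMemcpy in the program above; the `main` here only simulates the two outcomes.

```cuda
#include <stdio.h>

// Hypothetical helper: returns 1 if a[i] == i*i for every i in 0..n-1
// (the kernel squared every element); otherwise reports the first bad
// element and returns 0.
int results_are_squares(const float *a, int n)
{
    for (int i = 0; i < n; i++) {
        float expected = (float)i * (float)i;
        if (a[i] != expected) {
            printf("element %d: got %f, expected %f\n", i, a[i], expected);
            return 0;
        }
    }
    return 1;
}

int main(void)
{
    // Simulate both outcomes for N = 10.
    float unchanged[10], squared[10];
    for (int i = 0; i < 10; i++) {
        unchanged[i] = (float)i;        // what a_h holds if the kernel never ran
        squared[i]   = (float)(i * i);  // what a_h holds if it did run
    }
    printf("unchanged input passes: %d\n", results_are_squares(unchanged, 10));
    printf("squared input passes:   %d\n", results_are_squares(squared, 10));
    return 0;
}
```

Note that 0 and 1 are their own squares, so the first element that distinguishes the two cases is index 2.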
The exact error code would be of interest here.
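To get the exact error code, every CUDA runtime call and the kernel launch itself can be checked. A kernel launch returns nothing, so launch errors are fetched with cudaGetLastError(), while errors raised during execution surface at the next synchronizing call such as cudaDeviceSynchronize(). This sketch assumes CUDA 4.0 or later (for cudaDeviceSynchronize), and `check_cuda` is a hypothetical helper name; it exits early on the first failure without freeing memory, which is acceptable for a diagnostic run.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: prints the numeric code and message of a failed
// CUDA call. Returns 0 on success and 1 on failure.
int check_cuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: error %d (%s)\n",
                what, (int)err, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

int main(void)
{
    const int N = 10;
    size_t size = N * sizeof(float);
    float a_h[N];
    float *a_d = NULL;
    for (int i = 0; i < N; i++) a_h[i] = (float)i;

    if (check_cuda(cudaMalloc((void **)&a_d, size), "cudaMalloc")) return 1;
    if (check_cuda(cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice), "copy to device")) return 1;

    int block_size = 4;
    int n_blocks = (N + block_size - 1) / block_size;  // round up
    square_array<<<n_blocks, block_size>>>(a_d, N);
    // Launch errors (e.g. a kernel not compiled for this GPU's architecture) appear here...
    if (check_cuda(cudaGetLastError(), "kernel launch")) return 1;
    // ...and errors raised while the kernel runs appear at the next synchronization.
    if (check_cuda(cudaDeviceSynchronize(), "kernel execution")) return 1;

    if (check_cuda(cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost), "copy to host")) return 1;
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);
    cudaFree(a_d);
    return 0;
}
```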
Anyway, I can guess what it is: try compiling with nvcc -arch sm_20 application.cu -o application.
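The right -arch value depends on each card's compute capability. The Tesla C2075 is a compute capability 2.0 (Fermi) device and the GeForce 9400M is compute capability 1.1, so a binary built only for sm_20 will not run on the 9400M. A small sketch to confirm this on each machine queries the devices with cudaGetDeviceProperties and prints the matching sm_XY name; `arch_name` is a hypothetical helper.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: formats a compute capability as the matching
// nvcc architecture name, e.g. (2, 0) -> "sm_20".
void arch_name(int major, int minor, char *buf, size_t len)
{
    snprintf(buf, len, "sm_%d%d", major, minor);
}

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        char arch[16];
        arch_name(prop.major, prop.minor, arch, sizeof arch);
        printf("device %d: %s, compute capability %d.%d -> -arch %s\n",
               dev, prop.name, prop.major, prop.minor, arch);
    }
    return 0;
}
```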