I have successfully installed the CUDA toolkit on a Linux system, but have had problems installing the toolkit on Vista32. The code I have been using to test is a simple ‘hello world’ example (see below):
[codebox]
#include <stdio.h>
#include <stdlib.h>   // malloc, free
#include <cuda.h>
// reference implementation: add 1 to every element on the host
void incrementArrayOnHost(float *a, int N)
{
    int i;
    for (i=0; i < N; i++) a[i] = a[i]+1.f;
}
// kernel: each thread adds 1 to one element; the guard skips surplus threads
__global__ void incrementArrayOnDevice(float *a, int N)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    if (idx<N) a[idx] = a[idx]+1.f;
}
int main(void)
{
    float *a_h, *b_h; // pointers to host memory
    float *a_d; // pointer to device memory
    int i, N = 10;
    int ngpu;
    size_t size = N*sizeof(float);
    // allocate arrays on host
    a_h = (float *)malloc(size);
    b_h = (float *)malloc(size);
    // allocate array on device
    cudaMalloc((void **) &a_d, size);
    // initialization of host data
    for (i=0; i<N; i++) a_h[i] = (float)i;
    for (i=0; i<N; i++) b_h[i] = 0.0;
    // check how many devices
    ngpu=-10;
    cudaGetDeviceCount(&ngpu);
    printf("ngpu=%d\n",ngpu);
    for (i=0; i<N; i++)
        printf("a_h[i]=%f\n",a_h[i]);
    printf("\n");
    for (i=0; i<N; i++)
        printf("b_h[i]=%f\n",b_h[i]);
    printf("\n");
    // copy data from host to device
    cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);
    // do calculation on host
    incrementArrayOnHost(a_h, N);
    // do calculation on device:
    // Part 1 of 2. Compute execution configuration
    int blockSize = 4;
    int nBlocks = N/blockSize + (N%blockSize == 0?0:1);
    // Part 2 of 2. Call incrementArrayOnDevice kernel
    incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);
    // Retrieve result from device and store in b_h
    cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    // check results
    for (i=0; i<N; i++)
        printf("a_h[i]=%f\n",a_h[i]);
    printf("\n");
    for (i=0; i<N; i++)
        printf("b_h[i]=%f\n",b_h[i]);
    printf("\n");
    // cleanup
    free(a_h); free(b_h); cudaFree(a_d);
    return 0;
}
[/codebox]
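For reference, the execution configuration computed in "Part 1 of 2" is just a ceiling division. A minimal host-only sketch of the same arithmetic follows; the helper name `blocksFor` is hypothetical and not part of the test program.

```c
/* Hypothetical helper (not in the test program): number of blocks of
   size blockSize needed to cover n elements, i.e. ceil(n / blockSize).
   Assumes n >= 0 and blockSize > 0. */
int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}
```

With N = 10 and blockSize = 4 this gives 3 blocks, so 12 threads are launched and the `idx<N` guard in the kernel is what keeps the last two from writing past the end of the array.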
On Linux this runs as intended when compiled as $nvcc -deviceemu filename.cu
I have no compatible hardware, yet compiling $nvcc filename.cu
produces a binary which runs to the end but (understandably) does not execute the kernel call.
However, on Vista32, identical code compiled with the same commands produces different results. Compiling $nvcc filename.cu
once again produces code that runs to the end but does not execute the kernel call, yet $nvcc -deviceemu filename.cu
runs only until the kernel is launched, and then causes a segmentation fault. Running within gdb gave no useful backtrace information, but one diagnostic that may help is that in -deviceemu mode cudaGetDeviceCount(&ngpu) returns ngpu=1 on Linux and ngpu=0 on Vista32 (neither machine has a compatible graphics card). Although not included in the code above, cudaMemcpy() operations are correctly emulated on both systems.
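To make the "kernel did not execute" case explicit instead of reading it off the printed arrays, a host-side comparison along these lines could be added. This is only a sketch: `countMismatches` and its `tol` parameter are hypothetical, not part of the test program above.

```c
/* Hypothetical helper (not in the test program): counts the elements
   where the device result differs from the host result by more than tol. */
int countMismatches(const float *host, const float *device, int n, float tol)
{
    int i, bad = 0;
    for (i = 0; i < n; i++) {
        float d = host[i] - device[i];
        if (d < 0.0f) d = -d;      /* absolute difference */
        if (d > tol) bad++;
    }
    return bad;
}
```

Called as `countMismatches(a_h, b_h, N, 1e-6f)` after the second cudaMemcpy(), a result of 0 would mean the device (or emulated) kernel produced the same values as incrementArrayOnHost(), while a nonzero count would mean b_h does not hold the incremented values, for example because the kernel never ran.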
I’ve tried both CUDA 2.3 and 2.2, and both exhibit the same behaviour. I’m not sure whether it is relevant, but my Vista32 machine is a laptop with a GeForce Go 7300 (with a recently downloaded driver).
I’d appreciate any help you might be able to give on this topic - I suspect other people have had similar problems, but on trawling through the forum I haven’t found any that are specific and isolated to failure of the -deviceemu option.
Thanks!