-deviceemu crashes on Vista32

I have successfully installed the CUDA toolkit on a Linux system, but I have had problems getting it to work on Vista32. The code I have been using as a test is a simple ‘hello world’ example:

[codebox]

#include <stdio.h>
#include <stdlib.h>  // malloc, free
#include <cuda.h>

// CPU reference: increment every element of a
void incrementArrayOnHost(float *a, int N)
{
  int i;
  for (i=0; i < N; i++) a[i] = a[i]+1.f;
}

// GPU kernel: one thread per element; the idx<N guard skips surplus threads
__global__ void incrementArrayOnDevice(float *a, int N)
{
  int idx = blockIdx.x*blockDim.x + threadIdx.x;
  if (idx<N) a[idx] = a[idx]+1.f;
}

int main(void)
{
  float *a_h, *b_h; // pointers to host memory
  float *a_d;       // pointer to device memory
  int i, N = 10;
  int ngpu;
  size_t size = N*sizeof(float);

  // allocate arrays on host
  a_h = (float *)malloc(size);
  b_h = (float *)malloc(size);

  // allocate array on device
  cudaMalloc((void **) &a_d, size);

  // initialization of host data
  for (i=0; i<N; i++) a_h[i] = (float)i;
  for (i=0; i<N; i++) b_h[i] = 0.0f;

  // check how many devices
  ngpu=-10;
  cudaGetDeviceCount(&ngpu);
  printf("ngpu=%d\n",ngpu);

  for (i=0; i<N; i++)
    printf("a_h[%d]=%f\n",i,a_h[i]);
  printf("\n");
  for (i=0; i<N; i++)
    printf("b_h[%d]=%f\n",i,b_h[i]);
  printf("\n");

  // copy data from host to device
  cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);

  // do calculation on host
  incrementArrayOnHost(a_h, N);

  // do calculation on device:
  // Part 1 of 2. Compute execution configuration
  int blockSize = 4;
  int nBlocks = N/blockSize + (N%blockSize == 0?0:1);

  // Part 2 of 2. Call incrementArrayOnDevice kernel
  incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);

  // Retrieve result from device and store in b_h
  cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);

  // check results
  for (i=0; i<N; i++)
    printf("a_h[%d]=%f\n",i,a_h[i]);
  printf("\n");
  for (i=0; i<N; i++)
    printf("b_h[%d]=%f\n",i,b_h[i]);
  printf("\n");

  // cleanup
  free(a_h); free(b_h); cudaFree(a_d);
  return 0;
}

[/codebox]
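The execution configuration above rounds the block count up so that every element gets a thread. As a rough illustration of what the launch should compute (and therefore what a working emulated run should print), here is a plain-C sketch that replays the grid serially on the host. The names numBlocks and simulateIncrementKernel are hypothetical, and the serial loop is only a model of the kernel's semantics, not how nvcc -deviceemu actually schedules threads.

```c
#include <assert.h>

/* Ceiling division: the number of blocks needed so that
   nBlocks * blockSize >= N. For N >= 0 and blockSize > 0 this equals
   N/blockSize + (N%blockSize == 0 ? 0 : 1), as used in the post. */
int numBlocks(int N, int blockSize)
{
  assert(blockSize > 0);
  return (N + blockSize - 1) / blockSize;
}

/* Hypothetical serial model of
     incrementArrayOnDevice <<< nBlocks, blockSize >>> (a, N);
   Each (block, thread) pair executes the kernel body once.
   Illustration only: this is NOT how -deviceemu is implemented. */
void simulateIncrementKernel(float *a, int N, int nBlocks, int blockSize)
{
  for (int blockIdx_x = 0; blockIdx_x < nBlocks; blockIdx_x++) {
    for (int threadIdx_x = 0; threadIdx_x < blockSize; threadIdx_x++) {
      int idx = blockIdx_x * blockSize + threadIdx_x;
      if (idx < N) a[idx] = a[idx] + 1.f;  /* guard skips surplus threads */
    }
  }
}
```

With N = 10 and blockSize = 4 this gives 3 blocks, i.e. 12 threads, of which the last 2 fail the idx < N test and do nothing. After the launch every element should have been incremented once, so b_h[i] should equal i + 1.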

On Linux this runs as intended when compiled with

$ nvcc -deviceemu filename.cu

I have no compatible hardware, yet compiling with

$ nvcc filename.cu

produces a binary that runs to the end but (understandably) does not execute the kernel call.
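When the kernel never runs, b_h simply keeps the zeros it was initialised with, which is easy to miss when reading two printed arrays. A minimal sketch of an automatic check, assuming a hypothetical helper named countMismatches (it is not part of the CUDA toolkit), compares the host reference against the array copied back from the device:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: compare the host reference result with the array
   copied back from the device. Returns the number of elements that differ
   by more than tol (0 means the kernel produced the expected result). */
int countMismatches(const float *expected, const float *actual, int N, float tol)
{
  int bad = 0;
  assert(expected != NULL && actual != NULL);
  for (int i = 0; i < N; i++) {
    float d = expected[i] - actual[i];
    if (d < 0) d = -d;                 /* absolute difference */
    if (d > tol) {
      if (bad < 5)                     /* print only the first few mismatches */
        printf("mismatch at %d: host=%f device=%f\n", i, expected[i], actual[i]);
      bad++;
    }
  }
  return bad;
}
```

Called as countMismatches(a_h, b_h, N, 1e-6f) after the final cudaMemcpy, it would presumably report all N elements in the no-hardware case, since a_h holds i + 1 while b_h is still all zeros.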

However, on Vista32, identical code compiled with the same commands produces different results. Compiling with

$ nvcc filename.cu

once again produces a binary that runs to the end but does not execute the kernel call, yet compiling with

$ nvcc -deviceemu filename.cu

produces a binary that runs only until the kernel is launched and then crashes with a segmentation fault. Running it within gdb gave no useful backtrace information, but one diagnostic that may help is that in -deviceemu mode cudaGetDeviceCount(&ngpu) returns ngpu=1 on Linux and ngpu=0 on Vista32 (neither machine has a compatible graphics card). Although the check is not included in the code above, cudaMemcpy() operations are correctly emulated on both systems.

I’ve tried both CUDA 2.3 and 2.2, and both exhibit the same behaviour. I’m not sure whether it is relevant, but my Vista32 machine is a laptop with a GeForce Go 7300 (with a recently downloaded driver).

I’d appreciate any help you might be able to give on this topic. I suspect other people have had similar problems, but after trawling through the forum I haven’t found any threads that deal specifically with failure of the -deviceemu option.

Thanks!