My first and simple kernel doesn't run in emulation. I need an explanation why my kernel cannot

Hello,

I want to start developing with CUDA and thought it would be a good idea to write a small program (see below). The program allocates a float variable on the host and modifies it from within a CUDA kernel. To do that, I allocate a float on the device, copy the host value into it, and pass the device pointer to the kernel invocation. After the kernel has run, I copy the value from the device back to the host.

Everything compiles fine, but when I run the program, I get an error at the kernel invocation (see the output below the code).

#include <stdlib.h>
#include <stdio.h>
#include <cuda_runtime.h>

/* Print the last CUDA runtime error, if any, and abort. */
void checkCUDAError(const char *msg)
{
	cudaError_t err = cudaGetLastError();
	if( cudaSuccess != err)
	{
		fprintf(stderr, "Error while %s: %s.\n", msg, cudaGetErrorString( err) );
		exit(-1);
	}
}

/* Every launched thread writes the same value to the same address. */
__global__ void myFirstKernel(float * ptr)
{
	*ptr = 333;
}

int main(int argc, char * argv[])
{
	float * h = (float*) malloc(sizeof(float));
	float * d;
	*h = 332.0f;

	cudaMalloc((void**)&d, sizeof(float));
	checkCUDAError("allocating device memory");

	cudaMemcpy(d, h, sizeof(float), cudaMemcpyHostToDevice);
	checkCUDAError("copying from host to device");

	/* Output the unmodified value. */
	printf("Unmodified: %f\n", *h);

	/* Modify the device copy "d" of the value in the kernel. */
	myFirstKernel<<< 2, 2 >>>(d);
	checkCUDAError("calling kernel");
	cudaThreadSynchronize();
	checkCUDAError("synchronizing threads");

	/* Copy modified data from device to host. */
	cudaMemcpy(h, d, sizeof(float), cudaMemcpyDeviceToHost);
	checkCUDAError("copying from device to host");

	/* Output the value modified by the kernel. */
	printf("Modified: %f\n", *h);

	/* Free memory. */
	cudaFree(d);
	checkCUDAError("freeing device memory");
	free(h);

	return EXIT_SUCCESS;
}

I only have a GeForce Go 7400, so I need to run in emulation mode. Here is the compilation and run output on my Vista machine. As you can see, the kernel invocation fails:

C:\Users\kwk\Desktop\test>nvcc -deviceemu net.cu && a.exe
net.cu
tmpxft_00001758_00000000-3_net.cudafe1.cpp
tmpxft_00001758_00000000-7_net.ii
Unmodified: 332.000000
Error while calling kernel: unknown error.

C:\Users\kwk\Desktop\test>
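
An "unknown error" reported after the launch does not say much on its own, so one way to narrow it down is to check the return value of every runtime call directly and to print which devices the runtime reports before launching anything. The following is only a minimal sketch of that idea, not a fix for the failure above: the CUDA_CHECK macro is a hypothetical helper that is not part of the original program, and the kernel is launched with a single thread to keep the test case as small as possible. It uses cudaThreadSynchronize to match the program above; newer toolkits call this cudaDeviceSynchronize.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical helper (not from the original post): aborts with file and
   line as soon as a runtime call returns anything but cudaSuccess. */
#define CUDA_CHECK(call)                                          \
	do {                                                          \
		cudaError_t err_ = (call);                                \
		if (err_ != cudaSuccess) {                                \
			fprintf(stderr, "%s:%d: %s failed: %s\n",             \
			        __FILE__, __LINE__, #call,                    \
			        cudaGetErrorString(err_));                    \
			exit(EXIT_FAILURE);                                   \
		}                                                         \
	} while (0)

__global__ void myFirstKernel(float *ptr)
{
	*ptr = 333.0f;
}

int main(void)
{
	/* List the devices the runtime can see. */
	int count = 0;
	CUDA_CHECK(cudaGetDeviceCount(&count));
	printf("Devices reported: %d\n", count);
	for (int i = 0; i < count; ++i) {
		cudaDeviceProp prop;
		CUDA_CHECK(cudaGetDeviceProperties(&prop, i));
		printf("  %d: %s (compute %d.%d)\n", i, prop.name, prop.major, prop.minor);
	}

	float h = 332.0f;
	float *d = NULL;
	CUDA_CHECK(cudaMalloc((void **)&d, sizeof(float)));
	CUDA_CHECK(cudaMemcpy(d, &h, sizeof(float), cudaMemcpyHostToDevice));

	myFirstKernel<<<1, 1>>>(d);
	CUDA_CHECK(cudaGetLastError());       /* errors from the launch itself */
	CUDA_CHECK(cudaThreadSynchronize());  /* errors raised while the kernel ran */

	CUDA_CHECK(cudaMemcpy(&h, d, sizeof(float), cudaMemcpyDeviceToHost));
	printf("Modified: %f\n", h);

	CUDA_CHECK(cudaFree(d));
	return EXIT_SUCCESS;
}
```

Because each call is checked where it is made, the first failing call is reported with its source line, which helps separate a problem in the toolchain or runtime setup from a problem in the kernel itself.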

It works fine for me when I start from the SDK template project. I couldn't compile it with your nvcc command, though: either cl.exe is not found or, if I run it from the VS command prompt, I get a bunch of linker errors.

Thank you Alex!

I compiled the code on Linux and everything works fine for me, too. I think I'll stick with Linux until CUDA 2.1 is out of beta. Then I'll use VC2008 and cross my fingers :)