My first simple kernel doesn't run in emulation mode: I need an explanation of why my kernel cannot run


I want to start developing with CUDA and thought it would be a good idea to write a small program (see below). The purpose of the program is to allocate a float variable on the host side and modify it from within a CUDA kernel. To do this, I allocate a float on the device side, copy the host value to the device, and pass the device pointer to the kernel. After the kernel call, I copy the value from the device back to the host.

Everything compiles fine, but when I run the program, I get an error at the kernel invocation (see the output below the code).

#include <stdlib.h>
#include <stdio.h>
#include <cuda_runtime.h>

/* Print a message and abort if the most recent CUDA runtime call failed. */
void checkCUDAError(const char *msg)
{
	cudaError_t err = cudaGetLastError();
	if( cudaSuccess != err)
	{
		fprintf(stderr, "Error while %s: %s.\n", msg, cudaGetErrorString( err) );
		exit(EXIT_FAILURE);
	}
}

/* Every thread writes the same value to the same address. */
__global__ void myFirstKernel(float * ptr)
{
	*ptr = 333.0f;
}

int main(int argc, char * argv[])
{
	float * h = (float*) malloc(sizeof(float));
	float * d;
	*h = 332.0f;

	cudaMalloc((void**)&d, sizeof(float));
	checkCUDAError("allocating device memory");

	cudaMemcpy(d, h, sizeof(float), cudaMemcpyHostToDevice);
	checkCUDAError("copying from host to device");

	/* Output the unmodified result. */
	printf("Unmodified: %f\n", *h);

	/* Modify the device copy "d" in the kernel. */
	myFirstKernel<<< 2, 2 >>>(d);
	checkCUDAError("calling kernel");

	/* Wait for the kernel to finish so execution errors are reported here.
	   (cudaThreadSynchronize() is superseded by cudaDeviceSynchronize() in CUDA 4.0 and later.) */
	cudaThreadSynchronize();
	checkCUDAError("synchronizing threads");

	/* Copy modified data from device to host. */
	cudaMemcpy(h, d, sizeof(float), cudaMemcpyDeviceToHost);
	checkCUDAError("copying from device to host");

	/* Output the result modified by the kernel. */
	printf("Modified: %f\n", *h);

	/* Free memory. */
	cudaFree(d);
	checkCUDAError("freeing device memory");
	free(h);

	return 0;
}
I only have a GeForce Go 7400, so I need to run in emulation mode. Here is the compile-and-run output on my Vista machine. As you can see, the kernel invocation fails:

C:\Users\kwk\Desktop\test>nvcc -deviceemu && a.exe



Unmodified: 332.000000

Error while calling kernel: unknown error.


Works fine for me when I start from the template SDK project. I couldn't compile it with your nvcc command (either cl.exe is not found or, if I run it from the VS command prompt, I get a bunch of linker errors).

Thank you, Alex!

I compiled the code on Linux and everything works fine for me, too. I think I'll stick to Linux until CUDA 2.1 is out of beta. Then I'll use VC2008 and cross my fingers :)