Compiling for emulation vs. device targets

In my project, I've created a kernel function that accesses a global pointer variable pointing to memory allocated with malloc(). This code compiles fine for the emulation targets but does not compile for device targets. Can someone confirm that this is expected, and point me to where the CUDA programming manual says so?



Here are the global variables I have declared:

// global vars

char *global_var;

The kernel:

/** kernel code **/

__global__ void Kernel(int n)
{
	int x = threadIdx.x;
	int y = threadIdx.y;
	int dimx = blockDim.x;

	global_var[y * dimx + x] = n;
}


And main:

int main(int argc, char** argv)
{
	global_var = (char*)malloc(16 * 16 * sizeof(char));

	dim3 block(16, 16, 1);	// block size
	dim3 grid(1, 1);	// grid size

	Kernel <<< grid, block >>> (233);
}

Note that the kernel accesses the global pointer global_var. Compiling this in device-emulation mode (-deviceemu) works, but compiling for real hardware fails with an error like:


"", line 21: error: identifier "global_var" is undefined

  (global_var[(y * dimx) + x]) = ((char)n); 


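You cannot access CPU (host) memory from the GPU. global_var is an ordinary host variable, so it is not visible to device code, and the pointer returned by malloc() refers to host memory that a kernel cannot dereference. It only works in emulation mode because there the kernel actually runs on the CPU, where host variables and host memory are reachable. For real hardware you have to allocate the buffer with cudaMalloc() and hand the resulting device pointer to the kernel.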
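
As a rough sketch of what that could look like for the code in the question: the buffer is allocated with cudaMalloc(), passed to the kernel as an argument, and copied back with cudaMemcpy(). The names d_buf and h_buf and the extra kernel parameter are made up for illustration and are not from the original post; the error handling is deliberately minimal.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// The kernel takes the device pointer as a parameter (d_buf is a
// hypothetical name) instead of reading a host-side global variable.
__global__ void Kernel(char *d_buf, int n)
{
	int x = threadIdx.x;
	int y = threadIdx.y;
	int dimx = blockDim.x;

	d_buf[y * dimx + x] = (char)n;
}

int main(int argc, char** argv)
{
	const size_t size = 16 * 16 * sizeof(char);

	// Host buffer that receives the results (hypothetical name h_buf).
	char *h_buf = (char*)malloc(size);
	if (h_buf == NULL) {
		fprintf(stderr, "malloc failed\n");
		return 1;
	}

	// Device buffer: this is the memory the kernel is allowed to touch.
	char *d_buf = NULL;
	cudaError_t err = cudaMalloc((void**)&d_buf, size);
	if (err != cudaSuccess) {
		fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
		free(h_buf);
		return 1;
	}

	dim3 block(16, 16, 1);	// block size
	dim3 grid(1, 1);	// grid size

	Kernel <<< grid, block >>> (d_buf, 233);

	// Copy the results back to the host; this call waits for the
	// kernel to finish before it starts copying.
	err = cudaMemcpy(h_buf, d_buf, size, cudaMemcpyDeviceToHost);
	if (err != cudaSuccess) {
		fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
	} else {
		printf("h_buf[0] = %d\n", (unsigned char)h_buf[0]);
	}

	cudaFree(d_buf);
	free(h_buf);
	return err == cudaSuccess ? 0 : 1;
}
```

Passing the pointer as a kernel argument keeps the change small. If you really want a global, another option is to declare the variable with __device__ and set it from the host with cudaMemcpyToSymbol(), but the memory it points to still has to come from cudaMalloc().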