Difference between emulation and non-emulation mode

I have built an application in CUDA, testing it only in emulation mode. Now that I want to run it in non-emulation mode, it is failing. The problem is that in non-emulation mode I can't seem to access my kernel.

This code is an illustration of my problem.

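#include <stdio.h>

__global__ void d_findPivotColumn(int *out){
        *out = 10;
}

int main(){
        int out = 5;
        d_findPivotColumn<<<1,1>>>(&out);   // passes the address of a host variable
        printf("out=%i\n",out);
        return 0;
}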

The problem is that in emulation mode, the output is 10. In non-emulation mode the output is 5.

Does anyone have any hints as to why this is occurring? What am I missing?

This comes up really often in this forum:

Your kernels have to work on device memory. What you are actually doing is initializing an int (which therefore resides in host memory) and then launching a kernel on the GPU, which has no access to host memory. That is why your value of out remains unchanged.

It only works in emulation mode because the kernel then runs on the CPU, where the host pointer is valid. That this goes unnoticed is bad, and the compiler should be able to print a warning.
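Until the toolchain warns about this, checking the runtime's error codes is one way to make the failure visible on real hardware. The following is only a minimal sketch, assuming a reasonably recent CUDA toolkit (it uses cudaDeviceSynchronize). The CUDA_CHECK macro is a hypothetical helper, not something from the original post. The launch is left deliberately wrong, passing a host address as in the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption, not part of the original post):
// abort with a message if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

__global__ void d_findPivotColumn(int *out) {
    *out = 10;
}

int main() {
    int out = 5;

    // Deliberately wrong, as in the question: &out is a host address.
    d_findPivotColumn<<<1, 1>>>(&out);

    // Errors in the launch itself (e.g. a bad configuration) show up here.
    CUDA_CHECK(cudaGetLastError());
    // Errors raised while the kernel runs show up at the next synchronization.
    CUDA_CHECK(cudaDeviceSynchronize());

    printf("out=%i\n", out);
    return 0;
}
```

On a typical discrete GPU, the synchronize call is where I would expect an error to be reported. On systems where the GPU can read pageable host memory, the launch may simply succeed, so treat this as a diagnostic aid rather than a guarantee.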

Here’s how it should be done:

#include <stdio.h>

__global__ void d_findPivotColumn(int *out){
        *out = 10;
}

int main(){
        int out = 5;
        int *d_out;         // pointer to an int on the device

        cudaMalloc((void**)&d_out, sizeof(int));    // allocate memory for 1 int
        cudaMemcpy(d_out, &out, sizeof(int), cudaMemcpyHostToDevice);

        d_findPivotColumn<<<1,1>>>(d_out);          // pass the device pointer itself

        cudaMemcpy(&out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(d_out);                            // release the device memory

        printf("out=%i\n",out);
        return 0;
}


Thanks… that helps heaps!