How to copy a single variable from device to host?

This is what I want to do:

  1. declare a device variable:
    __device__ float a;

  2. write a kernel that stores the value to be returned:
    __global__ void kernel()
    {
        a = something;
    }

  3. in the main calling program

//declare a host variable

float b;
kernel<<< …>>> ();

// try to get the variable back to host memory

cudaMemcpy ( b, a, sizeof(float) , cudaMemcpyDeviceToHost ); --> this does not work
cudaMemcpyFromSymbol( b , a , sizeof(float), 0, cudaMemcpyDeviceToHost); --> this does not work

b = a; --> this does not work

What else is there?

I think that you have to cudaMalloc space for the variable (it would be a 1-element array, effectively). I don’t think (perhaps I’m wrong…) that you can have device variables with file scope.
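
For reference, the cudaMalloc route suggested here could look roughly like the sketch below. It is only an illustration: the pointer name d_a, the <<<1, 1>>> launch configuration, and the value 42.0f are placeholders that do not come from the original post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes a single result through a device pointer.
__global__ void kernel(float *d_a)
{
    *d_a = 42.0f;  // placeholder for the value computed by the kernel
}

int main()
{
    float b = 0.0f;        // host variable that receives the result
    float *d_a = nullptr;  // device allocation holding one float

    if (cudaMalloc((void **)&d_a, sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    kernel<<<1, 1>>>(d_a);

    // A blocking cudaMemcpy on the default stream runs after the kernel has finished.
    cudaError_t err = cudaMemcpy(&b, d_a, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);

    if (err != cudaSuccess) {
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("b = %f\n", b);
    return 0;
}
```

Note that both arguments to cudaMemcpy are pointers: the host destination is &b, not b.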

Oops, my printf arguments were wrong, so I was looking at the wrong variables. Yes, you can have a device variable at file scope, and you don’t have to cudaMalloc a single variable.

Now everything works.
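
For anyone landing here later, a minimal sketch of the file-scope version using cudaMemcpyFromSymbol follows. The launch configuration and the value 42.0f are placeholders, not taken from the original post. Two things to watch: the destination must be a host pointer (&b rather than b), and a plain assignment such as b = a in host code does not read device memory. If a raw device pointer is needed for cudaMemcpy, cudaGetSymbolAddress can provide one.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ float a;  // file-scope device variable, no cudaMalloc required

// Hypothetical kernel: stores the value to be returned in the device variable.
__global__ void kernel()
{
    a = 42.0f;  // placeholder for the value computed by the kernel
}

int main()
{
    float b = 0.0f;  // host variable that receives the result

    kernel<<<1, 1>>>();

    // Copy sizeof(float) bytes from symbol a (offset 0) into host memory at &b.
    cudaError_t err = cudaMemcpyFromSymbol(&b, a, sizeof(float), 0,
                                           cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyFromSymbol failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("b = %f\n", b);
    return 0;
}
```

Checking the returned cudaError_t, as done above, helps separate a failed copy from a mistake in the code that prints the result.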