How to use cudaMemcpyFromSymbol with global device variable?

Hello,

I declared a global device variable like this:

__device__ float m_dev_minimum_global;

I initialize it like this:

float m_minimum_global = MAX_FLOAT;
cudaMemcpyToSymbol(m_dev_minimum_global, &m_minimum_global, sizeof(float));

I then run a kernel, which makes use of the variable and writes new values to it.
Now I just want to copy the value inside the variable to a host variable from host code.

This is how I last tried to use it:

float *host_distance_gpu;
host_distance_gpu = (float*)malloc(sizeof(float));
cudaMemcpyFromSymbol(host_distance_gpu, m_dev_minimum_global, sizeof(float));
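For reference, a minimal self-contained sketch of the round trip described above (initialize the symbol, run a kernel, read the symbol back), with the return code of every runtime call checked. The kernel `update_minimum`, the helpers `check` and `read_minimum`, and the use of `FLT_MAX` in place of `MAX_FLOAT` are assumptions made for illustration, not the actual code from my project. It also assumes the `__device__` variable and the copy calls live in the same .cu file.

```cuda
#include <cfloat>
#include <cstdio>
#include <cuda_runtime.h>

__device__ float m_dev_minimum_global;

// Hypothetical stand-in for the real kernel: keeps the smaller value.
__global__ void update_minimum(float candidate)
{
    if (candidate < m_dev_minimum_global) {
        m_dev_minimum_global = candidate;
    }
}

// Hypothetical helper: prints the CUDA error string and reports failure.
static bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Initializes the symbol, launches the kernel once, and copies the result
// back into *out. Returns true only if every step succeeded.
bool read_minimum(float candidate, float *out)
{
    float init = FLT_MAX;  // assumption: MAX_FLOAT in my code means FLT_MAX
    if (!check(cudaMemcpyToSymbol(m_dev_minimum_global, &init, sizeof(float)),
               "cudaMemcpyToSymbol")) return false;

    update_minimum<<<1, 1>>>(candidate);
    if (!check(cudaGetLastError(), "kernel launch")) return false;
    if (!check(cudaDeviceSynchronize(), "kernel execution")) return false;

    // The symbol is passed by name, not as a string and not as &symbol.
    return check(cudaMemcpyFromSymbol(out, m_dev_minimum_global, sizeof(float)),
                 "cudaMemcpyFromSymbol");
}

int main()
{
    float host_distance_gpu = 0.0f;  // a plain float is enough, no malloc needed
    if (!read_minimum(42.0f, &host_distance_gpu)) return 1;
    printf("minimum = %f\n", host_distance_gpu);
    return 0;
}
```

Checking `cudaGetLastError()` and `cudaDeviceSynchronize()` before the copy separates a failure inside the kernel from a failure of the copy itself.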

I also tried using a plain (non-pointer) float, but the host variable always ends up as 0.0000.
I know the device variable holds the correct value, because as a workaround I currently use a kernel that copies it into a regular device buffer, which I then read back with cudaMemcpy.
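The workaround looks roughly like the sketch below. The names `copy_minimum` and `read_minimum_via_kernel` are made up for this post, and the value 3.5f in `main` is only a placeholder so the example has something to read back.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ float m_dev_minimum_global;

// Hypothetical helper kernel: copies the global symbol into ordinary
// device memory so that a plain cudaMemcpy can read it.
__global__ void copy_minimum(float *dst)
{
    *dst = m_dev_minimum_global;
}

// Reads the symbol through a temporary device buffer.
// Returns true only if every CUDA call succeeded.
bool read_minimum_via_kernel(float *out)
{
    float *dev_tmp = nullptr;
    if (cudaMalloc(&dev_tmp, sizeof(float)) != cudaSuccess) return false;

    copy_minimum<<<1, 1>>>(dev_tmp);
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess) &&
              (cudaMemcpy(out, dev_tmp, sizeof(float),
                          cudaMemcpyDeviceToHost) == cudaSuccess);

    cudaFree(dev_tmp);
    return ok;
}

int main()
{
    float value = 3.5f;  // placeholder value for the example
    if (cudaMemcpyToSymbol(m_dev_minimum_global, &value,
                           sizeof(float)) != cudaSuccess) return 1;

    float host_distance_gpu = 0.0f;
    if (!read_minimum_via_kernel(&host_distance_gpu)) return 1;
    printf("minimum = %f\n", host_distance_gpu);
    return 0;
}
```

This works, but it costs an extra kernel launch and a temporary allocation for a single float, which is why I would prefer a direct cudaMemcpyFromSymbol.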

Can someone explain to me why this doesn’t work?
Also, when debugging, I cannot step into the call, but it returns cudaSuccess.
