How to return a scalar from the GPU: pass by reference or pass by pointer to a __global__ function?

Newbie to CUDA, be gentle :)

I have a function to calculate the dot product of two vectors. It seems to work: I can cudaMemcpy the data, operate on it, download the result, and verify it. So I tried extending it to a function that calculates the magnitude of a vector (the square root of the vector dotted with itself), which returns a scalar (float).

No matter what I do, I can’t seem to pass a scalar (float) as a parameter to a __global__ function. I always get this error during the link step:

error C2664: ‘__device_stub__Z10CUDAmagwrkPfRf’ : cannot convert parameter 2 from ‘float’ to ‘float *’

I have the scalar (float) cudaMalloc’d, but I can’t pass by reference. Attempting to pass as a pointer fails at compile time.

Can someone show me what I’m missing?



Can you please show the code?




Here is the pass-by-reference version. I don’t know whether what I’m doing is 100% correct; at the moment I’m just concerned about passing the float dd:


__global__ void CUDAmagreduce( float* a, float & c )
{
    __shared__ float result;

    int index = threadIdx.x;

    c = a[index] * a[index];

    result += c;

    c = result;
}


int main()
{
    float *cd, dd;

    const int size = 100*sizeof(float);
    const int singlesize = 1 * sizeof(float);

    cudaMalloc( (void**)&cd, size );
    cudaMalloc( (void**)&dd, singlesize );

    // create data in tensora, tensorb … omitted.

    cudaMemcpy( ad,, size, cudaMemcpyHostToDevice );
    cudaMemcpy( bd,, size, cudaMemcpyHostToDevice );

    CUDAmagreduce<<<1, 100>>>(cd, dd);

    // fails above; rest of code irrelevant
}


The failure is reported by the linker on the call to CUDAmagreduce<<<,>>>(), saying it can’t convert parameter 2 from float to float*.



I’m not sure why this results in a linker error, but internally a reference is really a pointer, so what you are doing wouldn’t work even if you could compile it: the device cannot write through a host pointer. If you want to read values back from the device, you need to allocate them with cudaMalloc and copy them back with cudaMemcpy. Alternatively, you can define a global __device__ variable and copy it back with cudaMemcpyFromSymbol, but global variables are evil, so I wouldn’t recommend it.
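For reference, the __device__-variable route looks roughly like this. It is a minimal sketch, not a recommendation: g_result, storeScalar and readBackScalar are hypothetical names, the kernel runs a single thread so there is no write race, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical module-scope variable living in device memory.
__device__ float g_result;

// Single-thread kernel (illustration only): writes into the __device__ variable.
__global__ void storeScalar(float value)
{
    g_result = value * 2.0f;
}

// Hypothetical helper: launch the kernel, then copy g_result back to the host.
float readBackScalar(float value)
{
    storeScalar<<<1, 1>>>(value);
    float host = 0.0f;
    // Pass the symbol itself (not a string name). The copy is ordered after
    // the kernel on the default stream and blocks until the data arrives.
    cudaMemcpyFromSymbol(&host, g_result, sizeof(float));
    return host;
}

int main()
{
    printf("result: %f\n", readBackScalar(21.0f));
    return 0;
}
```

Note that no pointer is passed to the kernel at all here; the kernel and host agree on the variable by name, which is exactly why this style does not scale well.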

Your second parameter is declared as a (reference to a) float, but dd is meant to hold a device pointer to a float (you pass &dd to cudaMalloc), so the types don’t line up.

Don’t try to be fancy: just declare and pass both as pointers, and you’ll be fine.
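A minimal sketch of the all-pointers approach applied to the magnitude computation from the question. Assumptions: sumOfSquares and deviceMagnitude are hypothetical names, the sum is accumulated with atomicAdd on a float (compute capability 2.0 or later) rather than a shared-memory tree reduction, and error checking is omitted.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Each thread squares one element and atomically adds it into *sum.
__global__ void sumOfSquares(const float* a, int n, float* sum)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(sum, a[i] * a[i]);
    }
}

// Hypothetical host helper: magnitude = sqrt(a . a).
float deviceMagnitude(const float* h_a, int n)
{
    float *d_a = nullptr, *d_sum = nullptr;   // both are device pointers
    size_t bytes = n * sizeof(float);
    cudaMalloc((void**)&d_a, bytes);
    cudaMalloc((void**)&d_sum, sizeof(float));
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  // (dst, src, ...)
    cudaMemset(d_sum, 0, sizeof(float));                  // all-zero bits == 0.0f

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    sumOfSquares<<<blocks, threads>>>(d_a, n, d_sum);     // pass the pointer itself

    float h_sum = 0.0f;
    cudaMemcpy(&h_sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_sum);
    return std::sqrt(h_sum);
}

int main()
{
    float v[100];
    for (int i = 0; i < 100; ++i) v[i] = 1.0f;
    printf("magnitude: %f\n", deviceMagnitude(v, 100));
    return 0;
}
```

atomicAdd keeps the sketch short; for large vectors a per-block shared-memory reduction followed by one atomicAdd per block is usually faster. Because float addition is not associative, the last bits of the result can differ between runs.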

OK, passing as pointers:


__global__ void CUDAmag( float * dd )
{
    *dd += (float)100.0;
}


int main()
{
    float d = 9;
    float dd;
    size_t size = 1 * sizeof(float);

    cudaMalloc( (void**)&dd, size );
    cudaMemcpy( &d, &dd, size, cudaMemcpyHostToDevice );

    CUDAmag<<<1, 100>>>(&dd);

    cudaMemcpy( &d, &dd, size, cudaMemcpyDeviceToHost );

    printf("sum: %f\n", d);
    printf("sum: %f",  dd);

    return 0;
}


The output is d = 9, dd = 10,000. The cudaMemcpy fails. What am I doing wrong?




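A float is not a float pointer. dd needs to be declared as float *dd, and it is dd itself (not &dd) that you pass to cudaMemcpy and to the kernel. Also note that cudaMemcpy takes the destination first, so your HostToDevice call has its arguments swapped.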
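A corrected version of the CUDAmag program, as a sketch: dd is declared as float*, both cudaMemcpy calls use (destination, source) order, and the kernel receives dd rather than &dd. One deliberate change from the original: the kernel uses atomicAdd, because 100 threads doing *dd += 100 at the same time race with each other and the result is unreliable. runCUDAmag is a hypothetical helper name, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every launched thread adds 100 to the scalar. atomicAdd avoids the
// read-modify-write race that a plain `*dd += 100.0f` has with many threads.
__global__ void CUDAmag(float* dd)
{
    atomicAdd(dd, 100.0f);
}

// Hypothetical helper: copy d to the device, run the kernel, copy it back.
float runCUDAmag(float d, int threads)
{
    float* dd = nullptr;                  // a device *pointer*, not a float
    size_t size = 1 * sizeof(float);
    cudaMalloc((void**)&dd, size);
    cudaMemcpy(dd, &d, size, cudaMemcpyHostToDevice);   // dst = device, src = host
    CUDAmag<<<1, threads>>>(dd);                        // pass dd, not &dd
    cudaMemcpy(&d, dd, size, cudaMemcpyDeviceToHost);   // dst = host, src = device
    cudaFree(dd);
    return d;
}

int main()
{
    float d = runCUDAmag(9.0f, 100);
    // Print the host copy; the device pointer must not be dereferenced on the host.
    printf("sum: %f\n", d);
    return 0;
}
```

With d = 9 and 100 threads each adding 100, this should print sum: 10009.000000.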
