How to return a scalar from GPU: pass by reference, pass by pointer to __global__ function

Newbie to CUDA, be gentle :)

I have a function to calculate the dot product of 2 vectors. It seems to work: I can cudaMemcpy the data, operate on the data, download the data and verify. So I tried extending it to a function to calculate the magnitude of a vector (the square root of a vector dotted with itself), which returns a scalar (float).

No matter what I do, I can’t seem to pass a scalar (float) as a parameter to a global function. I always get the error during link:

error C2664: ‘__device_stub__Z10CUDAmagwrkPfRf’ : cannot convert parameter 2 from ‘float’ to ‘float *’

I have the scalar (float) cudaMalloc’d, but I can’t pass by reference. Attempting to pass as a pointer fails at compile time.

Can someone show me what I’m missing?

Thanks,

Philip

Can you please show the code?

Thanks,

Nittin

Nittin,

Here is the pass-by-reference version. I don’t know that what I’m doing is 100% correct; at the moment I’m just concerned about passing the float dd:

__global__ void CUDAmagreduce( float* a, float & c )
{
    __shared__ float result;
    int index = threadIdx.x;
    c = a[index] * a[index];
    __syncthreads();
    result += c;
    c = result;
}

int main()
{
    float *cd, dd;
    const int size = 100*sizeof(float);
    const int singlesize = 1 * sizeof(float);
    cudaMalloc( (void**)&cd, size );
    cudaMalloc( (void**)&dd, singlesize );
    // create data in tensora, tensorb … omitted.
    cudaMemcpy( ad, tensora.data, size, cudaMemcpyHostToDevice );
    cudaMemcpy( bd, tensorb.data, size, cudaMemcpyHostToDevice );
    CUDAmagreduce<<<1, 100>>>(cd, dd);
    // fails above rest of code irrelevant
}

Failure is in the linker on the call of CUDAmagreduce<<<,>>>(), saying it can’t convert parameter 2 from float to float*.

Thanks,

Philip

I’m not sure why this results in a linker error, but internally a reference is really a pointer. So what you are doing wouldn’t work even if you could compile it. The device cannot write to a host pointer. If you want to read back values from the device, you need to allocate them with cudaMalloc and copy them back with cudaMemcpy. Alternatively, you can define a global device variable and copy back with cudaMemcpyFromSymbol, but global variables are evil so I wouldn’t recommend it.
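For completeness, the global-variable alternative mentioned above might look roughly like the sketch below. The names `g_result` and `writeResult` are made up for illustration, and the kernel just stores a dummy value; this is not the code from the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device-side global that holds the scalar result.
__device__ float g_result;

// Hypothetical kernel: a single thread stores a value in the global.
__global__ void writeResult(float value)
{
    g_result = value * 2.0f;
}

int main()
{
    writeResult<<<1, 1>>>(21.0f);

    float host_result = 0.0f;
    // Blocking copy from the device symbol into host memory. Issued on the
    // default stream, so it runs after the kernel above has finished.
    cudaError_t err = cudaMemcpyFromSymbol(&host_result, g_result, sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMemcpyFromSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("result: %f\n", host_result);
    return 0;
}
```

The cudaMalloc plus cudaMemcpy route is usually the easier one to reason about, since the result buffer is passed explicitly as a kernel argument.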

Your second parameter is declared as a (reference to a) float, but you’re passing it a pointer to a float.

Don’t try to be fancy, just declare and pass both as pointers, and you’ll be fine.
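A minimal sketch of what "both as pointers" could look like for the magnitude case, assuming a single block and a vector of 100 floats as in the original post. The kernel name `magnitudeKernel`, the block size of 128 and the placeholder data are my own choices, not the poster's code. It also replaces the unsynchronized `result += c` with a shared-memory tree reduction, because many threads adding into one shared variable would race.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 100      // vector length, as in the thread
#define BLOCK 128  // threads per block: a power of two >= N (assumption)

// Each thread squares one element, then a shared-memory tree reduction sums
// the squares. The scalar result is written through a device pointer.
__global__ void magnitudeKernel(const float *a, float *out, int n)
{
    __shared__ float partial[BLOCK];
    int i = threadIdx.x;

    // Threads past the end of the vector contribute zero.
    partial[i] = (i < n) ? a[i] * a[i] : 0.0f;
    __syncthreads();

    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (i < stride)
            partial[i] += partial[i + stride];
        __syncthreads();
    }

    if (i == 0)
        *out = sqrtf(partial[0]);
}

int main()
{
    float host_a[N];
    for (int k = 0; k < N; ++k)
        host_a[k] = 1.0f;  // placeholder data

    float *dev_a = NULL;    // device pointer to the vector
    float *dev_out = NULL;  // device pointer to the scalar result
    cudaMalloc((void **)&dev_a, N * sizeof(float));
    cudaMalloc((void **)&dev_out, sizeof(float));
    cudaMemcpy(dev_a, host_a, N * sizeof(float), cudaMemcpyHostToDevice);

    magnitudeKernel<<<1, BLOCK>>>(dev_a, dev_out, N);

    float magnitude = 0.0f;
    cudaMemcpy(&magnitude, dev_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("magnitude: %f\n", magnitude);

    cudaFree(dev_a);
    cudaFree(dev_out);
    return 0;
}
```

With 100 ones as input the magnitude should come out as 10. Note that `dev_out` is declared as `float *` on the host and passed as is, with no `&` at the call site.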

OK, passing as pointers:

__global__ void CUDAmag( float * dd )
{
    *dd += (float)100.0;
}

int main()
{
    float d = 9;
    float dd;
    size_t size = 1 * sizeof(float);
    cudaMalloc( (void**)&dd, size );
    cudaMemcpy( &d, &dd, size, cudaMemcpyHostToDevice );
    CUDAmag<<<1, 100>>>(&dd);
    cudaMemcpy( &d, &dd, size, cudaMemcpyDeviceToHost );
    printf("sum: %f\n", d);
    printf("sum: %f",  dd);

    return 0;
}

d = 9, dd = 10,000. Memcpy fails. What am I doing wrong?

Thanks,

Philip

This.

A float is not a float pointer.
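To spell that out: `dd` has to be declared as `float *dd`, so that cudaMalloc can store a device address in it, and it is then passed to the kernel and to cudaMemcpy as `dd`, not `&dd`. The first cudaMemcpy in the posted code also has its destination and source swapped. Here is a sketch of how the program might look with those changes; the `ok` helper is a hypothetical addition for checking return codes, and the launch is reduced to one thread because 100 threads doing an unsynchronized `+=` on the same address would race.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA call.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        printf("%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void CUDAmag(float *dd)
{
    *dd += 100.0f;
}

int main()
{
    float d = 9.0f;    // host value
    float *dd = NULL;  // device pointer, not a float
    size_t size = sizeof(float);

    if (!ok(cudaMalloc((void **)&dd, size), "cudaMalloc")) return 1;

    // cudaMemcpy takes (destination, source, ...): the device pointer first.
    if (!ok(cudaMemcpy(dd, &d, size, cudaMemcpyHostToDevice), "copy to device")) return 1;

    // One thread only, to avoid a race on *dd.
    CUDAmag<<<1, 1>>>(dd);
    if (!ok(cudaGetLastError(), "kernel launch")) return 1;

    if (!ok(cudaMemcpy(&d, dd, size, cudaMemcpyDeviceToHost), "copy to host")) return 1;
    printf("sum: %f\n", d);  // 9 + 100 if every call above succeeded

    cudaFree(dd);
    return 0;
}
```

Checking the value returned by each call is what turns "Memcpy fails" into a specific error string. The device value is only ever read on the host through `d` after the copy back; dereferencing `dd` on the host is not valid.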
