__device__ array to __global__: Can't pass a __device__ array to __global__

jenniffer_tm · February 22, 2012, 10:21pm

Hi, I want to know why my code doesn't work when I try to pass two arrays from a device function to a global function through the function square(). If I comment out this function, the program works perfectly.

Here is the code. When I do the same thing with a simple int or float element, it works too; the problem is only with the x and y arrays.

#include <math.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

// global device variable
__device__ int N=4 ;
__device__ int sum = 0 ;
__device__ float *x ;
__device__ float *y ;

// Device function
__device__ float square ()
{
    // var locals
    double pi= 3.1416;
    double deltaphi=0.0,r=0.0,theta=0.0,angulo_deseado=0.0;
    int i=0;
    //------------
    N=4;

    switch (N)
    {
    case 4:
        theta = 0.7853;
        r=0.5;
        break;
    case 12:
        theta=75;
        theta=1.3089;
        r=1.93;
        break;
    case 20:
        theta=81;
        theta=1.4137;
        r=2.97;
        break;
    default:
        break;
    }

    for(i=0; i <N; i++){
        deltaphi=2*pi/(i+1);
        angulo_deseado = deltaphi*i + (pi/4);
        x[i]=r*cos(angulo_deseado);
        y[i]=r*sin(angulo_deseado);
    }

    return 0;
}

// kernel function - accumulator
#define E 10

__global__ void kernel( float *c){
    square ();
    while(sum < E){
        c[sum]= x[sum];
        sum++;
    }
}

int main(void){
    float c[E],*dev_c;
    cudaMalloc((int**)&dev_c, E*sizeof(int));
    //cudaMalloc((int**)&y, E*sizeof(int));
    cudaMemcpy(dev_c, c, E*sizeof(int), cudaMemcpyHostToDevice);
    kernel<<<1,E>>>(dev_c );
    cudaMemcpy(c, dev_c, E*sizeof(int), cudaMemcpyDeviceToHost);
    for(int i=0; i<E; i++){
        printf("%f\n",  c[i]);
    }
    cudaFree(dev_c);
    return 0;
}

I appreciate all your suggestions, and forgive my bad English; I'm just a Colombian chick trying to understand the CUDA world.

Regards,

Jenn T.

tera · February 22, 2012, 11:26pm

The problem is that x and y are not initialized, so writes to the arrays go to random addresses in memory.

The commented-out call to cudaMalloc() indicates that you were (partly) aware of the problem. That call cannot work, however, because cudaMalloc() takes a host pointer (to another pointer) as its first argument, but &y would be a device pointer. (In fact, addresses of device variables cannot even be taken like that in host code; you would need to use cudaGetSymbolAddress() instead.) The correct way of allocating a variable-size array on the device is to first call cudaMalloc() with a pointer variable in host memory, and then either copy that pointer to the device or pass it as an argument to the kernel.
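A minimal sketch of the two options just described, assuming a single block of at most 1024 threads and omitting error checks for brevity. The names fill_via_symbol, fill_via_argument, run_fill and dev_buf are hypothetical and not taken from the original code:

```cuda
#include <vector>
#include <cuda_runtime.h>

__device__ float *x;   // device-side pointer; the host sets it before the launch

// Variant 1: the kernel reaches the buffer through the __device__ pointer x.
__global__ void fill_via_symbol(int n)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n) x[idx] = 2.0f * idx;
}

// Variant 2: the buffer arrives as an ordinary kernel argument.
__global__ void fill_via_argument(float *buf, int n)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n) buf[idx] = 2.0f * idx;
}

// Hypothetical helper: allocates n floats, runs one variant, returns the result.
std::vector<float> run_fill(int n, bool use_symbol)
{
    float *dev_buf = nullptr;   // pointer variable in host memory
    cudaMalloc((void **)&dev_buf, n * sizeof(float));

    if (use_symbol) {
        // Copy the pointer value itself (not the data) into the __device__ variable x.
        cudaMemcpyToSymbol(x, &dev_buf, sizeof(dev_buf));
        fill_via_symbol<<<1, n>>>(n);
    } else {
        fill_via_argument<<<1, n>>>(dev_buf, n);
    }

    std::vector<float> host(n);
    cudaMemcpy(host.data(), dev_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_buf);
    return host;
}
```

Passing the pointer as a kernel argument is usually the simpler of the two, since it avoids the extra symbol copy and keeps the kernel free of global state.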

There is another problem in the code above: the loop

while(sum < E) {
        c[sum]= x[sum];
        sum++;
}

with sum as a global variable happens to work purely through undefined behavior that you cannot rely on, since all threads read and increment the same variable without any synchronization. Use local (automatic) variables for loop counters instead. In this example, you likely don't want a loop at all; just have each thread do a different piece of work:

int idx = threadIdx.x + blockDim.x * blockIdx.x;
if (idx < E) {
        c[idx]= x[idx];
}
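To make the per-thread indexing concrete, here is a small self-contained sketch. The kernel name copy_kernel, the helper run_copy, and the launch shape (2 blocks of 8 threads) are assumptions chosen for illustration; error checks are left out:

```cuda
#include <vector>
#include <cuda_runtime.h>

#define E 10

// Each thread copies at most one element; idx lives in a per-thread
// automatic variable, so no state is shared between threads.
__global__ void copy_kernel(float *c, const float *x, int n)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n) c[idx] = x[idx];
}

// Hypothetical helper: fills x on the host, copies it through the GPU, returns c.
std::vector<float> run_copy()
{
    std::vector<float> host_x(E), host_c(E, 0.0f);
    for (int i = 0; i < E; i++) host_x[i] = 0.5f * i;

    float *dev_x = nullptr, *dev_c = nullptr;
    cudaMalloc((void **)&dev_x, E * sizeof(float));
    cudaMalloc((void **)&dev_c, E * sizeof(float));
    cudaMemcpy(dev_x, host_x.data(), E * sizeof(float), cudaMemcpyHostToDevice);

    // 16 threads are launched for 10 elements; the idx < n guard
    // keeps the 6 surplus threads from touching memory.
    copy_kernel<<<2, 8>>>(dev_c, dev_x, E);

    cudaMemcpy(host_c.data(), dev_c, E * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_x);
    cudaFree(dev_c);
    return host_c;
}
```

Because the bounds check is inside the kernel, the grid size can be rounded up freely without risking out-of-range accesses.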

jenniffer_tm · February 23, 2012, 4:07pm

Thank you so much for your quick answer. However, the problem seems to remain. I just want to avoid the cudaMalloc call due to time issues. I thought there would be a way to do it without calling cudaMalloc and create the data array directly on the device. Is there any way?

tera · February 24, 2012, 10:26am

For a fixed-size array, just declare the array in global memory and use cudaMemcpyToSymbol() / cudaMemcpyFromSymbol() to access it from the host. For a variable-size array, there is no way around some kind of memory allocation call.
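A minimal sketch of the fixed-size approach, with no cudaMalloc call at all. The array names angle, x and y, the helper run_square, and the evenly spaced angles are assumptions for illustration (the angle formula is not the one from the original post), and error checks are omitted:

```cuda
#include <vector>
#include <cuda_runtime.h>

#define N 4

// Fixed-size arrays in global memory; their storage exists without cudaMalloc.
__device__ float angle[N];
__device__ float x[N];
__device__ float y[N];

// One thread per point: reads its angle and writes the point on a circle of radius r.
__global__ void square(float r)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < N) {
        x[i] = r * cosf(angle[i]);
        y[i] = r * sinf(angle[i]);
    }
}

// Hypothetical helper: returns x[0..N-1] followed by y[0..N-1].
std::vector<float> run_square(float r)
{
    const float pi = 3.14159265f;
    float host_angle[N];
    for (int i = 0; i < N; i++)
        host_angle[i] = 2.0f * pi * i / N + pi / 4.0f;   // assumed spacing

    // Host -> device symbol.
    cudaMemcpyToSymbol(angle, host_angle, sizeof(host_angle));

    square<<<1, N>>>(r);

    // Device symbol -> host.
    std::vector<float> result(2 * N);
    cudaMemcpyFromSymbol(result.data(), x, N * sizeof(float));
    cudaMemcpyFromSymbol(result.data() + N, y, N * sizeof(float));
    return result;
}
```

The trade-off is that N must be known at compile time; if the size is only known at run time, an allocation call is still needed.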
