handle structure in the array of device

Kr0n1 · April 30, 2012, 12:29pm

Good morning,

I’m just starting out with CUDA. My problem uses a struct “Point”:

struct Point {
    double x, y, z, radius;

    // a struct cannot contain arrays of itself by value, so these must be pointers
    Point* neighbors[NUM_VIZINHOS];
    Point* possiveisVizinhos[NUM_POSSIVEISVIZINHOS];
};

I can copy, access, modify and return a single “Point” when I pass it to a kernel.
But I cannot do the same with an array of “Point”s using the following code:

Point* arrayPoints_d;
int* arrayAux_d;
const int size = nPoints * sizeof(Point);

cudaMalloc((void**)&arrayPoints_d, size);
cudaMalloc((void**)&arrayAux_d, nPoints * nPoints * sizeof(int));
cudaMemcpy(arrayPoints_d, arrayPoint, size, cudaMemcpyHostToDevice);
kernel<<<blocks, threads>>>(arrayPoints_d, nPoints, arrayAux_d);
cudaMemcpy(arrayAux, arrayAux_d, nPoints * nPoints * sizeof(int), cudaMemcpyDeviceToHost);

__global__ void kernel(Point* arrayPoints, int nPoints, int* arrayAux) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    double value = 0;

    if (x < nPoints) {
        // j < nPoints keeps j + x * nPoints inside the nPoints * nPoints buffer
        for (int j = 0; j < nPoints; ++j) {
            value = arrayPoints[x].radius;
            arrayAux[j + x * nPoints] = value;   // stored as int, so the radius is truncated
        }
    }
}

In the case above, I cannot access the structure’s data inside the kernel.

Please don’t mind the simple errors in the code; they come from my translation. I’d like to know whether you can send an array of structs to the device and, if possible, see an example of how to allocate, copy, modify and return such an array. Thank you!

Gilles_C · April 30, 2012, 12:57pm

Hi,

I don’t quite understand the following:

Is it that you just don’t want to know what is wrong with your code, or that this code was automatically translated from another language and you don’t care whether the translation is correct?

And yes, you can allocate, send, use and retrieve arrays of structs on and from the device, basically like this:

myStruct *h_array = (myStruct*)malloc(nbElem * sizeof(myStruct));
myStruct *d_array;

cudaMalloc(&d_array, nbElem * sizeof(myStruct));

cudaMemcpy(d_array, h_array, nbElem * sizeof(myStruct), cudaMemcpyHostToDevice);

myKernel<<<blocks, threads>>>(d_array, nbElem);

cudaMemcpy(h_array, d_array, nbElem * sizeof(myStruct), cudaMemcpyDeviceToHost);
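A self-contained sketch of that allocate / copy / modify / return round trip follows, assuming a plain struct with no pointer members and an available CUDA device. The names `Item`, `doubleRadius` and `roundTrip` are hypothetical, not from the thread; unlike the short snippet above, it also checks the result of each CUDA call.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical plain-data struct: no pointer members, so copying its bytes
// to the device is all the transfer needs.
struct Item {
    double x, y, z, radius;
};

// Modify on the device: each thread doubles the radius of one element.
__global__ void doubleRadius(Item* items, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        items[i].radius *= 2.0;
}

// Allocate, copy, modify and return an array of n > 0 Items.
// Returns how many elements did not come back with their radius doubled,
// or -1 if any allocation or CUDA call failed.
int roundTrip(int n)
{
    size_t bytes = n * sizeof(Item);
    Item* h_items = (Item*)malloc(bytes);
    if (h_items == NULL) return -1;
    for (int i = 0; i < n; ++i) {
        h_items[i].x = h_items[i].y = h_items[i].z = 0.0;
        h_items[i].radius = 1.0 + i;
    }

    Item* d_items = NULL;
    cudaError_t err = cudaMalloc(&d_items, bytes);                            // allocate
    if (err == cudaSuccess)
        err = cudaMemcpy(d_items, h_items, bytes, cudaMemcpyHostToDevice);   // copy
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        doubleRadius<<<blocks, threads>>>(d_items, n);                       // modify
        err = cudaGetLastError();                                            // launch errors
    }
    if (err == cudaSuccess)   // blocking copy: waits for the kernel to finish
        err = cudaMemcpy(h_items, d_items, bytes, cudaMemcpyDeviceToHost);   // return
    cudaFree(d_items);        // cudaFree(NULL) is a no-op

    int wrong = (err == cudaSuccess) ? 0 : -1;
    for (int i = 0; err == cudaSuccess && i < n; ++i)
        if (h_items[i].radius != 2.0 * (1.0 + i))
            ++wrong;
    free(h_items);
    return wrong;
}
```

With a device present, `roundTrip(1000)` should return 0. Because `Item` holds only plain values, a byte-for-byte `cudaMemcpy` is enough; structs whose members are host pointers need more care.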

Kr0n1 · April 30, 2012, 11:58pm

Hi gilles_c,

Sorry about the translation! This time I’ll post the original code, since only the variable names are not in English.

After some cleanup, my remaining problem is accessing the data in my array of structs inside the kernel.

The condition in bold (the if condition in the kernel below) does not return what I expect: it never evaluates to true, even though the file is loaded on the host. Even with a simpler condition such as “vetorDePontos[x].xmin == 2”, the result is not what I expect. Am I accessing the data correctly? Please take a look! Thanks in advance.

struct Ponto{
double x, y, z, raio;
Ponto* vizinhos[NUM_VIZINHOS];
Ponto* possiveisVizinhos[NUM_POSSIVEISVIZINHOS];
int quantidadeVizinhos, qtdPossiveisVizinhos;
int xmin, xmax, ymin, ymax, zmin, zmax;
int quadranteX, quadranteY, quadranteZ;
};

__global__ void localizaPossiveisVizinhosQuadrante_d(Ponto* vetorDePontos, int nbPontos, int* vetorAux){
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;

    while(x < nbPontos){
        for(int j=0; j<nbPontos; ++j){
            if(x != j){
                // The "condition in bold": is point j's quadrant inside point x's [min, max] range on every axis?
                if( ( vetorDePontos[j].quadranteX <= vetorDePontos[x].xmax) && ( vetorDePontos[j].quadranteX >= vetorDePontos[x].xmin ) &&
                    ( vetorDePontos[j].quadranteY <= vetorDePontos[x].ymax) && ( vetorDePontos[j].quadranteY >= vetorDePontos[x].ymin ) &&
                    ( vetorDePontos[j].quadranteZ <= vetorDePontos[x].zmax) && ( vetorDePontos[j].quadranteZ >= vetorDePontos[x].zmin ) )
                {
                    vetorAux[j + x * nbPontos] = 1;   // j is a possible neighbour of x
                }
                else
                    vetorAux[j + x * nbPontos] = 2;   // j is outside x's range
            }
            else
                vetorAux[j + x * nbPontos] = 3;       // j == x (same point)
        }
        //vetorAux[x] = 1;
        x += blockDim.x * gridDim.x;
    }
}

int main(){
    // ptr, vetorDePontos (host array), nbPontos, blocks and threads are set up elsewhere (not shown in this post)
    adicionaPontos(ptr, "10coordenadas20ComRaioPadrao.txt");

    const int size = nbPontos * sizeof(Ponto);

    Ponto* vetorDePontos_d;
    int *vetorAux_d;
    int* vetorAux = new int[nbPontos*nbPontos];

    cudaMalloc((void**)&vetorDePontos_d, nbPontos * sizeof(Ponto));
    cudaMalloc((void**)&vetorAux_d, nbPontos*nbPontos*sizeof(int));
    cudaMemcpy(vetorDePontos_d, vetorDePontos, size, cudaMemcpyHostToDevice);
    localizaPossiveisVizinhosQuadrante_d<<<blocks, threads>>>(vetorDePontos_d, nbPontos, vetorAux_d);

    cudaMemcpy(vetorAux, vetorAux_d, nbPontos*nbPontos*sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(vetorDePontos_d);
    cudaFree(vetorAux_d);
}
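To find out whether a wrong result comes from the kernel or from the host code that compares the results, one option is to compute the same 1/2/3 codes on the CPU and compare them with the GPU output. A minimal sketch, assuming a cut-down struct that keeps only the fields the bold condition reads; `Caixa`, `classifica`, `classificaKernel` and `comparaComCPU` are hypothetical names. Marking the test `__host__ __device__` lets the CPU reference and the kernel share a single definition of the condition, so the two cannot silently drift apart.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical cut-down version of Ponto: only the fields the bold condition reads.
struct Caixa {
    int xmin, xmax, ymin, ymax, zmin, zmax;
    int quadranteX, quadranteY, quadranteZ;
};

// Same codes as the kernel above: 3 if j == x, 1 if j's quadrant lies inside
// x's [min, max] range on all three axes, 2 otherwise.
// __host__ __device__ compiles it for both the CPU reference and the GPU.
__host__ __device__ inline int classifica(const Caixa* p, int x, int j)
{
    if (x == j) return 3;
    bool dentro = p[j].quadranteX >= p[x].xmin && p[j].quadranteX <= p[x].xmax &&
                  p[j].quadranteY >= p[x].ymin && p[j].quadranteY <= p[x].ymax &&
                  p[j].quadranteZ >= p[x].zmin && p[j].quadranteZ <= p[x].zmax;
    return dentro ? 1 : 2;
}

__global__ void classificaKernel(const Caixa* p, int n, int* aux)
{
    for (int x = threadIdx.x + blockIdx.x * blockDim.x; x < n; x += blockDim.x * gridDim.x)
        for (int j = 0; j < n; ++j)
            aux[j + x * n] = classifica(p, x, j);
}

// Runs the kernel and the CPU reference on the same input.
// Returns the number of entries that differ, or -1 if a CUDA call fails.
int comparaComCPU(const std::vector<Caixa>& h)
{
    int n = (int)h.size();
    size_t nAux = (size_t)n * n;
    std::vector<int> gpu(nAux);
    Caixa* d_p = nullptr;
    int* d_aux = nullptr;

    cudaError_t err = cudaMalloc(&d_p, n * sizeof(Caixa));
    if (err == cudaSuccess) err = cudaMalloc(&d_aux, nAux * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_p, h.data(), n * sizeof(Caixa), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        classificaKernel<<<(n + 127) / 128, 128>>>(d_p, n, d_aux);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(gpu.data(), d_aux, nAux * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_p);
    cudaFree(d_aux);
    if (err != cudaSuccess) return -1;

    int diferencas = 0;
    for (int x = 0; x < n; ++x)
        for (int j = 0; j < n; ++j)
            if (gpu[j + x * n] != classifica(h.data(), x, j))
                ++diferencas;
    return diferencas;
}
```

`comparaComCPU` returns 0 when the GPU output matches the CPU reference, so any remaining discrepancy points at the host-side comparison code rather than the kernel.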

Gilles_C · May 1, 2012, 5:53am

Hi,

From what I can see in your snippet, the code looks OK. However, there are a few potential issues that I cannot assess because the corresponding code is missing:

- No error checking: your allocations might simply fail, or (the most likely explanation) your kernel might fail, either because of a pre-launch error or during its execution. Please add the necessary error checking.

- No allocation and initialisation of the main array: are you sure you properly allocated and initialised your array “vetorDePontos” on the host side? See especially the next point.

- Dodgy member pointers: the member pointers “vizinhos” and “possiveisVizinhos” most likely hold host addresses, pointing to memory on the host. If you transfer your array of structs as-is and then dereference those pointers on the device, you may get into trouble (or not, depending on the compute capability of your device, which you don’t mention).

- No information about thread and block sizes: here again, the sizes of your blocks and grid are important pieces of information that are missing, and without them I cannot assess the validity of your code. Maybe you request too many blocks or too many threads. See also the next remark.

- Possible race conditions depending on the use of the y dimension: your kernel defines an index “y” corresponding to a second dimension of your thread blocks. But since you only use “x” to index the work, all threads that share the same “x” index (differing only in “y”) will compete to access the same memory locations. In the snippet you gave, that should only translate into poor performance, since they all write the same values, but if the real code is more complex, it might lead to undefined results.
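For the dodgy-pointer and “y” dimension points above, one common workaround (a swapped-in technique, not something this thread prescribes) is to store neighbours as integer indices into the same array instead of `Ponto*`, and to launch a 1D grid so each point is handled by exactly one thread. A minimal sketch; `PontoIdx`, `MAX_VIZINHOS`, `somaRaiosVizinhos` and `demoVizinhosPorIndice` are hypothetical names.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define MAX_VIZINHOS 8  // hypothetical capacity, stands in for NUM_VIZINHOS

// Neighbours are stored as indices into the same array instead of Ponto*.
// An index means the same thing on host and device, so a plain cudaMemcpy of
// the whole array is enough and no host address is dereferenced on the GPU.
struct PontoIdx {
    double x, y, z, raio;
    int vizinhos[MAX_VIZINHOS];
    int quantidadeVizinhos;
};

// For each point i, sums the radii of its neighbours by following the indices.
// 1D indexing plus a grid-stride loop: each i is handled by exactly one thread.
__global__ void somaRaiosVizinhos(const PontoIdx* pts, int n, double* out)
{
    for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < n; i += blockDim.x * gridDim.x) {
        double soma = 0.0;
        for (int k = 0; k < pts[i].quantidadeVizinhos; ++k)
            soma += pts[pts[i].vizinhos[k]].raio;
        out[i] = soma;
    }
}

// Builds a ring of n >= 2 points (radius of point i is i, neighbours are i-1
// and i+1 modulo n), runs the kernel and returns the result for point 0,
// or -1.0 if a CUDA call fails.
double demoVizinhosPorIndice(int n)
{
    std::vector<PontoIdx> h(n);
    for (int i = 0; i < n; ++i) {
        h[i].x = h[i].y = h[i].z = 0.0;
        h[i].raio = i;
        h[i].vizinhos[0] = (i + n - 1) % n;
        h[i].vizinhos[1] = (i + 1) % n;
        h[i].quantidadeVizinhos = 2;
    }

    PontoIdx* d_pts = nullptr;
    double* d_out = nullptr;
    std::vector<double> out(n);
    cudaError_t err = cudaMalloc(&d_pts, n * sizeof(PontoIdx));
    if (err == cudaSuccess) err = cudaMalloc(&d_out, n * sizeof(double));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_pts, h.data(), n * sizeof(PontoIdx), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;                          // 1D blocks: no unused y dimension
        int blocks = (n + threads - 1) / threads;
        somaRaiosVizinhos<<<blocks, threads>>>(d_pts, n, d_out);
        err = cudaGetLastError();                   // catches launch-configuration errors
    }
    if (err == cudaSuccess)   // blocking copy: waits for the kernel to finish
        err = cudaMemcpy(out.data(), d_out, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_pts);          // cudaFree(nullptr) is a no-op
    cudaFree(d_out);
    return err == cudaSuccess ? out[0] : -1.0;
}
```

With a CUDA device present, `demoVizinhosPorIndice(10)` should return 10.0, since point 0’s neighbours are points 9 and 1, with radii 9 and 1.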

These are just the possible issues and explanations that come to mind for the unwanted behaviour you see. Bottom line: check the errors returned by CUDA calls first. In addition, a run through the memory checker (cuda-memcheck) and the debugger (cuda-gdb) should give you useful information.
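A minimal sketch of such error checking with the CUDA runtime API; the helper `check`, the kernel `fill` and the function `launchChecked` are hypothetical names. Pre-launch errors (such as an invalid configuration) are picked up with `cudaGetLastError()` right after the launch, and errors raised while the kernel runs with `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints which call failed and why, returns false on error.
static bool check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Trivial kernel used only to demonstrate the checks.
__global__ void fill(int* a, int n, int value)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        a[i] = value;
}

// Launches fill with the given configuration, checking every step.
// Returns true only if all calls succeed and every element comes back as 7.
bool launchChecked(int n, int blocks, int threads)
{
    int* d = nullptr;
    if (!check(cudaMalloc(&d, n * sizeof(int)), "cudaMalloc"))
        return false;

    fill<<<blocks, threads>>>(d, n, 7);
    bool ok = check(cudaGetLastError(), "kernel launch")            // e.g. invalid configuration
           && check(cudaDeviceSynchronize(), "kernel execution");   // e.g. out-of-bounds access

    int* h = new int[n]();   // zero-initialised host buffer
    if (ok)
        ok = check(cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost), "cudaMemcpy");
    for (int i = 0; ok && i < n; ++i)
        ok = (h[i] == 7);
    delete[] h;
    cudaFree(d);
    return ok;
}
```

`launchChecked(1000, 4, 256)` should succeed, while `launchChecked(1000, 1, 2048)` should report an invalid launch configuration, since 2048 threads per block exceeds the per-block limit of current GPUs.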

HTH

Kr0n1 · May 1, 2012, 3:50pm

Hi Gilles,

Thanks to your tips, I found two mistakes:

Through error checking, I found that “const int size = nbPontos * sizeof(Ponto)” was not a valid argument for cudaMemcpy. I removed the variable and put the expression directly in the call, which fixed the error.
Since you assured me that the device code was right, I looked for mistakes in the old code that compares the results, and found a mistake in how a parameter was passed.

That’s all for now. Thanks for your attention; it was a great help to this novice.
