struct of arrays for parameters between host and device

search1 · August 25, 2009, 5:54pm

I have a problem using a struct of arrays in CUDA.

Here is an example:

typedef struct A {
int b[10];
} A;

__global__ void changeD(A* a){
int thx = threadIdx.x;
(a->b)[thx] = 5;
}

int main(){
A* a = (A*)malloc(sizeof(A));

A* aD;
cudaMalloc((void**)&aD, sizeof(A));
changeD<<<1, 10>>>(aD);
cudaMemcpy(a, aD, sizeof(A), cudaMemcpyDeviceToHost);

}

The problem is that the elements of a->b are not changed to 5. How should this kind of struct parameter be handled? Is it that CUDA does not support more than one level of indirection? Here the structure contains an array, which can be thought of as a pointer. I checked some examples in the CUDA SDK, and they seem to use only one level of pointer indirection.

Another interesting thing is that the structure can be read: with c = (a->b)[thx], c gets the correct value. But writing to (a->b)[thx] does not work. In other words, you can read the old value of (a->b)[thx] but you cannot change it, even though the code compiles. Any ideas are appreciated.

mcleary · August 26, 2009, 4:58pm

I think there’s no problem with using a struct like this as a kernel parameter. The value of (a->b)[thx] is changed, but because the struct lives on the device, the change happens in device memory. To see the changed value you need to copy the struct back to the host.
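On the indirection question: a fixed-size array declared inside a struct is stored inline in the struct, not behind a hidden pointer, so passing A* to the kernel involves only one level of indirection. Here is a minimal host-only sketch of that layout. It is plain C++ that should also compile inside a .cu file. The helper name flat_copy is hypothetical, and the size check assumes a mainstream compiler that adds no padding to A.

```cpp
#include <cstddef>   // offsetof
#include <cstring>   // std::memcpy

// Same layout as the struct in the original post.
struct A {
    int b[10];
};

// The array is the struct's own storage: no extra pointer, no second allocation.
// (Assumption: the compiler adds no padding to A, as mainstream compilers do.)
static_assert(sizeof(A) == 10 * sizeof(int), "b is stored inline in A");
static_assert(offsetof(A, b) == 0, "b starts at the first byte of A");

// Hypothetical host-side stand-in for a whole-struct copy:
// one flat copy of sizeof(A) bytes carries every element of b.
inline A flat_copy(const A& src) {
    A dst;
    std::memcpy(&dst, &src, sizeof(A));
    return dst;
}
```

A struct that held an int* member instead of int b[10] would be a different case: the pointed-to buffer would need its own device allocation and copy.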

search1 · August 26, 2009, 7:35pm

I did copy it back; that is the last statement: cudaMemcpy(a, aD, sizeof(A), cudaMemcpyDeviceToHost);

I didn’t initialize aD, but since the result is not 5, it still shows that the write fails.

mcleary · August 26, 2009, 10:41pm

Hi search,

I took your code and ran it on my machine. I only added cutilSafeCall to catch any errors. The code works here. Try this on your machine.

#include <stdlib.h>
#include <iostream>
#include <cutil_inline.h>

#define N 10

using namespace std;

typedef struct A {
    int b[N];
} A;

__global__ void changeD(A* a) {
    (a->b)[threadIdx.x] = 50;
}

int main() {
    A* a = (A*)malloc(sizeof(A));
    A* aD;

    cutilSafeCall( cudaMalloc( (void**) &aD, sizeof(A) ) );

    dim3 _grid( 1, 1, 1 );
    dim3 _threads( N, 1, 1 );

    changeD<<< _grid, _threads >>>(aD);
    cutilCheckMsg( "Kernel execution failed" );

    cutilSafeCall( cudaMemcpy(a, aD, sizeof(A), cudaMemcpyDeviceToHost) );

    for( int i = 0; i < N; i++ ) {
        cout << "a[" << i << "] = " << a->b[i] << endl;
    }

    cutilSafeCall( cudaFree( aD ) );
    free(a);
}

The output of this code was:

a[0] = 50
a[1] = 50
a[2] = 50
a[3] = 50
a[4] = 50
a[5] = 50
a[6] = 50
a[7] = 50
a[8] = 50
a[9] = 50

search1 · August 28, 2009, 2:37pm

Thank you, tluisrs. I have tried your code, and the values do change! However, when I test this in my own application, the value changes only if I leave a later function call (one that follows the assignment) uncommented. If I comment out that function call, the value stays the same. Is this related to memory write latency or to some optimization done by CUDA? I will investigate this weird behavior further.

search1 · September 21, 2009, 5:23pm

Yes, I think there is no problem using a struct of arrays as a parameter between host and device.

The mistake I made was in the cudaMemcpy call that copies the data back from the device.

Suppose aD and a are both of type A*. Instead of using

cudaMemcpy(a, aD, sizeof(A), cudaMemcpyDeviceToHost);

I used

cudaMemcpy(a, aD, sizeof(a), cudaMemcpyDeviceToHost);

This uses the wrong size, because sizeof(a) is the size of a pointer on the machine, while sizeof(A) is the size of the structure!
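The difference can be checked on the host without a GPU. The following is a minimal sketch in plain C++ that should also compile inside a .cu file. The two helper functions are hypothetical names used only for illustration. With a typical 64-bit host and 4-byte int, the pointer is 8 bytes, so the mistaken call would copy only b[0] and b[1].

```cpp
#include <cstddef>   // std::size_t

struct A {
    int b[10];
};

// Hypothetical helper: the byte count the mistaken call passes.
// sizeof(a) measures the pointer variable, not the struct it points to.
constexpr std::size_t wrong_copy_size(const A* a) { return sizeof(a); }

// Hypothetical helper: the byte count that should be passed.
// sizeof(*a) measures the pointed-to object and equals sizeof(A).
constexpr std::size_t right_copy_size(const A* a) { return sizeof(*a); }

static_assert(right_copy_size(nullptr) == sizeof(A), "sizeof(*a) is the struct size");
static_assert(wrong_copy_size(nullptr) == sizeof(A*), "sizeof(a) is the pointer size");
// Holds when pointers are smaller than 10 ints, as on common 32/64-bit hosts.
static_assert(sizeof(A*) < sizeof(A), "the pointer is smaller than the struct here");
```

Writing the size as sizeof(*a) in the cudaMemcpy call is one way to avoid mixing up the variable a and the type A, since it follows the pointer's type automatically.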

Hope nobody else makes this mistake.

mcleary · September 21, 2009, 6:13pm

Glad you found the problem.
