Global arrays?

I would do it like this:

[codebox]#include <stdio.h>
#include "cutil_inline.h"

#define ELEMS(a) (sizeof(a)/sizeof((a)[0]))

int a[] = {1, 2, 3};

__constant__ int a_c[ELEMS(a)];

const char *cudaErrorString[] =
{
    "No errors",
    "Missing configuration error",
    "Memory allocation error",
    "Initialization error",
    "Launch failure",
    "Prior launch failure",
    "Launch timeout error",
    "Launch out of resources error",
    "Invalid device function",
    "Invalid configuration",
    "Invalid device",
    "Invalid value",
    "Invalid pitch value",
    "Invalid symbol",
    "Map buffer object failed",
    "Unmap buffer object failed",
    "Invalid host pointer",
    "Invalid device pointer",
    "Invalid texture",
    "Invalid texture binding",
    "Invalid channel descriptor",
    "Invalid memcpy direction"
};

__global__ void testkernel( int *b, int n )
{
    for( int i = 0; i < n; i++ ) b[i] = a_c[i];
}

int main()
{
    int b[ELEMS(a)];
    int *b_d;
    cudaError_t cerr;

    cudaMalloc( (void**)&b_d, sizeof(b) );

    cerr = cudaMemcpyToSymbol( "a_c", a, sizeof(a_c), 0, cudaMemcpyHostToDevice );
    if( cerr != cudaSuccess ) puts( cudaErrorString[cerr] );

    testkernel<<< 1, 1 >>>( b_d, ELEMS(a) );
    cutilCheckMsg( "Kernel execution failed" );

    cudaMemcpy( b, b_d, sizeof(b), cudaMemcpyDeviceToHost );

    int success = 1;
    for( int i = 0; i < (int)ELEMS(a); i++ )
    {
        printf( "%d %d %d\n", i, a[i], b[i] );
        if( a[i] != b[i] ) success = 0;
    }
    puts( success ? "Passed" : "Failed" );

    cudaFree( b_d );
    return 0;
}[/codebox]

I’ve used constant memory here. If your arrays do not fit in constant memory, you can put them in global memory using the usual cudaMalloc() and cudaMemcpy() calls, as in (all) the SDK examples. I think constant memory is faster.
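For comparison, here is a minimal sketch of the same test done with ordinary global memory instead of the constant symbol (untested here; the name a_d is one I made up, and error checking is left out for brevity):

```cuda
#include <stdio.h>

int a[] = {1, 2, 3};

// The kernel reads from a global-memory array passed in as a pointer,
// instead of from a __constant__ symbol.
__global__ void testkernel( const int *a_g, int *b, int n )
{
    for( int i = 0; i < n; i++ ) b[i] = a_g[i];
}

int main()
{
    const int n = sizeof(a)/sizeof(a[0]);
    int b[n];
    int *a_d, *b_d;

    // Allocate device (global) memory and copy the input up.
    cudaMalloc( (void**)&a_d, sizeof(a) );
    cudaMalloc( (void**)&b_d, sizeof(b) );
    cudaMemcpy( a_d, a, sizeof(a), cudaMemcpyHostToDevice );

    testkernel<<< 1, 1 >>>( a_d, b_d, n );

    // Copy the result back and print it.
    cudaMemcpy( b, b_d, sizeof(b), cudaMemcpyDeviceToHost );
    for( int i = 0; i < n; i++ ) printf( "%d %d %d\n", i, a[i], b[i] );

    cudaFree( a_d );
    cudaFree( b_d );
    return 0;
}
```

The main difference is that cudaMemcpy() takes an ordinary device pointer and a direction flag, while cudaMemcpyToSymbol() addresses a named symbol.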

The array of error strings is a bit unnecessary; I should have used cudaGetErrorString() as you did.

Thx… but in your example the size of the array must be known at compile time, I think…

Yes. You can always use (ordinary) global memory on the device, which does have dynamic allocation. Constant memory does not appear to support dynamic allocation.

I have checked how constant memory works. Its lifetime is that of the application. When you run two apps that initialize constant memory to different values at the same time, the outcome is still correct, so this memory must be saved and restored on (some) task switches.

If you want to use the faster constant memory, you have to do the memory management yourself. Declaring variables and arrays of fixed sizes is one way to do that. You could also declare

[codebox]__constant__ int constmem[65536/4];[/codebox]

to be able to address all 64 KB of constant memory, using typecasts if you put a different kind of array somewhere in constant memory, implementing a simple kind of dynamic memory allocation. It is a bit far-fetched, but as long as you can avoid fragmentation, it would be effective.

For such reasons, using constant memory should probably be a late-stage optimization; whether it gives a significant speedup over keeping these variables in (ordinary) global memory, which is simpler to program, depends on the kernel.
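As a rough sketch of what that kernel dependence looks like (hypothetical kernels and names), these two do the same work; the first can benefit from the constant cache because every thread reads the same coef_c[k] in the same instruction, which the hardware can broadcast, while the second fetches the coefficients through a global-memory pointer:

```cuda
// Hypothetical example: evaluate a degree-15 polynomial per element.
__constant__ float coef_c[16];

__global__ void poly_const( const float *x, float *y, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if( i >= n ) return;
    float v = 0.0f;
    // All threads read the same coef_c[k] each iteration,
    // so the constant cache can broadcast it.
    for( int k = 0; k < 16; k++ ) v = v * x[i] + coef_c[k];
    y[i] = v;
}

// Same computation with the coefficients in ordinary global memory.
__global__ void poly_global( const float *coef_g, const float *x,
                             float *y, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if( i >= n ) return;
    float v = 0.0f;
    for( int k = 0; k < 16; k++ ) v = v * x[i] + coef_g[k];
    y[i] = v;
}
```

If threads read many different constant addresses at once, the constant cache serializes the accesses and the advantage disappears, so only profiling the actual kernel settles it.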

Your original question was about ordinary global memory. You can use cudaMemcpyToSymbol(), but it is more common to use cudaMemcpy() with pointers, a size to copy, and a flag for the direction of the copy. Just check any of the SDK examples, since moving data to and from the GPU is basic to everything else…
