Global arrays?

I would do it like this:

[codebox]#include <stdio.h>
#include "cutil_inline.h"

#define ELEMS(a) (sizeof(a)/sizeof((a)[0]))

int a[] = {1, 2, 3};

__constant__ int a_c[ELEMS(a)];

const char *cudaErrorString[] =
{
    "No errors",
    "Missing configuration error",
    "Memory allocation error",
    "Initialization error",
    "Launch failure",
    "Prior launch failure",
    "Launch timeout error",
    "Launch out of resources error",
    "Invalid device function",
    "Invalid configuration",
    "Invalid device",
    "Invalid value",
    "Invalid pitch value",
    "Invalid symbol",
    "Map buffer object failed",
    "Unmap buffer object failed",
    "Invalid host pointer",
    "Invalid device pointer",
    "Invalid texture",
    "Invalid texture binding",
    "Invalid channel descriptor",
    "Invalid memcpy direction"
};

__global__ void testkernel( int *b, int n )
{
    for( int i = 0; i < n; i++ ) b[i] = a_c[i];
}

int main()
{
    int b[ELEMS(a)];
    int *b_d;
    cudaError_t cerr;

    cudaMalloc( (void**)&b_d, sizeof(b) );

    cerr = cudaMemcpyToSymbol( "a_c", a, sizeof(a_c), 0, cudaMemcpyHostToDevice );
    if( cerr != cudaSuccess ) puts( cudaErrorString[cerr] );

    testkernel<<< 1, 1 >>>( b_d, ELEMS(a) );
    cutilCheckMsg( "Kernel execution failed" );

    cudaMemcpy( b, b_d, sizeof(b), cudaMemcpyDeviceToHost );

    int success = 1;
    for( int i = 0; i < (int)ELEMS(a); i++ )
    {
        printf( "%d %d %d\n", i, a[i], b[i] );
        if( a[i] != b[i] ) success = 0;
    }
    puts( success ? "Passed" : "Failed" );

    cudaFree( b_d );
    return 0;
}[/codebox]

I’ve used constant memory here. If your arrays do not fit in constant memory, you can put them in global memory using the usual cudaMalloc() and cudaMemcpy() calls, as in (all) the SDK examples. I think constant memory is faster.
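For comparison, here is a minimal sketch of the same test done with ordinary global memory instead of the constant symbol (untested here; the name a_d is one I made up, and error checking is left out for brevity):

```cuda
#include <stdio.h>

int a[] = {1, 2, 3};

// The kernel reads from a global-memory array passed in as a pointer,
// instead of from a __constant__ symbol.
__global__ void testkernel( const int *a_g, int *b, int n )
{
    for( int i = 0; i < n; i++ ) b[i] = a_g[i];
}

int main()
{
    const int n = sizeof(a)/sizeof(a[0]);
    int b[n];
    int *a_d, *b_d;

    // Allocate device (global) memory and copy the input up.
    cudaMalloc( (void**)&a_d, sizeof(a) );
    cudaMalloc( (void**)&b_d, sizeof(b) );
    cudaMemcpy( a_d, a, sizeof(a), cudaMemcpyHostToDevice );

    testkernel<<< 1, 1 >>>( a_d, b_d, n );

    // Copy the result back and print it.
    cudaMemcpy( b, b_d, sizeof(b), cudaMemcpyDeviceToHost );
    for( int i = 0; i < n; i++ ) printf( "%d %d %d\n", i, a[i], b[i] );

    cudaFree( a_d );
    cudaFree( b_d );
    return 0;
}
```

The main difference is that cudaMemcpy() takes an ordinary device pointer and a direction flag, while cudaMemcpyToSymbol() addresses a named symbol.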

The array of error strings is a bit unnecessary; I should have used cudaGetErrorString() as you did.

Thx… but in your example the size of the array must be known at compile time, I think…

Yes. You can always use (ordinary) global memory on the device, which does have dynamic allocation. Constant memory does not appear to support dynamic allocation.

I have checked how constant memory works. Its lifetime is that of the application. When you run two apps that initialize constant memory to different values at the same time, the outcome is still correct, so this memory must be saved and restored on (some) task switches.

If you want to use the faster constant memory, you have to do the memory management yourself. Declaring variables and arrays of fixed sizes is one way to do that. You could also declare

[codebox]__constant__ int constmem[65536/4];[/codebox]

to be able to address all 64 KB of constant memory, using typecasts if you put a different kind of array somewhere in constant memory, implementing a simple kind of dynamic memory allocation. It is a bit far-fetched, but as long as you can avoid fragmentation, it would be effective.

For such reasons, using constant memory should probably be a late-stage optimization; whether it gives a significant speedup over keeping these variables in (ordinary) global memory, which is simpler to program, depends on the kernel.
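As a rough sketch of what that kernel dependence looks like (hypothetical kernels and names), these two do the same work; the first can benefit from the constant cache because every thread reads the same coef_c[k] in the same instruction, which the hardware can broadcast, while the second fetches the coefficients through a global-memory pointer:

```cuda
// Hypothetical example: evaluate a degree-15 polynomial per element.
__constant__ float coef_c[16];

__global__ void poly_const( const float *x, float *y, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if( i >= n ) return;
    float v = 0.0f;
    // All threads read the same coef_c[k] each iteration,
    // so the constant cache can broadcast it.
    for( int k = 0; k < 16; k++ ) v = v * x[i] + coef_c[k];
    y[i] = v;
}

// Same computation with the coefficients in ordinary global memory.
__global__ void poly_global( const float *coef_g, const float *x,
                             float *y, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if( i >= n ) return;
    float v = 0.0f;
    for( int k = 0; k < 16; k++ ) v = v * x[i] + coef_g[k];
    y[i] = v;
}
```

If threads read many different constant addresses at once, the constant cache serializes the accesses and the advantage disappears, so only profiling the actual kernel settles it.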

Your original question was about ordinary global memory. You can use cudaMemcpyToSymbol(), but it is more common to use cudaMemcpy() with pointers, a size to copy, and a flag for the direction of the copy. Just check any of the SDK examples, since moving data to and from the GPU is basic to everything else…
