I want to pass a small array of integers (at most 10 values) from the host file to my CUDA kernel.
How can I do that without having to create a device pointer, doing a memcpy to copy the contents of the host array to the device array, and then using that in the kernel?
__global__ void my_kernel(int* in, int* out, int* arr, int n)
{
    // …
    for (int i = 0; i < n; i++)
        dev_func(in, out, arr);
    // …
}

__device__ void dev_func(int* in, int* out, int* arr)
{
    int x = threadIdx.x;
    int val = x % 8;
    out[x] = arr[val] * in[x];
}
What would be the best way to go about doing this? (d_in and d_out are device pointers with appropriate space allocated for them in device global memory.)
Even if I copy this small array onto the device global memory (using a memcpy),
then copy the contents of this array from global memory into shared memory, and then pass the shared-memory array to the device function,
would it make more sense to do that instead? Is it possible, and how would I go about it (especially passing the shared-memory array to the device function)?
Any help would be great.
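(For context, one common way to avoid a separate cudaMalloc'd pointer, and presumably what the reply below is reacting to, is constant memory. A minimal sketch, assuming a fixed maximum of 10 elements; the names c_arr, h_arr, and launch are made up for illustration:)

__constant__ int c_arr[10];              // small array in constant memory, no device pointer needed

__global__ void my_kernel(const int* in, int* out)
{
    int x = threadIdx.x;
    out[x] = c_arr[x % 8] * in[x];       // every thread reads c_arr directly
}

void launch(const int* d_in, int* d_out, const int h_arr[10])
{
    cudaMemcpyToSymbol(c_arr, h_arr, 10 * sizeof(int));  // copy host array into the symbol
    my_kernel<<<1, 64>>>(d_in, d_out);
}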
Constant memory has some limitations when accessing it. If all threads access the same element, everything is OK, but if threads access different elements (e.g. param[threadIdx.x]), the accesses are serialized - at least that's what the manuals say.
If you really want to pass an array, pack it into a structure:
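(The example that followed this post appears to have been lost; a minimal sketch of the struct-packing idea, with made-up names, might look like this:)

struct SmallArray {
    int data[10];                          // at most 10 values, packed into a struct
};

__global__ void my_kernel(const int* in, int* out, SmallArray arr)
{
    int x = threadIdx.x;
    out[x] = arr.data[x % 8] * in[x];      // struct arrives as a plain kernel argument
}

void launch(const int* d_in, int* d_out)
{
    SmallArray h_arr;
    for (int i = 0; i < 10; i++)
        h_arr.data[i] = i;                 // fill on the host
    my_kernel<<<1, 64>>>(d_in, d_out, h_arr);  // passed by value, no cudaMalloc/cudaMemcpy
}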
In the above, using global memory tends to hurt the performance of the entire kernel pretty badly, so I was intending to use shared memory. Is there any way to do this? I went through the programming guide but didn't find much.
Just pass it. With a little luck, the compiler will know that you are now using a shared-memory pointer instead of a global pointer inside your device function.
I have used this trick many times, calling the same device function with different pointers. All device functions are inlined - that's how it is resolved.
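(A minimal sketch of what this might look like, reusing the kernel shape from the original post; because __device__ functions are inlined, the same function body can be compiled for either pointer's address space:)

__device__ void dev_func(const int* in, int* out, const int* arr)
{
    int x = threadIdx.x;
    out[x] = arr[x % 8] * in[x];
}

__global__ void my_kernel(const int* in, int* out, const int* g_arr)
{
    __shared__ int s_arr[10];
    if (threadIdx.x < 10)
        s_arr[threadIdx.x] = g_arr[threadIdx.x];  // stage the small array into shared memory
    __syncthreads();

    dev_func(in, out, g_arr);   // same device function, global-memory pointer
    dev_func(in, out, s_arr);   // same device function, shared-memory pointer (overwrites out; shown only to illustrate both calls)
}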
Hmm, interesting, but I am not entirely convinced about why it works. Yes, the device functions are all inlined, but how does that resolve the pointer and make the compiler realize that it is pointing to shared memory? Still, since this is done by each block within the kernel, it logically should be possible.
But as long as it works, it's good. Pdan, are you using CUDA 2.2, and does this work for you?
Because in my case it doesn't seem to work! I use a Quadro FX 5600 with CUDA 2.1.
Further, it works only if a single block is launched. That is somewhat explainable, since shared memory is local to each SM, and there may be some problem with shared memory spanning multiple SMs.
But that is also weird, because in my application I do not need the shared-memory array to span SMs.