Problem passing memory from host to device

elguepardo · April 19, 2012, 2:47am

Hi all,

I am new to CUDA coding and am currently running into some problems passing memory.

My problem is that I need to pass a large number of variables into a kernel, but the total size of the arguments exceeds the upper limit on what can be passed to a kernel in one launch. My code looks like this:

__global__ void kernel(float *a0, float *a1, …, float *a18, float *b0, float *b1, …, float *b18, float *k1, float *k2, float *k3, float *k4, float k5)
{
    // kernel execution…
}

There are 42 float pointers and 1 float that need to be passed to the kernel, but an error occurs when compiling. Is there another way to solve my problem? Can I use a struct for this?

Please help with this issue, and thanks for all of your help.

Gilles_C · April 19, 2012, 7:54am

Hi,

Yes, you certainly can address this issue by packing all or part of your arguments into a structure allocated on the device and passing only a pointer to this structure.

In essence, you would do it like this:

struct kernel_args {
    float *a0, *a1, ..., *a18;
    float *b0, ...;
};
...
kernel_args h_args, *d_args;
cudaMalloc(&d_args, sizeof(kernel_args));
cudaMalloc(&(h_args.a0), ...);
...
// transfer the data you need to device memory
cudaMemcpy(h_args.a0, ..., cudaMemcpyHostToDevice);
...
cudaMemcpy(d_args, &h_args, sizeof(kernel_args), cudaMemcpyHostToDevice);
kernel<<<grid,threads>>>(d_args);

This works, but you might also want to reconsider whether you actually need to transfer that many pointers in the first place.
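
For reference, here is a minimal, complete sketch of the struct-in-device-memory approach described above. It is only an illustration under some assumptions: the struct is cut down to two pointers, the `n` field and the `run_with_struct` helper are hypothetical names added for this example, the kernel body is a placeholder computation, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical cut-down stand-in for the 42-pointer argument list.
struct kernel_args {
    float *a0;   // device pointer, input
    float *b0;   // device pointer, output
    float  k5;   // plain scalar
    int    n;    // element count (added for this sketch)
};

// The kernel takes a single device pointer to the struct.
__global__ void kernel(const kernel_args *args)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < args->n)
        args->b0[i] = args->k5 * args->a0[i];   // placeholder work
}

// Fills a0 with 1.0f, runs the kernel, and returns b0 copied back to the host.
// Assumes n > 0.
std::vector<float> run_with_struct(int n, float k5)
{
    std::vector<float> h_a(n, 1.0f), h_b(n, 0.0f);
    kernel_args h_args, *d_args;
    h_args.k5 = k5;
    h_args.n  = n;

    // The pointers stored in the host-side struct are device pointers.
    cudaMalloc(&h_args.a0, n * sizeof(float));
    cudaMalloc(&h_args.b0, n * sizeof(float));
    cudaMalloc(&d_args, sizeof(kernel_args));

    // Copy the input data, then the filled-in struct itself.
    cudaMemcpy(h_args.a0, h_a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_args, &h_args, sizeof(kernel_args), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    kernel<<<blocks, threads>>>(d_args);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_b.data(), h_args.b0, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(h_args.a0);
    cudaFree(h_args.b0);
    cudaFree(d_args);
    return h_b;
}
```

Calling `run_with_struct(1000, 2.5f)` on a machine with a working CUDA device should give back a vector whose elements are all `2.5f`. Note that the struct must be copied to the device only after its pointer fields have been filled in by `cudaMalloc`.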

DrAnderson42 · April 19, 2012, 1:12pm

Are you targeting Fermi (and newer) devices? Compile with -arch sm_20 and you should be able to pass that many arguments. The limit on the total size of kernel arguments is 4 KB for sm_20 and 256 bytes for sm_1x (see D.2.4.1 in the programming guide).
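
To make the arithmetic concrete, here is a small sketch that adds up the size of the original argument list and compares it with the two limits quoted above. The constant and function names are made up for this example, and it assumes a 64-bit build where a pointer is 8 bytes; in that case the total is 42 × 8 + 4 = 340 bytes, which is over 256 but well under 4 KB. No padding is needed because the single float comes last.

```cuda
#include <cstddef>

// Original argument list: 42 float pointers followed by one float.
constexpr std::size_t kNumPointers = 42;
constexpr std::size_t kArgBytes =
    kNumPointers * sizeof(float *) + sizeof(float);

// Limits as quoted in the post above.
constexpr std::size_t kLimitSm1x = 256;    // bytes
constexpr std::size_t kLimitSm20 = 4096;   // bytes (4 KB)

// True when the argument list fits under the respective limit.
constexpr bool fits_sm1x() { return kArgBytes <= kLimitSm1x; }
constexpr bool fits_sm20() { return kArgBytes <= kLimitSm20; }
```

On a 32-bit build the pointers are 4 bytes each, so the same list would come to 172 bytes and fit even under the sm_1x limit.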

If you need to support older hardware, I would recommend creating a struct of pointers and storing it in constant memory.
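
A rough sketch of the constant-memory variant could look like the following. As before, this is an illustration rather than a definitive implementation: the struct is reduced to two pointers, `const_args`, `c_args`, `kernel_const` and `run_with_constant` are hypothetical names, the kernel body is placeholder work, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical cut-down struct of pointers.
struct const_args {
    float *a0;   // device pointer, input
    float *b0;   // device pointer, output
    float  k5;   // plain scalar
    int    n;    // element count (added for this sketch)
};

// The struct lives in constant memory, so the kernel takes no arguments.
__constant__ const_args c_args;

__global__ void kernel_const()
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < c_args.n)
        c_args.b0[i] = c_args.k5 * c_args.a0[i];   // placeholder work
}

// Fills a0 with 1.0f, runs the kernel, and returns b0 copied back to the host.
// Assumes n > 0.
std::vector<float> run_with_constant(int n, float k5)
{
    std::vector<float> h_a(n, 1.0f), h_b(n, 0.0f);
    const_args h_args;
    h_args.k5 = k5;
    h_args.n  = n;

    cudaMalloc(&h_args.a0, n * sizeof(float));
    cudaMalloc(&h_args.b0, n * sizeof(float));
    cudaMemcpy(h_args.a0, h_a.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Copy the filled-in struct into the __constant__ symbol.
    cudaMemcpyToSymbol(c_args, &h_args, sizeof(const_args));

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    kernel_const<<<blocks, threads>>>();

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_b.data(), h_args.b0, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(h_args.a0);
    cudaFree(h_args.b0);
    return h_b;
}
```

Compared with a struct in global memory, this avoids the extra `cudaMalloc` for the struct itself, and every thread reads the same pointers, which is the access pattern constant memory is meant for.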

elguepardo · April 23, 2012, 3:38am

Thanks for all your help, those methods really work. I will also reconsider whether there is another way to reduce the number of variables I need.