I need to port a function to CUDA. The function reads parameters from a file and computes certain other parameters; there are about 13 parameters in total.
Initially, when I wrote the function as a single kernel, all these parameters were hardcoded. I got good performance and my output was also correct.
Now, when I pass these 13 parameters from the host to the device (kernel function), performance degrades substantially and there is no resultant output.
If I execute in EMUDebug mode, I get the desired output.
What else could be the problem? Please advise, this is my first post.
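For context, here is a minimal sketch of one way such a setup might look; the struct, the field names, and the launch configuration are assumptions for illustration, not the actual code:

#include <cuda_runtime.h>

// Hypothetical parameter struct; the field names are placeholders.
struct SimParams {
    float p[13];
};

// Hypothetical kernel: each thread reads the parameters straight from global memory.
__global__ void computeKernel(const SimParams *gParams, float *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = gParams->p[0] * idx + gParams->p[1];   // placeholder computation
}

int main()
{
    SimParams hParams = {};   // the 13 values would be read from the file here
    SimParams *dParams = 0;
    float *dOut = 0;

    cudaMalloc((void **)&dParams, sizeof(SimParams));
    cudaMalloc((void **)&dOut, 64 * 256 * sizeof(float));
    cudaMemcpy(dParams, &hParams, sizeof(SimParams), cudaMemcpyHostToDevice);

    computeKernel<<<64, 256>>>(dParams, dOut);
    cudaDeviceSynchronize();

    cudaFree(dParams);
    cudaFree(dOut);
    return 0;
}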
In your kernel, are you copying the whole structure from GPU memory into a shared memory cache, or are you using a local array?
It would be dead slow if you used a local array instead of a shared memory array.
It would also be slow if all threads in the block copy the structure from global memory to shared memory, for example if you wrote "sharedMemStructure = *gMem". Then every thread loads it from global memory, which is slow and redundant, and it gets worse the more threads per block and the more blocks you launch (see the sketch below).
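A minimal sketch of the pattern being described, using the same kind of hypothetical parameter struct: one thread loads the struct into shared memory once and the rest of the block waits at a barrier, instead of every thread doing the full copy from global memory.

// Hypothetical parameter struct; the field names are placeholders.
struct SimParams {
    float p[13];
};

__global__ void computeKernel(const SimParams *gParams, float *out)
{
    __shared__ SimParams sParams;

    // Redundant pattern: every thread in the block copies the whole struct.
    //   sParams = *gParams;   // blockDim.x identical loads from global memory

    // One thread loads the struct; the others wait at the barrier.
    if (threadIdx.x == 0)
        sParams = *gParams;
    __syncthreads();

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = sParams.p[0] * idx + sParams.p[1];   // placeholder computation
}

Alternatively, the first 13 threads could each load one element cooperatively; either way the struct is read from global memory once per block instead of once per thread.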