Parameters passed to a CUDA kernel exceed 256 bytes.

Hey,

I am doing a molecular dynamics simulation with CUDA. For this task, I need to pass parameters to the kernel that exceed 256 bytes, the size CUDA allows for the kernel parameter list. Can anyone please suggest what I can do to overcome this problem?

Thanks

pack your arguments in a struct and pass a pointer to the struct

In other words, store your larger parameters in constant, global, or texture memory.

I’ve run into this limitation on almost all of my kernels to date… I tend to store matrices and the like in constant memory now, instead of smem (the formal parameter list).
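For example, a rough sketch of that pattern (the matrix size and the names c_transform/hostMatrix are just placeholders):

__constant__ float c_transform[ 16 ];   // e.g. a 4x4 matrix kept in constant memory

// Host side, before the kernel launch:
float hostMatrix[ 16 ];                 // fill with your data...
cudaMemcpyToSymbol( c_transform, hostMatrix, sizeof( hostMatrix ) );

// Device side: kernels simply read c_transform[ i ] - nothing goes through the parameter list.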

Yes, I was thinking along the same lines, but I could not figure out how to implement it. I have a structure on the host. Now, how can I define a structure on the device and pass it? I mean, should I copy all the elements of the device structure using cudaMemcpy, or copy the structure as a whole?

Could you please elaborate a bit? This would solve my problem.

Thanks.

Help Guys. Please.

The following should be ok:

struct MyStruct
{
    float *pData;
    // ... any number of arrays that you'd like....
};

MyStruct hostStruct;
MyStruct *deviceStruct;

int iSize = 100 * sizeof( float );
float *pInputData = new float[ 100 ];  // and fill it with data...

// Allocate the array on the device and copy the input data into it.
cudaMalloc( ( void ** )&( hostStruct.pData ), iSize );
cudaMemcpy( hostStruct.pData, pInputData, iSize, cudaMemcpyHostToDevice );

// Now copy the host structure into the device structure...
cudaMalloc( ( void ** )&( deviceStruct ), sizeof( MyStruct ) );
cudaMemcpy( deviceStruct, hostStruct, sizeof( MyStruct ), cudaMemcpyHostToDevice );

myKernel<<< ... >>>( deviceStruct, ... );
...

In your kernel you use it like:

deviceStruct->pData
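For example, a minimal kernel sketch (the kernel name, indexing, and the element count of 100 are placeholders matching the allocation above):

__global__ void myKernel( MyStruct *deviceStruct )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < 100 )
        deviceStruct->pData[ i ] *= 2.0f;   // dereference the device pointer stored in the struct
}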

eyal

It would probably be better to have this struct in constant memory (copy it to device using a copy to symbol function).
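A rough sketch of that variant, reusing MyStruct/hostStruct from above (the symbol name c_params is just a placeholder):

__constant__ MyStruct c_params;   // the struct itself lives in constant memory

// hostStruct.pData already holds a device pointer (from cudaMalloc),
// so copying the struct to the symbol is all that is needed - no cudaMalloc for the struct.
cudaMemcpyToSymbol( c_params, &hostStruct, sizeof( MyStruct ) );

// Kernels then read c_params.pData directly and the struct no longer has to be passed as a parameter.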

By the way, would passing a host struct as a parameter work? That is if I had
myKernel<<< … >>>( hostStruct, … );
would the hostStruct be copied to the smem buffer for parameters?

I think I once tried it and the kernel crashed…

eyal

cudaMalloc( ( void ** )&( hostStruct.pData ), iSize);

cudaMemcpy( hostStruct.pData, pInputData, iSize, cudaMemcpyHostToDevice );

Should this not be cudaMalloc( ( void ** )&( deviceStruct->pData ), iSize );

and cudaMemcpy( deviceStruct->pData, pInputData, iSize, cudaMemcpyHostToDevice ); ?

I do not understand why we need two structures. Even for hostStruct we use cudaMalloc to allocate memory for pData, whereas we are not initializing/allocating pData for deviceStruct.

This might be a very naive question but it would help me understand it much better.

Thanks.

I understand it now. What I was thinking is that we could just pass the address of the structure hostStruct, but now I understand that it would be in the CPU address space. Thanks a lot. I really appreciate it.


cudaMemcpy( deviceStruct, hostStruct, sizeof( MyStruct ), cudaMemcpyHostToDevice );

Should this be: cudaMemcpy( deviceStruct, &hostStruct, sizeof( MyStruct ), cudaMemcpyHostToDevice ); or something else?

When I am passing a pointer to the structure, I am getting an error with the kernel launch.
How can I solve it?
Please help.

Try the following (works fine for my kernels):

// Allocate on the CPU RAM.
GGPUGenericSearchParams *pHostGenericSearchParams = new GGPUGenericSearchParams();

// Allocate the structure's pointers on the device !!
CUDA_SAFE_CALL( cudaMalloc( ( void ** )&( pHostGenericSearchParams->m_p1 ), iSize ) );
CUDA_SAFE_CALL( cudaMalloc( ( void ** )&( pHostGenericSearchParams->m_p2 ), iSize ) );

// Allocate the structure on the device memory and copy the host's contents into the device structure.
// Since those are valid device pointers the copy should be valid.
GGPUGenericSearchParams *pDeviceGenericParams;
CUDA_SAFE_CALL( cudaMalloc( ( void ** )&( pDeviceGenericParams ), sizeof( GGPUGenericSearchParams ) ) );
CUDA_SAFE_CALL( cudaMemcpy( pDeviceGenericParams, pHostGenericSearchParams, sizeof( GGPUGenericSearchParams ), cudaMemcpyHostToDevice ) );

// Call the kernel with the pDeviceGenericParams pointer....
myKernel<<< .... >>>( pDeviceGenericParams );

// In the kernel you can use it like this: pDeviceGenericParams->m_p1 ....

Hope it helps

eyal

And do I declare GGPUGenericSearchParams in the same way as MyStruct was declared previously? Do I declare m_p1 and m_p2 there?