I am doing a molecular dynamics simulation with CUDA. I need to pass parameters to the kernel that exceed 256 bytes, the maximum kernel parameter size CUDA allows. Can anyone suggest how I can work around this limit?
In other words: store your larger parameters in constant, global, or texture memory.
I’ve run into this limitation on almost all of my kernels to date… I tend to store matrices and the like in constant memory now, instead of passing them through smem via the formal parameter list.
Yes, I was thinking along the same lines, but I could not figure out how to implement it. I have a structure on the host. How can I define a structure on the device and pass it? That is, should I copy each element of the structure separately with cudaMemcpy, or copy the structure as a whole?
Could you please elaborate a bit? This would solve my problem.
It would probably be better to have this struct in constant memory (copy it to the device with a copy-to-symbol function such as cudaMemcpyToSymbol).
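A minimal sketch of that approach, with a made-up struct and field names just for illustration:

#include <cuda_runtime.h>

// Hypothetical parameter block; it lives in the 64 KB constant memory bank,
// not in the kernel argument buffer, so the 256-byte parameter limit no longer applies to it.
struct SimParams
{
    float dt;
    float cutoff;
    int   numParticles;
};

__constant__ SimParams g_params;   // file-scope device symbol

__global__ void integrate( float4 *pos )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < g_params.numParticles )
        pos[i].w += g_params.dt;   // every thread reads the same constant data
}

void uploadParams( const SimParams &hostParams )
{
    // Copy the host struct into the __constant__ symbol before launching kernels.
    cudaMemcpyToSymbol( g_params, &hostParams, sizeof( SimParams ) );
}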
By the way, would passing a host struct as a parameter work? That is, if I had
myKernel<<< … >>>( hostStruct, … );
would the hostStruct be copied to the smem buffer for parameters?
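Something along these lines is what I mean (a sketch with made-up names; I am assuming any pointer members would already have to be device pointers):

struct LaunchArgs
{
    int    n;
    float  scale;
    float *d_data;   // must come from cudaMalloc if the kernel dereferences it
};

__global__ void scaleKernel( LaunchArgs args )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < args.n )
        args.d_data[i] *= args.scale;
}

// Host side: the struct itself is copied by value into the kernel's parameter
// buffer at launch, so it has to fit within the parameter size limit.
// scaleKernel<<< blocks, threads >>>( hostArgs );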
I do not understand why we need two structures. Even in the hostStruct we use cudaMalloc to allocate memory for pData, whereas we never initialize or allocate pData for the deviceStruct.
This might be a very naive question, but it would help me understand this much better.
I understand it now. What I was thinking was that we could just pass the address of the hostStruct, but now I see that that address would be in the CPU address space. Thanks a lot, I really appreciate it.
// Allocate the parameter struct in CPU RAM.
GGPUGenericSearchParams *pHostGenericSearchParams = new GGPUGenericSearchParams();

// Allocate the buffers the struct points to in device memory.
CUDA_SAFE_CALL( cudaMalloc( (void **)&pHostGenericSearchParams->m_p1, iSize ) );
CUDA_SAFE_CALL( cudaMalloc( (void **)&pHostGenericSearchParams->m_p2, iSize ) );

// Allocate the struct itself in device memory and copy the host copy into it.
// m_p1 and m_p2 are already valid device pointers, so the copied struct is usable on the GPU.
GGPUGenericSearchParams *pDeviceGenericParams;
CUDA_SAFE_CALL( cudaMalloc( (void **)&pDeviceGenericParams, sizeof( GGPUGenericSearchParams ) ) );
CUDA_SAFE_CALL( cudaMemcpy( pDeviceGenericParams, pHostGenericSearchParams, sizeof( GGPUGenericSearchParams ), cudaMemcpyHostToDevice ) );

// Launch the kernel with the device pointer.
myKernel<<< .... >>>( pDeviceGenericParams );

// Inside the kernel, dereference the parameter to reach the buffers, e.g. params->m_p1[...].
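To illustrate that last comment, a minimal kernel-side sketch (I am assuming m_p1 and m_p2 are float pointers and that iSize covers gridDim.x * blockDim.x elements; adapt to the real field types):

__global__ void myKernel( GGPUGenericSearchParams *params )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // params itself lives in global memory; its members were allocated with
    // cudaMalloc on the host, so they are valid device pointers here.
    params->m_p2[i] = 2.0f * params->m_p1[i];
}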