Circumventing Argument Number Restrictions

Hello. In CUDA there is a limit on the number of arguments you can pass to a kernel.

In the past, we have been doing something quite evil to get around this. Since our kernels are dynamically generated, we allocate memory first and then hard-code the buffer addresses into the source. This is ugly, but I believe it is technically not against the CUDA spec.

Now we are trying to port this monster to OpenCL. The OpenCL spec does not give the host access to “device” pointers, so it would seem that this technique violates the specification. However, it is possible to write a kernel that “extracts” a device pointer, converts it to an unsigned integer, and writes that integer somewhere the host can read it later.

My co-programmer suggests that this is a fine approach since, under the hood, OpenCL is the same as CUDA. So, is ripping out pointers a good idea? My opinion is that it is against the OpenCL spec, so it won’t work on all implementations and should not enter the code base.

If you’d be willing to go through the trouble of launching a kernel to figure out device pointers and then go through a compilation cycle to hard-code them into your kernel, wouldn’t it be more efficient to simply copy all your arguments into a struct on the device and launch your kernel with a pointer to that struct? You can then load frequently used arguments from that struct into kernel registers.

It wouldn’t be too different from how x86 handles functions with a gazillion arguments, pushing them on the stack and passing the first argument’s pointer to the actual function in one of its limited registers.

Could you post a code example of how this is done? Both CPU and GPU sides… :rolleyes:
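A minimal sketch of the struct approach suggested above, with the host side in Python (since the thread targets PyOpenCL) and the device side as an OpenCL C string. The names `KernelArgs`, `pack_args`, and the `scale` kernel are hypothetical, chosen only for illustration, and the sketch assumes the scalar arguments are 32-bit ints and floats so that host and device layouts match without padding. It only covers scalar arguments: as far as I know, OpenCL 1.x does not let you portably store buffer pointers inside a struct, so buffers would still be passed as ordinary kernel arguments.

```python
import ctypes


class KernelArgs(ctypes.Structure):
    # Hypothetical argument set. Field order and types must match the
    # typedef in KERNEL_SOURCE exactly.
    _fields_ = [
        ("n", ctypes.c_int32),
        ("stride", ctypes.c_int32),
        ("alpha", ctypes.c_float),
        ("beta", ctypes.c_float),
    ]


# Device side (OpenCL C). The kernel receives one pointer to the struct
# instead of one kernel argument per scalar.
KERNEL_SOURCE = """
typedef struct {
    int n;
    int stride;
    float alpha;
    float beta;
} KernelArgs;

__kernel void scale(__global const KernelArgs *args,
                    __global const float *x,
                    __global float *y)
{
    // Load frequently used arguments into private variables once.
    const int n = args->n;
    const float alpha = args->alpha;
    const float beta = args->beta;

    const int i = get_global_id(0);
    if (i < n)
        y[i] = alpha * x[i * args->stride] + beta;
}
"""


def pack_args(n, stride, alpha, beta):
    """Return the raw bytes of the argument struct, ready to upload."""
    return bytes(KernelArgs(n, stride, alpha, beta))


packed = pack_args(1024, 1, 2.0, 0.5)
```

With PyOpenCL, `packed` would presumably be uploaded with something like `cl.Buffer(ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf=packed)` and that buffer passed as the first kernel argument; that part is untested here and depends on your context setup.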

Try putting the arguments in constant memory?
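One possible reading of this suggestion, given that the kernels are generated dynamically anyway: emit the scalar arguments as program-scope `__constant` variables in the generated OpenCL C source, so they never appear in the kernel's argument list. The sketch below is only an assumption about how that could look; `constant_prelude`, `build_source`, and the `scale` kernel are hypothetical names, and it handles only finite int and float values. Buffers still have to be passed as regular arguments.

```python
def constant_prelude(values):
    """Render program-scope __constant declarations for scalar arguments."""
    lines = []
    for name, value in values.items():
        if isinstance(value, int):
            lines.append(f"__constant int {name} = {value};")
        elif isinstance(value, float):
            # repr() gives a literal that round-trips the Python float;
            # the trailing 'f' makes it a single-precision literal in C.
            lines.append(f"__constant float {name} = {value!r}f;")
        else:
            raise TypeError(
                f"unsupported type for {name}: {type(value).__name__}"
            )
    return "\n".join(lines)


# Kernel body that refers to n, alpha and beta as constants rather than
# as kernel arguments.
KERNEL_BODY = """
__kernel void scale(__global const float *x, __global float *y)
{
    const int i = get_global_id(0);
    if (i < n)
        y[i] = alpha * x[i] + beta;
}
"""


def build_source(values):
    """Prepend the constant declarations to the kernel body."""
    return constant_prelude(values) + "\n" + KERNEL_BODY


source = build_source({"n": 1024, "alpha": 2.0, "beta": 0.5})
```

The trade-off is that every new set of values means a new source string and a recompile, which may be acceptable if a compilation cycle per launch configuration is already part of the workflow.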

Hey, thanks everyone. Structs and constant memory have been considered, but we are using PyOpenCL and haven’t figured out how to access that part of the API through the wrappers yet. I guess I’ll head over to their forum and ask how one might create a struct with buffers in it.