Hi all-
I believe (although I cannot confirm) that someone, somewhere, mentioned that the arguments to a kernel are limited to 256 bytes. Can someone confirm and/or explain this? Does this mean that the total size of the data in my argument list cannot exceed 256 bytes? If so, this seems a little (ok, a lot) restrictive for an API that is focused on massively data-parallel applications.
Oh, another question: does anyone have any strategies for ensuring/confirming that a kernel is launched within device constraints, since there’s no segfault or FPE-trap support? Thanks for any and all comments!
Why not just pass a pointer to a structure that contains all of your data…?
ahhhhhhh…a single pointer that points to only the first element of the entire data array? So even if my data array is > 256 bytes, the passed pointer is nowhere near that limit?
Sure. The limit applies only to the actual arguments passed to the kernel: the combined size of everything in kernel<<<x,y>>>(data, data2…) must be no more than 256 bytes. It’s not really a big deal at all.
Correct. You allocate your arrays on the device with cudaMalloc (up to the free memory on your card), load your data into those memory blocks with cudaMemcpy, and pass just the pointers (4 or 8 bytes depending on whether your OS is 32 or 64 bit) as parameters to your kernel.
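For anyone who wants to see that pattern spelled out, here is a minimal sketch. The kernel `scale` and the helpers `check` and `double_on_device` are hypothetical names made up for illustration; only the `cuda*` calls are real runtime API. It assumes a CUDA-capable device is present.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every element of a device array.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: doubles a host vector in place by round-tripping
// it through device memory.
void double_on_device(std::vector<float> &host)
{
    if (host.empty()) return;  // a launch with zero blocks would be rejected
    const int n = static_cast<int>(host.size());
    const size_t bytes = host.size() * sizeof(float);

    float *dev = nullptr;
    check(cudaMalloc(&dev, bytes), "cudaMalloc");
    check(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice),
          "copy to device");

    // The argument list holds one pointer and one int, not the array itself.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dev, n);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost),
          "copy to host");
    check(cudaFree(dev), "cudaFree");
}

int main()
{
    std::vector<float> host(1 << 20, 1.0f);  // 4 MB of data, far more than 256 bytes
    double_on_device(host);
    printf("host[0] = %f\n", host[0]);
    return 0;
}
```

The array here is 4 MB, but the kernel's argument list is only a pointer plus an int. Calling cudaGetLastError right after the launch is also one way to catch a launch that the runtime rejected, such as an invalid configuration.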
Nice. Thank you so much tmurray and seibert for clarifying this for me. I was afraid I wouldn’t be able to kernel-ize some code of mine because my data is so godawful huge.
So, just to beat a dead horse: if I passed 65 float* pointers to a kernel, that would most definitely crap out, right?
On a 32-bit machine, yes (65 × 4 = 260 bytes). On a 64-bit machine, 33 would crap out (33 × 8 = 264 bytes).
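The arithmetic behind those numbers can be checked at compile time. This is a host-only sketch: `kMaxKernelArgBytes` and `max_pointer_args` are hypothetical names, and the 256-byte figure is simply the limit quoted in this thread, which may differ on other hardware generations.

```cuda
#include <cstddef>
#include <cstdio>

// Assumption: the 256-byte argument limit discussed in this thread.
constexpr std::size_t kMaxKernelArgBytes = 256;

// Hypothetical helper: how many pointers of the given size fit in the limit.
constexpr std::size_t max_pointer_args(std::size_t pointer_bytes)
{
    return kMaxKernelArgBytes / pointer_bytes;
}

// 32-bit build: 64 pointers fit exactly, so the 65th is one too many.
static_assert(max_pointer_args(4) == 64, "4-byte pointers: 64 fit");
// 64-bit build: 32 pointers fit exactly, so the 33rd is one too many.
static_assert(max_pointer_args(8) == 32, "8-byte pointers: 32 fit");

int main()
{
    printf("this build: %zu-byte pointers, at most %zu pointer arguments\n",
           sizeof(float *), max_pointer_args(sizeof(float *)));
    return 0;
}
```

If you ever get close to that many pointers, putting them in a structure in device memory and passing a single pointer to it, as suggested earlier in the thread, keeps the argument list tiny.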