Parameter Count Limitation?


I’m in the process of porting a CUDA-based application from Windows XP to Mac OS X. I’ve managed to fix all compile problems, but I’ve stumbled upon a strange issue:

I have some CUDA kernels that take a lot of parameters. For example, the first one to be called takes 10 × 4 = 40 bytes (one float, one dim3, and six pointers). The kernel works fine on Windows.

However, on Mac OS X, the last parameter is always set to 0x1 in emulation mode, which causes a crash. The kernel also crashes in non-emulation mode, although I obviously can’t inspect the value there. Reordering the parameters doesn’t help, but when I remove one of them, the rest are all fine.

Moving the parameters into a struct doesn’t change anything.

Is this a bug? Does anybody have some workaround for it?

There is indeed a parameter limit, but you should get an error when compiling your kernel if you have reached it.

I don’t get any error or warning…

Why would the limit differ from the Windows implementation?

Can you post your machine’s specs and some code? I just wrote a trivial kernel that uses 52 bytes of params (12 unsigned ints and a pointer), and it runs fine on my MacBook Pro (8600M GT) in both hardware and device emulation.
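For reference, the byte counts quoted in this thread (40 for one float, one dim3 and six pointers; 52 for 12 unsigned ints and a pointer) can be reproduced with a small host-side C++ sketch. It assumes arguments are packed in order, each at its natural alignment, which is an assumption about the layout and not something established in this thread. All names (`Dim3Like`, `Arg`, `paramBytes`, and so on) are hypothetical, and `Dim3Like` only stands in for CUDA's `dim3` so the code builds without the CUDA headers. The 256-byte figure is the argument limit documented for compute capability 1.x devices; treat it as an assumption for other hardware.

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3 (three unsigned ints).
struct Dim3Like { unsigned int x, y, z; };

// One kernel argument: its size and required alignment, in bytes.
struct Arg { std::size_t size; std::size_t align; };

// Assumed limit: 256 bytes of kernel arguments on compute capability 1.x.
constexpr std::size_t kParamLimitBytes = 256;

// Round offset up to the next multiple of align (align must be non-zero).
constexpr std::size_t alignUp(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Bytes used when the arguments are packed in order at natural alignment.
constexpr std::size_t paramBytes(const Arg* args, std::size_t count) {
    std::size_t offset = 0;
    for (std::size_t i = 0; i < count; ++i) {
        offset = alignUp(offset, args[i].align) + args[i].size;
    }
    return offset;
}

// First kernel from the thread: one float, one dim3, six pointers.
constexpr std::size_t firstKernelBytes(std::size_t ptrSize) {
    Arg args[8] = {};
    args[0].size = sizeof(float);    args[0].align = alignof(float);
    args[1].size = sizeof(Dim3Like); args[1].align = alignof(Dim3Like);
    for (std::size_t i = 2; i < 8; ++i) {
        args[i].size = ptrSize;
        args[i].align = ptrSize;
    }
    return paramBytes(args, 8);
}

// Trivial test kernel from the thread: 12 unsigned ints and one pointer.
constexpr std::size_t trivialKernelBytes(std::size_t ptrSize) {
    Arg args[13] = {};
    for (std::size_t i = 0; i < 12; ++i) {
        args[i].size = sizeof(unsigned int);
        args[i].align = alignof(unsigned int);
    }
    args[12].size = ptrSize;
    args[12].align = ptrSize;
    return paramBytes(args, 13);
}

// True when an argument block stays within the assumed limit.
constexpr bool fitsLimit(std::size_t bytes) {
    return bytes <= kParamLimitBytes;
}
```

With 4-byte pointers, `firstKernelBytes(4)` and `trivialKernelBytes(4)` give the 40 and 52 bytes quoted above, both well under the assumed limit. With 8-byte pointers the first list grows to 64 bytes, because the pointers then have to start on an 8-byte boundary.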

You might want to double-check that you’re allocating the right amount of shared memory in your kernel launch. I had a problem similar to yours, and it was caused by allocating too little shared memory.
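One common way to end up with too little shared memory is passing an element count where the launch configuration expects a byte count. The sketch below is host-side C++ with hypothetical helper names (`dynamicSharedBytes`, `sharedRequestIsEnough`); the kernel launch appears only as a comment with placeholder names, since the actual kernel has not been posted.

```cpp
#include <cstddef>

// Bytes of dynamic shared memory needed for `count` elements of type T.
// The third launch-configuration argument is a byte count, not an element count.
template <typename T>
constexpr std::size_t dynamicSharedBytes(std::size_t count) {
    return count * sizeof(T);
}

// True when the requested byte count covers what the kernel will index.
constexpr bool sharedRequestIsEnough(std::size_t requestedBytes,
                                     std::size_t neededBytes) {
    return requestedBytes >= neededBytes;
}

// Hypothetical launch (myKernel, grid, block and the arguments are placeholders):
//   const std::size_t bytes = dynamicSharedBytes<float>(block.x);
//   myKernel<<<grid, block, bytes>>>(/* kernel arguments */);
```

For a block that stores 256 floats, the kernel needs 1024 bytes, so requesting 256 under-allocates by a factor of four.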

Edit: I just added two more tests with a dim3 argument, and everything worked fine.

I’m testing on my first-generation Mac Pro with the recently released aftermarket 8800 GT.

I’ll try to put together a smaller example. Extracting the specific code is complicated, since it’s integrated into a rather large codebase.