I’m in the process of porting a CUDA-based application from Windows XP to Mac OS X. I’ve managed to fix all compile problems, but I’ve stumbled upon a strange issue:
I have some CUDA kernels that take a lot of parameters. For example, the first one called takes 10*4 = 40 bytes of arguments (one float, one dim3, 6 pointers). The kernel works fine on Windows.
However, on Mac OS X, the last parameter is always set to 0x1 in emulation mode, which causes a crash. The kernel also crashes in non-emulation mode, although I obviously can’t inspect the value there. Reordering the parameters doesn’t help: whichever one ends up last is corrupted. When I remove one of them, the remaining ones are all OK.
Moving the parameters into a struct doesn’t change anything.
Is this a bug? Does anybody have a workaround for it?
Can you post your machine’s specs and some code? I just wrote a trivial kernel that takes 52 bytes of parameters (12 unsigned ints and a pointer), and it runs fine on my MacBook Pro (8600M GT), both on the hardware and in device emulation.
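For comparing the two argument lists, here is a rough host-side sketch (plain C++, no CUDA headers) that tallies the bytes of each one, padding every argument to its natural alignment. Dim3, ArgTally, the function names and the 256-byte budget are assumptions made for illustration and are not taken from either kernel in this thread. The totals depend on pointer width: the 40- and 52-byte figures quoted here correspond to 4-byte pointers, and a build with 8-byte pointers gives 64 and 56 instead.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for CUDA's dim3 (three unsigned ints), so this compiles
// without the CUDA headers.
struct Dim3 { unsigned int x, y, z; };

// Hypothetical helper: running byte count of a kernel argument list.
struct ArgTally {
    std::size_t bytes = 0;

    // Pad to the argument's alignment, then add its size; repeat `count` times.
    template <typename T>
    ArgTally& add(std::size_t count = 1) {
        for (std::size_t i = 0; i < count; ++i) {
            const std::size_t a = alignof(T);
            bytes = (bytes + a - 1) / a * a + sizeof(T);
        }
        return *this;
    }
};

// One float, one dim3, six pointers (the argument list from the question).
std::size_t questionKernelArgBytes() {
    ArgTally t;
    t.add<float>().add<Dim3>().add<void*>(6);
    return t.bytes;
}

// Twelve unsigned ints and one pointer (the test kernel from the reply).
std::size_t replyKernelArgBytes() {
    ArgTally t;
    t.add<unsigned int>(12).add<void*>();
    return t.bytes;
}

// Assumed budget: 256 bytes of kernel arguments on compute capability 1.x.
const std::size_t kAssumedMaxArgBytes = 256;

bool fitsArgBudget(std::size_t bytes) {
    return bytes <= kAssumedMaxArgBytes;
}

// Small self-check: both argument lists should be well inside the budget.
void checkThreadKernels() {
    assert(fitsArgBudget(questionKernelArgBytes()));
    assert(fitsArgBudget(replyKernelArgBytes()));
}
```

Under these assumptions both lists are far below the budget, so raw argument size alone is unlikely to explain the corrupted last parameter.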
You might want to double-check that you’re allocating the right amount of shared memory in your kernel launch. I had a problem similar to the one you’re describing, and it was caused by allocating too little shared memory.
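A minimal host-side sketch of that check, assuming a kernel in which every thread needs a fixed number of floats in dynamically allocated shared memory. The helper names, the per-thread layout and the 16 KB per-block limit are assumptions for illustration; the real size depends on how the kernel indexes its shared array.

```cpp
#include <cassert>
#include <cstddef>

// Assumed per-block shared memory limit (16 KB on compute capability 1.x parts).
const std::size_t kAssumedSharedLimit = 16 * 1024;

// Hypothetical helper: bytes of dynamic shared memory needed when each
// thread in the block uses floatsPerThread floats.
std::size_t sharedBytesFor(std::size_t threadsPerBlock, std::size_t floatsPerThread) {
    return threadsPerBlock * floatsPerThread * sizeof(float);
}

// True if the request fits within the assumed per-block limit.
bool fitsSharedLimit(std::size_t bytes) {
    return bytes <= kAssumedSharedLimit;
}

// In a .cu file the computed value goes in the third slot of the launch
// configuration, for example:
//   myKernel<<<grid, block, sharedBytesFor(block.x, 2)>>>(args...);
// where myKernel is a placeholder name, not a kernel from this thread.
```

If the third launch argument is left out, the dynamic shared memory size defaults to 0, so a kernel that indexes an extern __shared__ array will read and write past its allocation.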
Edit: I just added two more tests with a dim3 argument, and everything still works fine.