Ehh, it is not the number of variables that is limited. It is the number of registers your kernel uses that determines the maximum block size you can run.
Do you check the number of registers used by your kernels?
Do you check for errors? Losing the values of some variables sounds like your kernel did not run at all because of a “too many resources requested for launch” error.
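A minimal sketch of what checking for errors can look like, assuming a recent CUDA runtime (older toolkits used cudaThreadSynchronize instead of cudaDeviceSynchronize). The kernel scale and the helper grid_size_for are made up for illustration and are not from your code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, only here so there is something to launch.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: number of blocks needed to cover n elements.
int grid_size_for(int n, int block)
{
    return (n + block - 1) / block;
}

int main()
{
    const int n = 1 << 20;
    const int block = 256;
    float *d_data = nullptr;

    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    scale<<<grid_size_for(n, block), block>>>(d_data, 2.0f, n);

    // Launch failures such as "too many resources requested for launch"
    // are reported here, right after the launch.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    // Errors that occur while the kernel executes are reported
    // once the host waits for the device.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel execution failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    cudaFree(d_data);
    return 0;
}
```

If the launch check prints an error, the kernel never ran, and whatever you read back is just the old contents of the buffer.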
If you are using Visual Studio with the CUDA Build Rules, you can set “PtxAsOptionV” to Yes. This makes the build print in the output window how many resources your kernel is using.
If you use nvcc on the command line, you can add --ptxas-options=-v for the same effect.
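As an alternative to reading the ptxas output (this is a different route from the compiler flag, not a replacement for it), the runtime can report the same kind of numbers through cudaFuncGetAttributes. A rough sketch; the kernel scale and the helper print_kernel_resources are hypothetical names for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel whose resource usage we want to inspect.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: query and print the per-thread resource usage
// of the scale kernel. Returns the CUDA error code of the query.
cudaError_t print_kernel_resources()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scale);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return err;
    }
    printf("registers per thread:    %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    printf("max threads per block:   %d\n", attr.maxThreadsPerBlock);
    return cudaSuccess;
}

int main()
{
    // Compile with, for example: nvcc --ptxas-options=-v file.cu
    // to also see the compiler's own resource report.
    return print_kernel_resources() == cudaSuccess ? 0 : 1;
}
```

The maxThreadsPerBlock value is the largest block size this particular kernel can be launched with on the current device, which is the limit that register usage imposes.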
I have a short question: is it guaranteed that the compiler will put small variables like float[3] arrays into registers, or is there a chance they end up in local memory?
It depends on how you access it. If you always access the elements with constant indices, i.e. float[0], float[1], and float[2], then yes, you are basically guaranteed that the array will be put in registers.
If you access float[i], where i is a variable, you are pretty much guaranteed that it will be put into local memory.
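To make the two cases concrete, here is a small sketch with made-up kernels (static_index, dynamic_index, and the helper launch_both are all hypothetical). Compiling it with --ptxas-options=-v should let you compare the reported local memory use of the two kernels; what the compiler actually does can vary with toolkit version and optimization settings:

```cuda
#include <cuda_runtime.h>

// Only constant indices: the compiler can treat v[0], v[1], v[2]
// as three separate scalars and keep them in registers.
__global__ void static_index(const float *in, float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v[3];
    v[0] = in[3 * tid + 0];
    v[1] = in[3 * tid + 1];
    v[2] = in[3 * tid + 2];
    out[tid] = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
}

// Index only known at run time: registers are not addressable,
// so v will likely be placed in local memory.
__global__ void dynamic_index(const float *in, const int *sel, float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v[3];
    v[0] = in[3 * tid + 0];
    v[1] = in[3 * tid + 1];
    v[2] = in[3 * tid + 2];
    int i = sel[tid];  // assumed to hold 0, 1, or 2
    out[tid] = v[i];
}

// Hypothetical helper: launch both kernels once on zeroed buffers
// and return the first error the runtime reports, if any.
cudaError_t launch_both()
{
    const int threads = 32;
    float *in = nullptr, *out = nullptr;
    int *sel = nullptr;

    cudaMalloc(&in, 3 * threads * sizeof(float));
    cudaMalloc(&sel, threads * sizeof(int));
    cudaMalloc(&out, threads * sizeof(float));
    cudaMemset(in, 0, 3 * threads * sizeof(float));
    cudaMemset(sel, 0, threads * sizeof(int));  // every thread selects v[0]

    static_index<<<1, threads>>>(in, out);
    dynamic_index<<<1, threads>>>(in, sel, out);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(sel);
    cudaFree(out);
    return err;
}

int main()
{
    return launch_both() == cudaSuccess ? 0 : 1;
}
```

One caveat: a loop such as for (int i = 0; i < 3; ++i) with a fixed trip count can be fully unrolled by the compiler, and after unrolling the indices are constants again, so float[i] inside such a loop may still end up in registers.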