cuda-gdb not reaching breakpoints

Hi,
How do you ensure that cuda-gdb will stop at breakpoints set inside a kernel?
I tried the bitreverse example and it works fine.
With my own code, however, cuda-gdb just steps over the kernel without stopping. Why? I know the kernel is being called.