I am writing a CUDA application and using cuda-gdb to debug it. Setting a breakpoint inside a CUDA kernel seems to be very slow. For example, when I set a breakpoint at main.cu:101 and press Enter, cuda-gdb takes about 1 minute to return to the prompt.
For reference, this is the cuda-gdb help text for the s (step) command:
(cuda-gdb) help s
step, s
Step program until it reaches a different source line.
Usage: step [N]
Argument N means step N times (or till program stops for another reason).
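To check whether the delay comes from the original program or from cuda-gdb itself, a minimal reproducer could be timed the same way. This is a hypothetical sketch, not the original main.cu. The file name repro.cu, the kernel name scale, and the build command nvcc -g -G (host and device debug info) are all assumptions.

```cuda
// repro.cu: hypothetical minimal program for timing cuda-gdb breakpoints.
// Assumed build command (device debug info enabled): nvcc -g -G -o repro repro.cu
#include <cstdio>
#include <cuda_runtime.h>

// Multiply each element by factor; one thread per element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;  // a file:line breakpoint on this line stops inside the kernel
    }
}

int main() {
    const int n = 1024;
    static float h_data[n];
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    // Verify that every element was doubled.
    for (int i = 0; i < n; ++i) {
        if (h_data[i] != 2.0f) {
            std::printf("mismatch at %d\n", i);
            return 1;
        }
    }
    std::printf("ok\n");
    return 0;
}
```

Inside cuda-gdb, break scale followed by run should stop when the kernel is entered. If that breakpoint resolves quickly while the one at main.cu:101 still takes about a minute, the slowdown is more likely tied to the original program (for example, its size or its debug information) than to cuda-gdb in general.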