Cannot step into kernel in sample

Hello, I'm trying to run and debug the "vectorAdd" sample, but both Nsight Eclipse Edition and cuda-gdb refuse to step into the kernel.
cedric@IslaNegra:~/NVIDIA_CUDA-5.0_Samples/0_Simple/vectorAdd$ nvcc -g -G -keep -o vectorAdd
--> works fine
cedric@IslaNegra:~/NVIDIA_CUDA-5.0_Samples/0_Simple/vectorAdd$ cuda-gdb ./vectorAdd
NVIDIA ® CUDA Debugger
5.0 release
GNU gdb (GDB) 7.2
Reading symbols from /home/cedric/NVIDIA_CUDA-5.0_Samples/0_Simple/vectorAdd/vectorAdd…done.
(cuda-gdb) break main
Breakpoint 1 at 0x400c4d: file, line 49.
(cuda-gdb) break vectorAdd
Breakpoint 2 at 0x401316: file, line 33.
(cuda-gdb) run
Starting program: /home/cedric/NVIDIA_CUDA-5.0_Samples/0_Simple/vectorAdd/vectorAdd
[Thread debugging using libthread_db enabled]

Breakpoint 1, main () at
49 cudaError_t err = cudaSuccess;
(cuda-gdb) continue
[Vector addition of 50000 elements]
[New Thread 0x7ffff5a85700 (LWP 4031)]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads

Breakpoint 2, vectorAdd (__cuda_0=0x400140000, __cuda_1=0x400170e00,
__cuda_2=0x4001a1c00, __cuda_3=50000) at
33 {
(cuda-gdb) step
__device_stub__Z9vectorAddPKfS0_Pfi (__par0=0x400140000, __par1=0x400170e00,
__par2=0x4001a1c00, __par3=50000) at vectorAdd.cudafe1.stub.c:7
7 void __device_stub__Z9vectorAddPKfS0_Pfi(const float *__par0, const float *__par1, float *__par2, int __par3){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 16UL);__cudaSetupArgSimple(__par3, 24UL);__cudaLaunch(((char *)((void ( *)(const float *, const float *, float *, int))vectorAdd)));}
(cuda-gdb) step
cudaLaunch<char> (
func=0x4012fe "UH\211\345SH\203\354(H\211}\350H\211u\340H\211U?M?M\324H\213U\330H\213]\340H\213E\350H\211\336H\211\307\350\024\377\377\377H\203\304([\311\303UH\211\345SH\203\354\070H\211}\350H\213E\350H\211\005\261- ")
at cuda_runtime.h:1072
1072 return cudaLaunch((const void
(cuda-gdb) step
vectorAdd (__cuda_0=0x400140000, __cuda_1=0x400170e00, __cuda_2=0x4001a1c00,
__cuda_3=50000) at
40 }
(stepped out of kernel)
(cuda-gdb) step
main () at
133 err = cudaGetLastError();
(cuda-gdb) step
135 if (err != cudaSuccess)
(cuda-gdb) print err
$1 = cudaSuccess
(so it executed correctly, but would not step inside)

What am I doing wrong?

Can you run the application outside the debugger? Also note that you cannot debug on a GPU that is being used to draw the OS GUI; to debug from within GNOME, KDE, etc., you would need two GPUs.
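
One way to check which GPUs are likely tied to the desktop is to ask the CUDA runtime whether the kernel run-time limit (the watchdog) is enabled on each device. This is only a rough sketch: the watchdog is normally enabled on a GPU that drives a display, but that is a heuristic, not a guarantee. The helper name `likelyDrivesDisplay` and the file name `check_display.cu` are made up for illustration and are not part of the sample.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: treats an enabled kernel run-time limit (watchdog)
// as a hint that the device is also driving a display. Heuristic only.
static bool likelyDrivesDisplay(int watchdogEnabled) {
    return watchdogEnabled != 0;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        int watchdog = 0;
        // Query the device name and the watchdog attribute for this device.
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess ||
            cudaDeviceGetAttribute(&watchdog, cudaDevAttrKernelExecTimeout, dev) != cudaSuccess) {
            fprintf(stderr, "could not query device %d\n", dev);
            continue;
        }
        printf("device %d: %s, run-time limit %s%s\n",
               dev, prop.name,
               watchdog ? "enabled" : "disabled",
               likelyDrivesDisplay(watchdog) ? " (probably drives a display)" : "");
    }
    return 0;
}
```

Assuming the file is saved as `check_display.cu`, it can be built with `nvcc -o check_display check_display.cu`. If the only GPU in the machine reports the run-time limit as enabled, that matches the situation described above: the debugger would need either a second GPU or a session where the desktop is not running on that device.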