No matter how many warps there are, only one warp hits the breakpoint, right?

I followed the documentation's walk-through example, bitreverse;
After hitting the breakpoint, I switch to warp 3;
Then I type the continue command, and when it stops again, it automatically switches to warp 0;
(screenshot of the cuda-gdb session)

Is this inconsistent with the description in the documentation?

Hi @204540478,
Could you please share the full log of the debugging session?

It looks like the following is happening:

  • One of the application warps hits breakpoint 3 (on the GPU)
    • Your screenshot doesn’t show which thread
  • You switch the focus to warp 3 - this doesn’t affect GPU-side execution, so the remaining warps can still hit the same breakpoint
  • You type continue, so the app resumes execution (with some warps already past the breakpoint and some warps before it)
  • Another warp hits the same breakpoint

Having the full log might provide more info.

Sorry, I’m confused;

  1. Why do I only see the same thread hit the breakpoint every time, while other warps/threads that hit the breakpoint are not displayed?

  2. Even if I switch to warp 3, it doesn’t mean that warp will hit the breakpoint, right? Switching focus is useless; only the warp that hits the breakpoint first is displayed, and the warps/threads that hit the breakpoint later are ignored;

  3. When warp 0 hits the breakpoint, some warps are before the breakpoint and some are after it.
    Is my understanding correct?

Which of items 2 and 3 is correct?

(screenshot)

Hi!
Can you please collect the logs as follows:

  • Start a new cuda-gdb session
  • Execute the following commands first
set trace-commands on
set logging enabled on
  • Run the debugging session
  • When you quit the debugger, you should have a gdb.txt file generated - can you share this file?
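For example, an end-to-end session might look roughly like this (only the two set commands above are required; the break, run, and quit commands are just an illustrative flow for the bitreverse sample):

$ cuda-gdb ./bitreverse
(cuda-gdb) set trace-commands on
(cuda-gdb) set logging enabled on
(cuda-gdb) break bitreverse
(cuda-gdb) run
... debug as usual (continue, info cuda warps, etc.) ...
(cuda-gdb) quit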

gdb.txt (16.1 KB)

Thanks for your reply;

  1. My question is: will each warp stop when it hits a breakpoint, like CPU threads do? Or, after switching to a certain warp, can only that warp hit the breakpoint?
  2. When a warp hits a breakpoint and stops, do all warps hit the breakpoint?

Thank you for the logs, now I can provide a much better explanation. First of all, please note that execution happens at the warp level (so all threads of a warp execute the same instruction/hit the breakpoint).

When running the bitreverse sample, warps don’t diverge (so all warps are executing the same instructions), **which is not always the case**.
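For reference, the kernel from the walk-through is straight-line code with no branches, roughly as sketched below (paraphrased; the host part is simplified with managed memory, and line numbers may differ from the original sample), which is why every warp follows exactly the same path:

__global__ void bitreverse(void *data) {
    unsigned int *idata = (unsigned int *)data;
    extern __shared__ int array[];

    array[threadIdx.x] = idata[threadIdx.x];

    // Breakpoints 1, 2 and 3 in this session sit on the three swap statements below.
    array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) |
                         ((0x0f0f0f0f & array[threadIdx.x]) << 4);
    array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
                         ((0x33333333 & array[threadIdx.x]) << 2);
    array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
                         ((0x55555555 & array[threadIdx.x]) << 1);

    idata[threadIdx.x] = array[threadIdx.x];
}

int main() {
    const int N = 256;
    unsigned int *d = nullptr;
    cudaMallocManaged(&d, N * sizeof(unsigned int));
    for (int i = 0; i < N; ++i) d[i] = i;
    // One block of 256 threads = 8 warps; N * sizeof(int) bytes of dynamic shared memory.
    bitreverse<<<1, N, N * sizeof(int)>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}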

  • When warp 0 hits the first breakpoint, all warps hit it at the same time (note the PC addresses of the active warps). In such a case the debugger only reports a single BP hit (for an arbitrary warp).
Thread 1 "bitreverse" hit Breakpoint 1, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fff3da00000) at bitreverse.cu:14
14              array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) |
+info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)                (0,0,0)
   1        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (32,0,0)
   2        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (64,0,0)
   3        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (96,0,0)
   4        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (128,0,0)
   5        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (160,0,0)
   6        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (192,0,0)
   7        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (224,0,0)
  • Same for breakpoints 2 and 3
Thread 1 "bitreverse" hit Breakpoint 2, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fff3da00000) at bitreverse.cu:16
16              array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
+info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000555555d44ce0 in bitreverse(void*) at bitreverse.cu:14
        breakpoint already hit 1 time
2       breakpoint     keep y   0x0000555555d44fd0 in bitreverse(void*) at bitreverse.cu:16
        breakpoint already hit 1 time
3       breakpoint     keep y   0x0000555555d452c0 in bitreverse(void*) at bitreverse.cu:18
+info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)                (0,0,0)
   1        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (32,0,0)
   2        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (64,0,0)
   3        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (96,0,0)
   4        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (128,0,0)
   5        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (160,0,0)
   6        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (192,0,0)
   7        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (224,0,0)
Thread 1 "bitreverse" hit Breakpoint 3, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fff3da00000) at bitreverse.cu:18
18              array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
+info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)                (0,0,0)
   1        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (32,0,0)
   2        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (64,0,0)
   3        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (96,0,0)
   4        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (128,0,0)
   5        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (160,0,0)
   6        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (192,0,0)
   7        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (224,0,0)
  1. My question is: will each warp stop when it hits a breakpoint, like CPU threads do? Or, after switching to a certain warp, can only that warp hit the breakpoint?

When a BP is hit by a warp (or by multiple warps, if they arrive at it at the same time), all warps on the GPU are stopped (docs link: CUDA-GDB)

When a breakpoint is set, it forces all resident GPU threads to stop at this location when it reaches the corresponding PC.

Focus doesn’t affect GPU-side execution or breakpoints. It’s only used to specify the current warp for debugger operations.
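For instance (an illustrative snippet; the coordinates match the warp table above, where warp 3’s first active thread is (96,0,0)), switching focus only changes what the debugger looks at:

(cuda-gdb) cuda warp 3
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (96,0,0), device 0, sm 0, warp 3, lane 0]
(cuda-gdb) print threadIdx.x
$1 = 96

No warp is resumed or stopped by this; subsequent commands such as print or step simply operate on warp 3 until you switch focus again or continue.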

  2. When a warp hits a breakpoint and stops, do all warps hit the breakpoint?

All warps that are executing the same instructions as the warp that hit the breakpoint. E.g. if for some reason there is divergence and warps 0-3 diverge from warps 4-7, the following might happen (one possible scenario; a kernel sketch that could produce this is shown after the list):

  • Warps 0-3 hit breakpoint 1 - the debugger reports the hit for (e.g.) warp 0
  • Warps 4-7 hit breakpoint 1 - the debugger reports the hit for (e.g.) warp 4
  • Warps 0-3 hit breakpoint 2 - the debugger reports the hit for (e.g.) warp 0
  • Warps 4-7 hit breakpoint 2 - the debugger reports the hit for (e.g.) warp 4
  • Warps 0-3 hit breakpoint 3 - the debugger reports the hit for (e.g.) warp 0
  • Warps 4-7 hit breakpoint 3 - the debugger reports the hit for (e.g.) warp 4
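A minimal, hypothetical kernel (not the bitreverse sample, just a kernel-only sketch) that could make warps 4-7 fall behind warps 0-3 in this way might look like the following; launched as one block of 256 threads, the branch splits exactly at a warp boundary:

__global__ void staggered(unsigned int *data) {
    // Threads 128-255 (warps 4-7) do extra work first, so they reach the
    // common statements below (and any breakpoints set on them) later than
    // threads 0-127 (warps 0-3).
    if (threadIdx.x >= 128) {
        for (int i = 0; i < 1000; ++i)
            data[threadIdx.x] += i;
    }
    data[threadIdx.x] ^= 0xffffffffu;   // breakpoint 1 would be reported twice here
    data[threadIdx.x] >>= 1;            // breakpoint 2: same
    data[threadIdx.x] <<= 1;            // breakpoint 3: same
}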

@AKravets Thank you so much for your long reply;

That is to say, in any scenario where there is no divergence, if GDB reports that a certain warp hit a breakpoint, does it mean that all warps/threads have hit this BP?
Unlike CPU threads, each GPU warp hits the same breakpoint once;

When I type the “where” command, I see that all warps have run to the same line of code;

Yes, if there is no divergence (so all warps are executing the same instruction), all warps would hit the BP at the same time, but the debugger would only report it once.

This can be confirmed by checking the warp status (e.g. with info cuda warps) to verify that all warps have the same PC (program counter).

If you want to observe different behavior, you can also introduce divergence manually by advancing a warp via the step command (note that when you issue the step command, it only applies to the current warp):

  • Start as usual
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

Thread 1 "a.out" hit Breakpoint 1, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:14
14         array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) |
(cuda-gdb) info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx 
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)                (0,0,0) 
   1        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (32,0,0) 
   2        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (64,0,0) 
   3        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (96,0,0) 
   4        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (128,0,0) 
   5        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (160,0,0) 
   6        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (192,0,0) 
   7        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (224,0,0) 
(cuda-gdb) 
  • When the warps hit the first breakpoint, advance warp 0 to the 3rd breakpoint (note that the PC has not changed for the other warps):
(cuda-gdb) step

Thread 1 "a.out" hit Breakpoint 2, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:16
16         array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
(cuda-gdb) step

Thread 1 "a.out" hit Breakpoint 3, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:18
18         array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
(cuda-gdb) info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx 
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)                (0,0,0) 
   1        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (32,0,0) 
   2        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (64,0,0) 
   3        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (96,0,0) 
   4        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (128,0,0) 
   5        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (160,0,0) 
   6        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (192,0,0) 
   7        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (224,0,0)
  • Resume execution - the other warps catch up (to breakpoint 2):
(cuda-gdb) c
Continuing.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (32,0,0), device 0, sm 0, warp 1, lane 0]

Thread 1 "a.out" hit Breakpoint 2, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:16
16         array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
(cuda-gdb) info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx 
Device 0 SM 0
   0        0xffffffff           0x00000000 0x0000000000000cc0      0  (0,0,0)                (0,0,0) 
*  1        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (32,0,0) 
   2        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (64,0,0) 
   3        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (96,0,0) 
   4        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (128,0,0) 
   5        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (160,0,0) 
   6        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (192,0,0) 
   7        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (224,0,0) 
  • One more time (to breakpoint 3). Note that warp 0 is done by now:
(cuda-gdb) c
Continuing.

Thread 1 "a.out" hit Breakpoint 3, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:18
18         array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
(cuda-gdb) info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx 
Device 0 SM 0
*  1        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (32,0,0) 
   2        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (64,0,0) 
   3        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (96,0,0) 
   4        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (128,0,0) 
   5        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (160,0,0) 
   6        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (192,0,0) 
   7        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (224,0,0) 

I totally get it. Thank you very much for your detailed reply; I will read the cuda-gdb manual carefully.

Glad I was able to help. I have marked this topic as resolved.

