No matter how many warps there are, only one warp hits the breakpoint, right?

204540478 · March 28, 2023, 7:00am

I followed the documentation, walk-through examples bitreverse;
After hitting the breakpoint, I switch to warp3;
Then type the continue command, and when it stops again, it automatically switches to warp0;
1679986744823

Is thisinconsistent with the description of the document？

AKravets · March 28, 2023, 11:50am

Hi @204540478,
Could you please share the full log on the debugging session?

AKravets · March 28, 2023, 12:00pm

It looks like the following is happening:

One of the application warps hits the breakpoint 3 (on GPU)
- Your sceenshot doesn’t show which thread
You switch the focus to the warp 3 - this doesn’t affect the GPU-side execution, so multiple remaining warps can hit the same breakpoint
You type continue so the app resumes execution (with some warps already past the breakpoint and some warps before breakpoint)
Other warp hits the same breakpoint

Having the full log might provide more info.

204540478 · March 29, 2023, 4:00am

Sorry, confused;

Why do I only see the same thread hit the breakpoint every time; other warp/threads that hit the breakpoint are not displayed?
Even if I switch to warp 3; it doesn’t mean that the warp will hit the breakpoint, right？ Switching focus is useless, only displayed the warp which hits the breakpoint first and ignore the warp/thread that hits the breakpoint later;
When warp0 hits the breakpoint, some warps are before the breakpoint and some are after the breakpoint，
Is my understanding correct?

item 2 or 3 ,which is is correct？

1680070357246

AKravets · March 29, 2023, 7:32am

Hi!
Can you please collect the logs are follows:

Start new cuda-gdb command
Execute the following commands first

set trace-commands on
set logging enabled on

Run the debugging session
When you quit the debugger, you should have gdb.txt file generated - can you share this file?

204540478 · March 29, 2023, 7:56am

example:

gdb.txt:

gdb.txt (16.1 KB)

Thanks for your reply;

My question is will each warp stop when it hits a breakpoint? Like the thread of the cpu; or switch to a certain warp, only this warp can hit the breakpoint;
When a warp hits a breakpoint and stops, do all warps hit the breakpoint?

AKravets · March 29, 2023, 2:20pm

Thank you for the logs, now I can provide must better explanation. First of all, please note that execution is happening on the warp level (so all threads of the warp execute the same instruction/hit the breakpoint).

When running the bitreverse sample warps don’t diverge (so all warps are executing the same instructions), **which is not always the case **.

When warp 0 hits the first breakpoint, all warps hit it at the same time (note the PC addresses of the active warps). In such case debugger only reports single BP hit (for an arbitrary warp).

Thread 1 "bitreverse" hit Breakpoint 1, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fff3da00000) at bitreverse.cu:14
14              array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) |
+info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)                (0,0,0)
   1        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (32,0,0)
   2        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (64,0,0)
   3        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (96,0,0)
   4        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (128,0,0)
   5        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (160,0,0)
   6        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (192,0,0)
   7        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (224,0,0)

Same for breakpoints 2 and 3

Thread 1 "bitreverse" hit Breakpoint 2, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fff3da00000) at bitreverse.cu:16
16              array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
+info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000555555d44ce0 in bitreverse(void*) at bitreverse.cu:14
        breakpoint already hit 1 time
2       breakpoint     keep y   0x0000555555d44fd0 in bitreverse(void*) at bitreverse.cu:16
        breakpoint already hit 1 time
3       breakpoint     keep y   0x0000555555d452c0 in bitreverse(void*) at bitreverse.cu:18
+info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)                (0,0,0)
   1        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (32,0,0)
   2        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (64,0,0)
   3        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (96,0,0)
   4        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (128,0,0)
   5        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (160,0,0)
   6        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (192,0,0)
   7        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (224,0,0)

Thread 1 "bitreverse" hit Breakpoint 3, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fff3da00000) at bitreverse.cu:18
18              array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
+info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)                (0,0,0)
   1        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (32,0,0)
   2        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (64,0,0)
   3        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (96,0,0)
   4        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (128,0,0)
   5        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (160,0,0)
   6        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (192,0,0)
   7        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (224,0,0)

My question is will each warp stop when it hits a breakpoint? Like the thread of the cpu; or switch to a certain warp, only this warp can hit the breakpoint;

When BP is hit by a warp (or multiple warps if multiple warps arrive to it at the same time) all warps on GPU are stopped (docs link: CUDA-GDB)

When a breakpoint is set, it forces all resident GPU threads to stop at this location when it reaches the corresponding PC.

Focus doesn’t affect GPU side execution/breakpoints. It’s only used to specify current warp for debugger operations.

When a warp hits a breakpoint and stops, do all warps hit the breakpoint?

All warps, which are executing the same instructions as the warp, which hit the breakpoint. E.g. if due to some reason there is a divergence and warps 0-3 diverge from warps 4-7 the following might happen (one possible scenario):

Warps 0-3 hit the breakpoint 1 - debugger reports breakpoint for (e.g.) warp 0
Warps 4-7 hit the breakpoint 1 - debugger reports breakpoint for (e.g.) warp 4
Warps 0-3 hit the breakpoint 2 - debugger reports breakpoint for (e.g.) warp 0
Warps 4-7 hit the breakpoint 2 - debugger reports breakpoint for (e.g.) warp 4
Warps 0-3 hit the breakpoint 3 - debugger reports breakpoint for (e.g.) warp 0
Warps 4-7 hit the breakpoint 3 - debugger reports breakpoint for (e.g.) warp 4

204540478 · March 30, 2023, 2:43am

@AKravets Thank you so much for your long reply;

That is to say, in any scenario, if there is no divergence , if the GDB reports that a certain warp hits a breakpoint,does it mean that all warps/threads have hit this BP;
Unlike CPU threads, each thread (GPU warps) hits the same breakpoint once;

I type the “where” command and see the result is that all warps run to the same line of code;

AKravets · March 30, 2023, 7:33am

Yes, if there is no divergence (so all warps are executing the same instruction), all warps would hit the BP at the same time, but the debugger would only report it once.

It can be confirmed by checking the warp status (e.g. cuda info warps) to check that all warps have the same PC (program counter).

If you want to observe different behavior you can also introduce divergence manually by advancing a warp via step command (note that when you issue step command it only applies to the current warp):

Start as usual

[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

Thread 1 "a.out" hit Breakpoint 1, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:14
14         array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) |
(cuda-gdb) info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx 
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)                (0,0,0) 
   1        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (32,0,0) 
   2        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (64,0,0) 
   3        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (96,0,0) 
   4        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (128,0,0) 
   5        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (160,0,0) 
   6        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (192,0,0) 
   7        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (224,0,0) 
(cuda-gdb)

When warps hit the first breakpoint, advance the warp 0 to the 3rd breakpoint (note that PC has not changed for other warps):

(cuda-gdb) step

Thread 1 "a.out" hit Breakpoint 2, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:16
16         array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
(cuda-gdb) step

Thread 1 "a.out" hit Breakpoint 3, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:18
18         array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
(cuda-gdb) info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx 
Device 0 SM 0
*  0        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)                (0,0,0) 
   1        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (32,0,0) 
   2        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (64,0,0) 
   3        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)               (96,0,0) 
   4        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (128,0,0) 
   5        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (160,0,0) 
   6        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (192,0,0) 
   7        0xffffffff           0x00000000 0x00000000000002e0      0  (0,0,0)              (224,0,0)

Resume execution - other warps would catch up (to breakpoint 2):

(cuda-gdb) c
Continuing.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (32,0,0), device 0, sm 0, warp 1, lane 0]

Thread 1 "a.out" hit Breakpoint 2, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:16
16         array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
(cuda-gdb) info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx 
Device 0 SM 0
   0        0xffffffff           0x00000000 0x0000000000000cc0      0  (0,0,0)                (0,0,0) 
*  1        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (32,0,0) 
   2        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (64,0,0) 
   3        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)               (96,0,0) 
   4        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (128,0,0) 
   5        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (160,0,0) 
   6        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (192,0,0) 
   7        0xffffffff           0x00000000 0x00000000000005d0      0  (0,0,0)              (224,0,0)

One more time (to breakpoint 3). Note that warp 0 is done by now:

(cuda-gdb) c
Continuing.

Thread 1 "a.out" hit Breakpoint 3, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x7fffc1200000) at bitreverse.cu:18
18         array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
(cuda-gdb) info cuda warps
  Wp Active Lanes Mask Divergent Lanes Mask Active Physical PC Kernel BlockIdx First Active ThreadIdx 
Device 0 SM 0
*  1        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (32,0,0) 
   2        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (64,0,0) 
   3        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)               (96,0,0) 
   4        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (128,0,0) 
   5        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (160,0,0) 
   6        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (192,0,0) 
   7        0xffffffff           0x00000000 0x00000000000008c0      0  (0,0,0)              (224,0,0)

204540478 · March 30, 2023, 8:10am

I totally get it. Thank you very much for your detailed reply; I will read the cuda-gdb manual carefully.

AKravets · March 30, 2023, 8:16am

Glad I was able to help. I have marked this topic as resolved.

system · April 13, 2023, 8:17am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Cuda-GDB doesn't hit breakpoints inside kernel/ if the kernel is in a library and then linked to the executable CUDA-GDB vscode , cuda-gdb	9	2842	April 13, 2023
Cuda-gdb doesn't break and/or step into Kernels CUDA Programming and Performance	26	53750	August 1, 2011
warp and core What's the relationship between warp and core? CUDA Programming and Performance	12	15585	February 4, 2011
Controlling context switching in CUDA CUDA Programming and Performance	16	4448	April 29, 2013
Path Divergence CUDA Programming and Performance	20	4913	August 20, 2010
Unexpected behavior of cuda-gdb with Tesla M2050 CUDA Programming and Performance	1	829	October 26, 2010
Program received signal CUDA_EXCEPTION_10, Device Illegal Address. CUDA-GDB	5	3243	March 3, 2017
CUDA warp branching problem Stange bug due to unexpected warp branching order CUDA Programming and Performance	17	12106	July 11, 2010
Another question regarding the bizarre behavior of grid nesting and synchronization in cuda samples CUDA Programming and Performance	0	568	April 8, 2018
global to shared mem loads and sync CUDA Programming and Performance	26	11441	February 21, 2008

No matter how many warps there are, only one warp hits the breakpoint, right?

Related topics