Kernel only works when one block of threads is launched

Hey guys,

I’m currently implementing a warp-level bitonic sort using shuffle instructions. When I test the code with only one block of threads, everything is fine and the result is correct. However, the kernel seems not to run at all when I launch two blocks: nothing gets sorted. There is no interaction between blocks, and no error code is reported.
Has anyone experienced a problem like this before?

"However, the kernel seems not to run at all"

You can of course use the debugger to verify this: does it reach a breakpoint inside the kernel?
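Alongside the debugger, it is worth checking the CUDA error status explicitly, because a kernel launch does not return an error code by itself. Launch-configuration errors are only reported by `cudaGetLastError()`, and errors raised while the kernel runs surface on a later synchronizing call such as `cudaDeviceSynchronize()`. A minimal sketch, assuming a hypothetical placeholder kernel `fill_index` in place of your sort kernel; the 2048-threads-per-block launch is deliberately invalid (the per-block limit is 1024 threads) to show a failure that would otherwise pass silently:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel standing in for the sort kernel.
__global__ void fill_index(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Launches fill_index and reports both launch and execution errors.
// Returns 0 on success, 1 on any CUDA error.
int launch_and_check(int blocks, int threads) {
    int n = blocks * threads;
    int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    fill_index<<<blocks, threads>>>(d_out, n);

    // Errors in the launch itself (e.g. an invalid block size) are only visible here.
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Errors that occur while the kernel runs show up on synchronization.
        err = cudaDeviceSynchronize();
    }
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "<<<%d, %d>>> failed: %s\n", blocks, threads,
                cudaGetErrorString(err));
        return 1;
    }
    printf("<<<%d, %d>>> ok\n", blocks, threads);
    return 0;
}

int main() {
    launch_and_check(2, 256);   // valid configuration
    launch_and_check(2, 2048);  // invalid: more than 1024 threads per block
    return 0;
}
```

Wrapping every launch of the real kernel this way makes a launch that "does not run at all" print the actual reason instead of failing silently.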

What comes to mind is that it may be related to how you synchronize, even though there is no interaction between blocks: the compiler looks at the synchronization you actually wrote, not at whether your blocks happen to interact.
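For comparison, here is a minimal sketch of a warp-level bitonic sort built only on `__shfl_xor_sync`. It is not the original poster's code: the names `warp_bitonic_sort`, `sort_each_warp` and `run_demo` are hypothetical, and it assumes 1-D blocks whose size is a multiple of 32 and `int` keys. The shuffles synchronize only the 32 lanes of one warp, so no `__syncthreads()` (which only ever synchronizes within a single block) or inter-block synchronization is involved. Every global-memory access uses `blockIdx.x * blockDim.x + threadIdx.x`, so each block works on its own slice of the array, and out-of-range lanes are padded with `INT_MAX` so all 32 lanes always take part in the full-mask shuffles.

```cuda
#include <climits>
#include <cstdio>
#include <cuda_runtime.h>

// Sorts the 32 values held by one warp (one value per lane) in ascending order.
// Assumes all 32 lanes of the warp are active (full mask).
__device__ int warp_bitonic_sort(int v) {
    const unsigned full = 0xffffffffu;
    int lane = threadIdx.x & 31;  // assumes 1-D blocks with blockDim.x % 32 == 0
    for (int k = 2; k <= 32; k <<= 1) {         // length of sequences being merged
        for (int j = k >> 1; j > 0; j >>= 1) {  // compare-exchange distance
            int partner = __shfl_xor_sync(full, v, j);
            bool ascending = (lane & k) == 0;   // direction of this lane's subsequence
            bool lower = (lane & j) == 0;       // lane holds the lower index of the pair
            v = (lower == ascending) ? min(v, partner) : max(v, partner);
        }
    }
    return v;
}

// Each warp sorts its own 32-element chunk; no inter-warp or inter-block sync needed.
__global__ void sort_each_warp(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index, valid for any grid
    int v = (i < n) ? data[i] : INT_MAX;            // pad so every lane joins the shuffles
    v = warp_bitonic_sort(v);
    if (i < n) data[i] = v;
}

// Sorts a test array with the given grid; returns 0 if every 32-element chunk is sorted.
int run_demo(int blocks, int threads) {
    int n = blocks * threads;
    size_t bytes = n * sizeof(int);
    int *h = new int[n];
    for (int i = 0; i < n; ++i) h[i] = (i * 7919) % 1000;  // deterministic, unsorted input

    int *d = nullptr;
    cudaError_t err = cudaMalloc(&d, bytes);
    if (err != cudaSuccess) { delete[] h; return 1; }

    err = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        sort_each_warp<<<blocks, threads>>>(d, n);
        err = cudaGetLastError();               // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess) err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);

    int bad = (err != cudaSuccess);
    for (int i = 1; i < n && !bad; ++i)
        if (i % 32 != 0 && h[i - 1] > h[i]) bad = 1;  // order within each warp's chunk
    delete[] h;
    return bad;
}

int main() {
    printf("1 block:  %s\n", run_demo(1, 64) == 0 ? "sorted" : "FAILED");
    printf("2 blocks: %s\n", run_demo(2, 64) == 0 ? "sorted" : "FAILED");
    return 0;
}
```

With this structure, adding blocks only adds more independent warps, so the one-block and two-block launches should behave the same; if a kernel does not, its indexing and its synchronization are the first places to compare against a sketch like this.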

You could also post some code.