I have recently been implementing a warp-level bitonic sort using shuffle instructions. When I test the code with a single block of threads, everything works and the result is correct. However, when I launch two blocks, the kernel does not seem to run at all: nothing gets sorted. There is no interaction between blocks, and no error code is reported.
Has anyone experienced this problem before?
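
For reference, a minimal sketch of this kind of setup (not the original code) might look like the following. It assumes CUDA 9 or later for `__shfl_xor_sync`, one full warp of 32 threads per block, and `int` keys; the names `warpBitonicSort`, `sortWarps`, and `CHECK` are hypothetical. It queries both the launch status and the status after synchronization, because kernel launches are asynchronous and a failure is not reported unless it is checked explicitly.

```cuda
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Sorts 32 values (one per lane) in ascending lane order.
// Assumes all 32 lanes of the warp call this function together.
__device__ int warpBitonicSort(int v)
{
    const unsigned lane = threadIdx.x & 31u;
    for (unsigned k = 2; k <= 32; k <<= 1) {          // size of sorted runs
        for (unsigned j = k >> 1; j > 0; j >>= 1) {   // compare distance
            int partner = __shfl_xor_sync(0xffffffffu, v, j);
            bool ascending = (lane & k) == 0;  // direction of this run
            bool lower = (lane & j) == 0;      // lower lane of the pair
            // Lower lane keeps the min in ascending runs, the max otherwise.
            v = (ascending == lower) ? min(v, partner) : max(v, partner);
        }
    }
    return v;
}

// Each warp sorts its own 32-element chunk; blocks do not interact.
__global__ void sortWarps(int *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Out-of-range lanes still take part in the shuffles, padded with INT_MAX.
    int v = (idx < n) ? data[idx] : INT_MAX;
    v = warpBitonicSort(v);
    if (idx < n) data[idx] = v;
}

int main()
{
    const int n = 64, threads = 32;
    const int blocks = (n + threads - 1) / threads;   // 2 blocks here
    int h[n];
    for (int i = 0; i < n; ++i) h[i] = (i * 37 + 11) % 101;

    int *d = nullptr;
    CHECK(cudaMalloc(&d, n * sizeof(int)));
    CHECK(cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice));

    sortWarps<<<blocks, threads>>>(d, n);
    CHECK(cudaGetLastError());        // launch-time errors (bad configuration)
    CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CHECK(cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost));
    for (int b = 0; b < blocks; ++b) {
        bool sorted = true;
        for (int i = b * threads + 1; i < n && i < (b + 1) * threads; ++i)
            if (h[i - 1] > h[i]) sorted = false;
        printf("block %d: %s\n", b, sorted ? "sorted" : "NOT sorted");
    }
    CHECK(cudaFree(d));
    return 0;
}
```

If a variant of this reports an error only for the two-block launch, the message from `cudaGetErrorString` usually narrows down whether the launch configuration or an out-of-bounds access is responsible.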