__syncthreads() hangs even though all threads call it exactly 3 times

This looks like a compiler code-generation defect to me. On CUDA 12.0, the code does not hang when I compile with `-G` (which disables device-code optimizations), and it also does not hang if I replace `size = cnt_such();` with `size = 1995840;`.

My suggestions:

  1. Retest on the latest available CUDA version, if you are not already using it.
  2. If the issue still reproduces there, file a bug.