Why does my CUDA code deadlock, and why does the deadlock disappear when I print a variable?

  • I have a confusing CUDA deadlock problem (I am not sure it is really a deadlock, but my code never finishes running).
  • I tried to debug it, but when I print the variable “active”, the deadlock disappears.
  • When I comment out the printf call, the deadlock appears again. Here is the code:
// active = true; group is a cooperative group tile
__device__ void cg_insert_with_lock(bool active, thread_block_tile<cg_size> group) {
    // just a lock variable in global memory, initial value is 'unlock_tag'
    uint32_t *bucket_lock_addr = cuckoo_table.bucket_lock[0] + 1;
    while (group.any(active == true)) {
        // here is the problem
        //printf("active %d\n", active);
        bool lock_success = false;
        uint32_t group_mask = group.ballot(active == true && *bucket_lock_addr == unlock_tag);
        // no active thread saw the lock as free: retry
        if (group_mask == 0)
            continue;
        uint32_t leader = __ffs(group_mask) - 1;
        // try to lock
        if (group.thread_rank() == leader) {
            auto result = atomicCAS(bucket_lock_addr, unlock_tag, lock_tag);
            if (result == unlock_tag) {
                lock_success = true;
            }
        }
        // lock fail: retry
        if (group.all(lock_success == false)) {
            continue;
        }

        // do something
        if (group.thread_rank() == leader) {
            atomicExch(bucket_lock_addr, unlock_tag);
            active = false;
        }
    }
}
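
A possible explanation, stated as an assumption rather than a confirmed diagnosis: the plain read `*bucket_lock_addr` inside the loop is not `volatile`, so the compiler may keep the value in a register and never reload it. The threads would then spin forever on a stale "locked" value. A `printf` call in the loop body can change what the compiler is willing to optimize, which would explain why the hang goes away. The sketch below reads the lock through a `volatile` pointer so that each iteration reloads it from memory. The values of `unlock_tag`, `lock_tag` and `cg_size`, the `counter` argument, and the `demo_kernel` and `main` harness are hypothetical stand-ins for parts the post does not show. `cg_size` is set to 32 here so that each tile is a whole warp.

```cuda
#include <cooperative_groups.h>
#include <cstdint>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical stand-ins: the post does not show these definitions.
constexpr uint32_t unlock_tag = 0;
constexpr uint32_t lock_tag = 1;
constexpr int cg_size = 32;  // one tile per warp (assumption)

__device__ void cg_insert_with_lock(bool active, cg::thread_block_tile<cg_size> group,
                                    uint32_t *bucket_lock_addr, volatile int *counter) {
    // volatile view of the lock: every read is a real load from memory
    volatile uint32_t *lock_view = bucket_lock_addr;
    while (group.any(active)) {
        bool lock_success = false;
        uint32_t group_mask = group.ballot(active && *lock_view == unlock_tag);
        if (group_mask == 0)
            continue;  // no active thread saw the lock as free: retry
        uint32_t leader = __ffs(group_mask) - 1;
        if (group.thread_rank() == leader) {
            lock_success = (atomicCAS(bucket_lock_addr, unlock_tag, lock_tag) == unlock_tag);
        }
        if (group.all(!lock_success))
            continue;  // another tile took the lock first: retry
        if (group.thread_rank() == leader) {
            __threadfence();            // order the critical section after the acquire
            *counter = *counter + 1;    // hypothetical critical section
            __threadfence();            // make the update visible before releasing
            atomicExch(bucket_lock_addr, unlock_tag);
            active = false;
        }
    }
}

// Hypothetical harness: every thread passes through the lock once.
__global__ void demo_kernel(uint32_t *lock, int *counter) {
    auto tile = cg::tiled_partition<cg_size>(cg::this_thread_block());
    cg_insert_with_lock(true, tile, lock, counter);
}

int main() {
    uint32_t *lock = nullptr;
    int *counter = nullptr;
    cudaMalloc(&lock, sizeof(uint32_t));
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(lock, 0, sizeof(uint32_t));  // 0 == unlock_tag in this sketch
    cudaMemset(counter, 0, sizeof(int));

    demo_kernel<<<4, 64>>>(lock, counter);
    cudaDeviceSynchronize();

    int host_counter = 0;
    cudaMemcpy(&host_counter, counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("counter = %d, threads launched = %d\n", host_counter, 4 * 64);

    cudaFree(lock);
    cudaFree(counter);
    return 0;
}
```

If mutual exclusion holds and the loop terminates, the printed counter should match the number of launched threads. With tiles smaller than a warp, two tiles share one warp and a spinning tile could starve the tile holding the lock on GPUs without independent thread scheduling, so that case may need extra care.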