__atomic_compare_exchange_n bug in release mode, when building with nvc++ 22.3

Consider the two functions below.

void lock_n() {
        int raw;
    relock:
        raw = 0;
        if (!__atomic_compare_exchange_n(&ref_count, &raw, -1, true, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
            thrd_yield();
            goto relock;
        }
}

void lock() {
        int raw;
        int state = -1;
    relock:
        raw = 0;
        if (!__atomic_compare_exchange(&ref_count, &raw, &state, true, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
            thrd_yield();
            goto relock;
        }
}

And here is the __atomic_compare_exchange_n implementation in cuda-11.5:

template<class _Type>
bool __atomic_compare_exchange_n(_Type volatile *__ptr, _Type *__expected, _Type __desired, bool __weak, int __success_memorder, int __failure_memorder) {
    return __atomic_compare_exchange(__ptr, __expected, &__desired, __weak, __success_memorder, __failure_memorder);
}

So lock_n ends up making the same __atomic_compare_exchange call that lock makes directly. Both work as expected with gcc and clang.
Both also work as expected when building in debug mode with nvc++, although nvc++ emits a warning for the -1 argument in lock_n:

warning: integer conversion resulted in a change of sign

In release mode with nvc++ lock_n goes into an infinite loop, while lock works as expected.
This looks like a compiler bug.

nvc++ flags: -fast -Mvect forces -O3
Note: The bug persists in both single-threaded and multi-threaded runs, so the problem is not a race.
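For anyone who wants to try this outside the original project, the two functions could be assembled into a standalone file roughly as follows. This is only a sketch under several assumptions: the declaration of ref_count is not shown above, so a plain global int is used; std::this_thread::yield() stands in for thrd_yield(); and the names kMaxAttempts, lock_n_bounded, lock_bounded, unlock and check_both_variants are hypothetical helpers, with the retry loop bounded so a failing build reports the problem instead of hanging. It has not been confirmed to reproduce the issue with nvc++ 22.3.

```cpp
#include <cassert>
#include <thread>

// Assumption: the post does not show how ref_count is declared.
static int ref_count = 0;

// Hypothetical bound so a miscompiled loop terminates instead of spinning forever.
constexpr int kMaxAttempts = 1000000;

// Same compare-exchange as lock_n, but gives up after kMaxAttempts tries.
bool lock_n_bounded() {
    for (int attempt = 0; attempt < kMaxAttempts; ++attempt) {
        int raw = 0;
        if (__atomic_compare_exchange_n(&ref_count, &raw, -1, true,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
            return true;
        }
        std::this_thread::yield();  // stand-in for thrd_yield()
    }
    return false;
}

// Same compare-exchange as lock, but gives up after kMaxAttempts tries.
bool lock_bounded() {
    int state = -1;
    for (int attempt = 0; attempt < kMaxAttempts; ++attempt) {
        int raw = 0;
        if (__atomic_compare_exchange(&ref_count, &raw, &state, true,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
            return true;
        }
        std::this_thread::yield();
    }
    return false;
}

// Releases the lock by storing 0 back into ref_count.
void unlock() {
    __atomic_store_n(&ref_count, 0, __ATOMIC_RELEASE);
}

// Acquires and releases through each variant; both are expected to succeed.
void check_both_variants() {
    assert(lock_n_bounded());
    assert(ref_count == -1);
    unlock();
    assert(lock_bounded());
    assert(ref_count == -1);
    unlock();
}
```

Calling check_both_variants() from main and building once in debug mode and once with the release flags listed above should show whether the _n variant behaves differently. Note that assert is disabled when NDEBUG is defined, so the checks may need to be replaced with explicit prints in a release build.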

It also seems that __atomic_sub_fetch does not work in release mode:

__atomic_sub_fetch(&remaining, 1, __ATOMIC_RELEASE);

remaining is 1 both before and after the call.
In debug mode it is 0 after the call.
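A small standalone check for this second symptom might look like the sketch below. The type of remaining is not shown above, so int is an assumption, and SubFetchResult and decrement_from_one are hypothetical names used only for illustration. With a correctly working compiler both the returned value and the stored value should be 0.

```cpp
#include <cassert>

// Hypothetical holder for the two values worth comparing.
struct SubFetchResult {
    int returned;  // value returned by __atomic_sub_fetch
    int stored;    // value read back from the variable afterwards
};

// Decrements a counter that starts at 1 and reports what was observed.
SubFetchResult decrement_from_one() {
    int remaining = 1;  // assumption: the post does not show the declaration
    int returned = __atomic_sub_fetch(&remaining, 1, __ATOMIC_RELEASE);
    int stored = __atomic_load_n(&remaining, __ATOMIC_ACQUIRE);
    return SubFetchResult{returned, stored};
}
```

Printing both fields in a debug build and in a release build would show whether the decrement is lost, or whether only the read-back value is stale.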

Thanks, ishkahan, for the report. Do you have a complete reproducible example you can share?

-Mat

Just put the snippets somewhere and run them. I found this bug in a large closed-source project.
It is a CMake project using C++20.
I do not have a complete reproducible example.