Consider the two functions below.
void lock_n() {
    int raw;
relock:
    raw = 0;
    if (!__atomic_compare_exchange_n(&ref_count, &raw, -1, true, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        thrd_yield();
        goto relock;
    }
}
void lock() {
    int raw;
    int state = -1;
relock:
    raw = 0;
    if (!__atomic_compare_exchange(&ref_count, &raw, &state, true, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        thrd_yield();
        goto relock;
    }
}
And here is the __atomic_compare_exchange_n implementation in cuda-11.5:
template<class _Type>
bool __atomic_compare_exchange_n(_Type volatile *__ptr, _Type *__expected, _Type __desired, bool __weak, int __success_memorder, int __failure_memorder) {
    return __atomic_compare_exchange(__ptr, __expected, &__desired, __weak, __success_memorder, __failure_memorder);
}
So lock_n ends up making the same __atomic_compare_exchange call as lock under the hood. This works as expected with gcc and clang.
It also works as expected in a debug build with nvc++, although the compiler emits a warning for the -1 argument in lock_n:
warning: integer conversion resulted in a change of sign
In release mode with nvc++, lock_n goes into an infinite loop, while lock works as expected.
This looks like a compiler bug.
nvc++ flags: -fast -Mvect forces -O3
Note: The bug persists in both single-threaded and multi-threaded runs, so the problem is not a race condition.
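For anyone who wants to try this, here is a minimal single-threaded sketch of the two functions above in one file. Several things in it are my assumptions and not part of the code shown above: the declaration of ref_count as a plain int starting at 0, the unlock() helper, and the run_repro() driver. I also swapped thrd_yield() for std::this_thread::yield() so it only needs C++ headers. It relies on the compiler providing the GCC-style __atomic builtins, and I cannot promise it triggers the hang on every nvc++ version.

```cpp
// Reproducer sketch; names marked "assumed" or "hypothetical" are not from the post.
#include <cassert>
#include <thread>   // std::this_thread::yield, standing in for thrd_yield()

// Assumed declaration: 0 means unlocked, -1 means exclusively locked.
static int ref_count = 0;

// Passes the desired value (-1) by value via the _n builtin.
void lock_n() {
    int raw;
relock:
    raw = 0;
    if (!__atomic_compare_exchange_n(&ref_count, &raw, -1, true,
                                     __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        std::this_thread::yield();
        goto relock;
    }
}

// Passes the desired value through a pointer via the generic builtin.
void lock() {
    int raw;
    int state = -1;
relock:
    raw = 0;
    if (!__atomic_compare_exchange(&ref_count, &raw, &state, true,
                                   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        std::this_thread::yield();
        goto relock;
    }
}

// Hypothetical helper: release the lock so the next variant can be tested.
void unlock() {
    __atomic_store_n(&ref_count, 0, __ATOMIC_RELEASE);
}

// Hypothetical driver: returns true if both variants acquired the lock.
bool run_repro() {
    lock_n();                        // the reported hang would occur here
    bool ok_n = (ref_count == -1);
    unlock();

    lock();
    bool ok = (ref_count == -1);
    unlock();

    assert(ref_count == 0);          // both acquisitions were released
    return ok_n && ok;
}
```

Calling run_repro() from main and building once in debug mode and once with the release flags listed above should show whether only the lock_n path hangs.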