Atomics and TLS

I’m trying to port some projects over to PGI, but I keep running into problems with atomics and thread-local storage. First off, TLS…

Is there any way to determine whether -c11 was passed to the compiler or, better yet, whether TLS is supported? It would also be acceptable to use a PGI-specific construct, if one exists (preferable, even, if it doesn’t require a special flag). Basically, I’m trying to port something like this to PGI:

#if defined(_Thread_local) || (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L))
#  define THREAD_LOCAL _Thread_local
#elif defined(__GNUC__) || defined(__INTEL_COMPILER) || defined(__SUNPRO_CC) || defined(__IBMCPP__)
#  define THREAD_LOCAL __thread
#elif defined(_WIN32)
#  define THREAD_LOCAL __declspec(thread)
#else
#  error No TLS implementation found.
#endif

_Thread_local seems to work with PGI, but ONLY if -c11 is passed.

__STDC_VERSION__ is defined as 199901L even if -c11 is passed, so my current code emits an error. I can understand that; PGI’s C11 support is still incomplete, so advertising it in __STDC_VERSION__ would be premature. Unfortunately, though, it puts me in a tough spot… AFAICT there is no way to tell in the preprocessor whether the compiler supports _Thread_local.

Normally I would use __PGIC__, __PGIC_MINOR__, and __PGIC_PATCHLEVEL__ to check for support (and ignore __STDC_VERSION__), but since TLS only works in C11 mode (unlike other compilers) that doesn’t do much good. I’ve also tried using C11 macros like __STDC_NO_THREADS__ and __STDC_NO_ATOMICS__ in hopes of detecting whether PGI is in C11 mode, but unlike _Thread_local they’re defined in PGI’s C99 mode, too.

As for atomics, is there some variant of atomics which PGI supports that I’m missing? I already have support for

  • Old GCC-style (__sync_*)
  • New GCC-style (__atomic_*)
  • clang-style (__c11_atomic_*)
  • C11-style (stdatomic.h)
  • MS-style (Interlocked*)

I’m happy to add another method, but I can’t seem to figure out how to do atomics in PGI. So far my best guess is to require OpenACC or maybe OpenMP, but that’s some pretty significant overhead and I’d strongly prefer something which doesn’t require a compiler flag; this is for a reusable header which you can currently just drop into any C project and be done with it.
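For reference, the backend selection in a drop-in header of this kind might look roughly like the sketch below. It is only an illustration: the names my_atomic_long and my_fetch_add are hypothetical, the clang-style (__c11_atomic_*) branch is left out for brevity, and the version checks are assumptions rather than anything PGI documents. A compiler that matches none of the branches ends up at the #error.

```c
/* Sketch of an atomic fetch-and-add with several backends.
 * my_atomic_long and my_fetch_add are hypothetical names. */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) && \
    !defined(__STDC_NO_ATOMICS__)
   /* C11-style (stdatomic.h) */
#  include <stdatomic.h>
typedef atomic_long my_atomic_long;
static long my_fetch_add(my_atomic_long *p, long v)
{
    return atomic_fetch_add(p, v);
}
#elif defined(__GNUC__) && \
      (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
   /* New GCC-style (__atomic_*), assumed available from GCC 4.7 on */
typedef long my_atomic_long;
static long my_fetch_add(my_atomic_long *p, long v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
#elif defined(__GNUC__)
   /* Old GCC-style (__sync_*) */
typedef long my_atomic_long;
static long my_fetch_add(my_atomic_long *p, long v)
{
    return __sync_fetch_and_add(p, v);
}
#elif defined(_MSC_VER)
   /* MS-style (Interlocked*) */
#  include <intrin.h>
typedef volatile long my_atomic_long;
static long my_fetch_add(my_atomic_long *p, long v)
{
    return _InterlockedExchangeAdd(p, v);
}
#else
#  error No atomic implementation found.
#endif
```

Each variant returns the value the counter held before the addition, so callers see the same behaviour whichever branch was compiled in.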

Also, I have a PRNG which requires CAS (or a lock), but I don’t see a way to do an atomic compare and swap with OpenACC. This seems like an odd omission… am I missing something, or should I just fall back on a spinlock for OpenACC?
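To make the CAS requirement concrete, here is a minimal sketch of the kind of state update involved, written against C11 stdatomic.h (so it assumes a compiler that ships that header). The xorshift32 generator, the seed, and the names prng_state, xorshift32_step and prng_next are stand-ins chosen for illustration, not the actual PRNG.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared generator state; the seed is an arbitrary nonzero value. */
static _Atomic uint32_t prng_state = 2463534242u;

/* One xorshift32 step (pure function of the previous state). */
static uint32_t xorshift32_step(uint32_t x)
{
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Advance the shared state with a compare-and-swap loop: compute the
 * successor of the value we read, and retry if another thread changed
 * the state in the meantime. */
static uint32_t prng_next(void)
{
    uint32_t old = atomic_load(&prng_state);
    uint32_t next;
    do {
        next = xorshift32_step(old);
        /* On failure, old is reloaded with the current state. */
    } while (!atomic_compare_exchange_weak(&prng_state, &old, next));
    return next;
}
```

Without a compare-and-swap, the read, the step and the write-back cannot be combined into one atomic update, which is why the only alternative is a lock around all three.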

I sent your comments to engineering, and they responded

  1. You need a #define in your own code to indicate whether __thread or _Thread_local is supported by the PGI C compiler. We have a problem here, and I logged TPR 23783 to add this capability.

  2. PGI has a plan to support gcc style atomics for the C and C++ compiler that can be used outside of OpenMP and OpenACC. We do not currently have that support for pgcc. Use gcc when you need it outside those areas.

  3. OpenACC does not provide CAS as a first-class directive. It does provide atomic read/write/capture directives; however, there is no mechanism to do an atomic compare. Implementing a critical section across gangs/workers/vectors is not guaranteed to work, since the OpenACC execution model allows a thread (owning the lock) to be suspended until other threads complete.

They can only suggest that you do the CAS sequentially.

FWIW, I’d prefer it if you just enabled TLS across the board (i.e., in C99 and even C89 mode) as an extension, which is what other compilers (gcc, clang, icc) do.

Since you already have _Thread_local, it seems like supporting __thread wouldn’t be much effort, and probably worthwhile to make code easier to port. At least GCC, clang, ICC, SunCC, and IBM XL C support it… there is also the MSVC-specific __declspec(thread).

Anyways, thanks for the info :)

TPR 23783 - Set __thread and __Thread_local when -c11/-c1x NOT set.

is fixed in the current 18.1 release.

The issue was corrected when we changed the value of the predefined macro __STDC_VERSION__ from the wrong value 199901L to the correct value 201112L when the ‘-c11’ switch is used.
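With that change, a plain __STDC_VERSION__ test should be enough to pick up _Thread_local under -c11. The following is a small sketch of what that could look like in user code; the fallback branch, the variable call_count and the function bump_call_count are hypothetical, and whether __thread is accepted without -c11 is not something this sketch assumes.

```c
/* Select a thread-local storage specifier from the language version. */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
#  define THREAD_LOCAL _Thread_local       /* C11 mode, e.g. -c11 */
#elif defined(__GNUC__)
#  define THREAD_LOCAL __thread            /* GNU extension */
#elif defined(_WIN32)
#  define THREAD_LOCAL __declspec(thread)  /* MSVC extension */
#else
#  error No TLS implementation found.
#endif

/* Each thread gets its own copy of this counter. */
static THREAD_LOCAL int call_count = 0;

/* Increment and return the calling thread's private counter. */
static int bump_call_count(void)
{
    return ++call_count;
}
```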