CUDA 11.4 - cooperative groups no longer supported on SM < 7.0?


I notice that in CUDA 11.4, the <cooperative_groups.h> header transitively includes this header: cooperative_groups/details/partitioning.h, which contains the following line of code:

unsigned int subMask = __match_any_sync(thisMask, pred);

Now, __match_any_sync is a primitive that only exists on SM 7.0+. Does this mean that we can no longer use cooperative groups on SM < 7.0?

Digging deeper, I notice that the following compiles fine for SM 6.1, but should it? Is it a bug or a feature?


template <typename T>
__device__ void foo()

If I remove the template, then the code doesn’t compile for SM 6.1, saying that __match_any_sync is undefined. What magic is NVCC doing here?


cooperative groups work perfectly fine with sm < 7.0. You have missed the macro which enables the partition functions only on sm 7.0 and newer.

#if (__CUDA_ARCH__ >= 700) || !defined(__CUDA_ARCH__)

Function templates which are not instantiated do not generate any (invalid) code. That’s why the compilation seems to works fine . The following does not compile on Pascal

template <typename T>
__device__ void foo()

template __device__ void foo<int>();

Thanks, I did miss that flag. What happens when !defined(__CUDA_ARCH__)? Is a stub defined somewhere when compiling for host?

In regular C++ code, the compiler will complain for template functions that are invalid even if they are not instantiated:

That’s why it’s surprising to me that NVCC behaves slightly differently.

For example here, NVCC is not diagnosing an obvious compiler error which would easily be caught by regular C++ compilers. Is that expected or a bug?

If you switch to compile with Clang instead of NVCC, then the error is caught correctly.

Another point regarding the macro you mention - it does protect one use of __any_match_sync in cooperative_groups.h, but it doesn’t protect the inclusion of an incompatible header:

#include <cooperative_groups/details/partitioning.h>

# endif /* ! (__cplusplus, __CUDACC__) */

#endif /* !_COOPERATIVE_GROUPS_H_ */

The include of partitioning.h is not guarded by the above macro, and then going inside partitioning.h, I can also see that the __match_any_sync function is also not guarded by a macro. The only thing that is guarded by the macro is the “wrapper”:

template <typename TyPredicate>
_CG_STATIC_QUALIFIER coalesced_group labeled_partition(const coalesced_group &tile, TyPredicate pred) {
    return details::_labeled_partition(tile, pred);

But the details::_labeled_partition function is not guarded by the macro. It just happens to be a template and for some reason NVCC ignores the error.

Would it make sense to guard also the detail namespace with the macro? Alternatively, to guard the #include "details/partitioning.h" with the macro?

EDIT: I just noticed this very issue has been solved in CUDA 11.5, where a guard has been added around the details::_labeled_partition function.

So to summarize:

  • The answer is YES, cooperative groups are still supported in CC 6.x.
  • NVCC ignores compiler errors in non-instantiated template functions.
  • The issue with cooperative_groups.h has been solved in CUDA 11.5. This allows people to be able to use Clang (which does not tolerate compiler errors) to compile CUDA 11.5-code.

Thanks for the discussion!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.