I notice that in CUDA 11.4, the <cooperative_groups.h> header transitively includes cooperative_groups/details/partitioning.h, which contains the following line of code:
unsigned int subMask = __match_any_sync(thisMask, pred);
Now, __match_any_sync is a primitive that only exists on SM 7.0+. Does this mean that we can no longer use cooperative groups on SM < 7.0?
Digging deeper, I notice that the following compiles fine for SM 6.1, but should it? Is it a bug or a feature?
Function templates that are not instantiated do not generate any (invalid) code, which is why the compilation seems to work fine. The following does not compile on Pascal:
Another point regarding the macro you mention: it does protect one use of __match_any_sync in cooperative_groups.h, but it doesn’t protect the inclusion of an incompatible header:
The include of partitioning.h is not guarded by the above macro, and inside partitioning.h the call to __match_any_sync is not guarded by a macro either. The only thing the macro guards is the “wrapper”:
EDIT: I just noticed this very issue has been solved in CUDA 11.5, where a guard has been added around the details::_labeled_partition function.
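For anyone wondering what a guard of this kind can look like, here is a minimal sketch in plain C++ (it should also compile as host code in a .cu file). It is not the actual code from the CUDA 11.5 headers: the macro SKETCH_HAS_MATCH_ANY and the function names are hypothetical, and the only real identifiers are __CUDA_ARCH__ and __match_any_sync.

```cpp
#include <cassert>

// Hypothetical feature macro, NOT the one used by the real CUDA headers.
// nvcc defines __CUDA_ARCH__ only while compiling device code
// (for example 610 for SM 6.1, 700 for SM 7.0).
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 700)
#define SKETCH_HAS_MATCH_ANY 1
#else
#define SKETCH_HAS_MATCH_ANY 0
#endif

#if SKETCH_HAS_MATCH_ANY
// The whole function, not just a wrapper around it, is hidden from
// targets that lack the primitive, so no compiler ever parses the call.
template <typename T>
__device__ unsigned int labeled_partition_sketch(unsigned int mask, T label) {
    return __match_any_sync(mask, label);
}
#endif

// Lets calling code ask whether the guarded function exists in this pass.
constexpr bool labeled_partition_available() {
    return SKETCH_HAS_MATCH_ANY != 0;
}

// Host-side usage: in a plain host build __CUDA_ARCH__ is undefined,
// so the guarded function is reported as unavailable.
inline void check_host_build() { assert(!labeled_partition_available()); }
```

The point of guarding the definition itself is that a stricter front end never sees the unsupported call at all, instead of relying on the template staying uninstantiated.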
So to summarize:
The answer is YES, cooperative groups are still supported in CC 6.x.
NVCC does not report compile errors inside function templates that are never instantiated.
The issue with cooperative_groups.h has been solved in CUDA 11.5, which lets people use Clang (which does not tolerate such errors) to compile code against CUDA 11.5.
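To illustrate the second point, here is a small plain C++ sketch of one standard mechanism that is consistent with this behaviour: a call whose argument has a template-dependent type is only resolved when the template is instantiated. I am assuming that this is what happens with the __match_any_sync(thisMask, pred) line, i.e. that pred has a template parameter type; all names below (partition_sketch, missing_primitive, sketch::Label) are made up for the example.

```cpp
#include <cassert>

// A function template whose body calls a function that has not been declared.
// Because `label` has a dependent type, lookup of missing_primitive is
// deferred until the template is actually instantiated.
template <typename T>
unsigned int partition_sketch(unsigned int mask, T label) {
    return missing_primitive(mask, label);
}

namespace sketch {
struct Label { unsigned int value; };

// Declared after the template; found through argument-dependent lookup
// when partition_sketch is instantiated with T = sketch::Label.
inline unsigned int missing_primitive(unsigned int mask, Label l) {
    return mask & l.value;
}
}  // namespace sketch

// Instantiating with a type that provides the function compiles and runs.
inline unsigned int use_with_label() {
    return partition_sketch(0xFu, sketch::Label{0x5u});
}

// Instantiating with a type that does not provide it is a compile error:
// partition_sketch(0xFu, 1);  // no missing_primitive(unsigned int, int)
```

As long as partition_sketch is never instantiated with an unsupported type, the translation unit builds, which mirrors why including the header on SM 6.1 appears harmless until the function is actually used.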