CUDA 11.4 - cooperative groups no longer supported on SM < 7.0?

Cooperative groups work perfectly fine on SM < 7.0. You have missed the macro that enables the partition functions only on SM 7.0 and newer.

#if (__CUDA_ARCH__ >= 700) || !defined(__CUDA_ARCH__)
# define _CG_HAS_MATCH_COLLECTIVE
#endif

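Function templates that are never instantiated do not generate any (invalid) code, which is why the compilation seems to work fine. The following, which explicitly instantiates the template, does not compile for Pascal (SM 6.x), because __match_any_sync requires SM 7.0 or newer: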

template <typename T>
__device__ void foo()
{
    __match_any_sync(0,0);
}

template __device__ void foo<int>();
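
If you need the same source to build for Pascal too, one option is to guard the intrinsic with the same kind of __CUDA_ARCH__ check the header uses. The following is only a rough sketch, not something taken from the cooperative groups headers: match_any_full_warp and run_kernel are hypothetical names, and the fallback assumes a 1D block in which all 32 lanes of the warp call the function together.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper, not part of CUDA or cooperative groups.
// Assumes a 1D block and that all 32 lanes of the warp call it together.
__device__ unsigned match_any_full_warp(int value)
{
    const unsigned full_mask = 0xffffffffu;
#if __CUDA_ARCH__ >= 700
    // Volta and newer: use the hardware intrinsic.
    return __match_any_sync(full_mask, value);
#else
    // Older architectures: emulate it with shuffle and ballot.
    const unsigned lane = threadIdx.x % 32;
    unsigned peers = 0;
    for (unsigned src = 0; src < 32; ++src) {
        // Broadcast the value held by lane `src` to the whole warp.
        const int src_value = __shfl_sync(full_mask, value, src);
        // Collect the lanes whose own value equals the broadcast one.
        const unsigned same = __ballot_sync(full_mask, src_value == value);
        if (src == lane) peers = same;
    }
    return peers;
#endif
}

template <typename T>
__device__ unsigned foo(T value)
{
    return match_any_full_warp(static_cast<int>(value));
}

// With the guard in place, the explicit instantiation is valid for SM 6.x too.
template __device__ unsigned foo<int>(int);

__global__ void kernel(unsigned* out)
{
    // Even lanes pass 0, odd lanes pass 1.
    out[threadIdx.x] = foo<int>(threadIdx.x % 2);
}

// Hypothetical host helper: launches one warp and returns the 32 masks.
// Error checking is omitted to keep the sketch short.
std::vector<unsigned> run_kernel()
{
    std::vector<unsigned> host(32);
    unsigned* dev = nullptr;
    cudaMalloc(&dev, host.size() * sizeof(unsigned));
    kernel<<<1, 32>>>(dev);
    cudaMemcpy(host.data(), dev, host.size() * sizeof(unsigned),
               cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

int main()
{
    const std::vector<unsigned> masks = run_kernel();
    std::printf("lane 0: 0x%08x, lane 1: 0x%08x\n", masks[0], masks[1]);
    return 0;
}
```

This should build with something like nvcc -arch=sm_60 as well as -arch=sm_70. On both, even lanes should report the mask of all even lanes and odd lanes the mask of all odd lanes; the emulated path is just slower than the intrinsic.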