I am reading an article that describes CUDA Fortran, and it says you can get access to the current thread group like this:
    use cooperative_groups
    type(thread_group) :: tg
    tg = this_thread_group()
Is there similar support in CUDA C/C++? I haven't found anything in the CUDA C Programming Guide.
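For comparison, here is a minimal sketch in CUDA C++, assuming the `<cooperative_groups.h>` header that ships with CUDA 9 and later. In that header the group handle comes from `this_thread_block()` (the calling thread block) rather than a function named `this_thread_group()`, and it can be stored in the generic `cg::thread_group` type much like `tg` above. The kernel `whoami` and the host helper `query` are hypothetical names used only for illustration.

```cuda
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Each thread asks for the group it belongs to and records what it sees.
__global__ void whoami(unsigned int *ranks, unsigned int *sizes)
{
    // Generic group handle, analogous to type(thread_group) :: tg in Fortran.
    cg::thread_group g = cg::this_thread_block();

    unsigned int gid = blockIdx.x * blockDim.x + threadIdx.x;
    ranks[gid] = static_cast<unsigned int>(g.thread_rank()); // 0 .. size-1 within the block
    sizes[gid] = static_cast<unsigned int>(g.size());        // threads in the block
}

// Hypothetical host helper: launch 2 blocks of 64 threads and report
// the rank and group size seen by global thread `gid`.
void query(unsigned int gid, unsigned int *rank, unsigned int *size)
{
    const unsigned int n = 2 * 64;
    unsigned int *ranks, *sizes;
    cudaMallocManaged(&ranks, n * sizeof(unsigned int));
    cudaMallocManaged(&sizes, n * sizeof(unsigned int));

    whoami<<<2, 64>>>(ranks, sizes);
    cudaDeviceSynchronize();

    *rank = ranks[gid];
    *size = sizes[gid];
    cudaFree(ranks);
    cudaFree(sizes);
}

int main()
{
    unsigned int rank, size;
    query(70, &rank, &size);
    // Global thread 70 is thread 6 of block 1.
    printf("thread 70: rank %u of %u\n", rank, size);
    return 0;
}
```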
This feature could be useful when implementing operator overloads, e.g. an addition for vector or matrix objects, where the operator could try to parallelize automatically over whatever thread group the calling threads belong to. Unlike a regular member function, an operator has a fixed set of operands, so there is no way to pass an explicit thread_group object into it. A this_thread_group() API could save the day.
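A minimal sketch of the operator idea, again assuming CUDA 9+ cooperative groups, where the operator looks up its own group via `cg::this_thread_block()` instead of receiving it as an argument. `DeviceVec`, `add_kernel` and `run_add` are hypothetical names. The operator is collective: every thread of the block must reach the call, and because `this_thread_block()` only spans one block, the launch uses a single block.

```cuda
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical non-owning view of a float array in device memory.
struct DeviceVec {
    float *data;
    unsigned int n;
};

// Collective operator: every thread of the calling block must reach it.
// It obtains its own group, so no thread_group argument is needed.
__device__ DeviceVec &operator+=(DeviceVec &a, const DeviceVec &b)
{
    cg::thread_block block = cg::this_thread_block();
    // Block-stride loop: the thread with rank r handles r, r + size, r + 2*size, ...
    for (unsigned int i = block.thread_rank(); i < a.n; i += block.size())
        a.data[i] += b.data[i];
    block.sync(); // all elements are updated before any thread continues
    return a;
}

__global__ void add_kernel(float *x, float *y, unsigned int n)
{
    DeviceVec a{x, n};
    DeviceVec b{y, n};
    a += b; // reads like scalar code, but the whole block shares the work
}

// Hypothetical host helper: computes x = 1 + y with y[i] = i, returns x[idx].
float run_add(unsigned int idx)
{
    const unsigned int n = 1000;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (unsigned int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = static_cast<float>(i);
    }
    // A single block: with several blocks, each one would redundantly
    // (and racily) add the whole vector.
    add_kernel<<<1, 128>>>(x, y, n);
    cudaDeviceSynchronize();
    float result = x[idx];
    cudaFree(x);
    cudaFree(y);
    return result;
}

int main()
{
    printf("x[999] = %.1f\n", run_add(999)); // prints "x[999] = 1000.0"
    return 0;
}
```

In divergent code, `cg::coalesced_threads()` (the currently active threads of a warp) would be a closer match to "whatever group the current threads are in", at the cost of limiting the operator to warp scope.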