Does CUDA C have a this_thread_group() function?


I was reading an article describing CUDA Fortran, and it says you can get access to the current thread group like this:

use cooperative_groups
type(thread_group) :: tg
tg = this_thread_group()

Is there similar support in the CUDA C/C++ language? I haven’t found anything in the CUDA C++ Programming Guide.

This feature could be useful when implementing operator overloads, e.g. an addition for vector or matrix objects where the operation could attempt to auto-parallelize over whatever thread group the current threads are in. It is not possible to pass an explicit thread_group object into an operator overload the way you can with an ordinary member function, because the operator signature is fixed. A this_thread_group() API could save the day.

After some consideration I think the Fortran article has a typo and that it should say this_thread_block() in the code sample.
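For reference, the CUDA C++ Cooperative Groups API does provide this_thread_block(), which would be the equivalent of the (presumably intended) Fortran call. A minimal sketch:

```cuda
#include <cstdio>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void kernel()
{
    // this_thread_block() returns a handle to the calling thread's block,
    // analogous to the Fortran this_thread_block() the article likely meant.
    cg::thread_block block = cg::this_thread_block();

    if (block.thread_rank() == 0)
        printf("block of %u threads\n", block.size());

    block.sync();  // equivalent to __syncthreads()
}
```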

Supporting a this_thread_group() API call is probably not possible, because you can subdivide the block into multiple groups (or tiled partitions) of different sizes. A thread can therefore be a member of several groups at once, so you must explicitly specify which group you want to operate on.
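To illustrate the ambiguity, here is a sketch in which each thread simultaneously belongs to three groups, so a hypothetical this_thread_group() would have no single correct answer:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void kernel()
{
    cg::thread_block block = cg::this_thread_block();

    // Each thread is now a member of both 'block' and 'tile32'...
    cg::thread_block_tile<32> tile32 = cg::tiled_partition<32>(block);

    // ...and of a 4-wide sub-tile as well.
    cg::thread_block_tile<4> tile4 = cg::tiled_partition<4>(tile32);

    // A this_thread_group() call here could not know which of the three
    // groups the caller means, so the group must be passed explicitly.
}
```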

That’s unfortunate for C++ operator overloads, since binary operators can’t accept additional arguments such as a thread group.
A parallelized vector addition would therefore have to use the bulkier syntax

vectorC = vectorA.add(vectorB, thread_group);
instead of
vectorC = vectorA + vectorB;

I could imagine adding a stack-like API to the Cooperative Groups feature that allows the programmer to maintain a history (or stack) of thread groups for the thread. This API could support push, peek and pop, where peek retrieves the current thread group for the thread.
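A purely hypothetical sketch of what such a stack might look like. None of these names exist in CUDA today; group_stack, push, peek and pop are invented here only to illustrate the idea, using a fixed-depth array of pointers to the real cg::thread_group base type:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Hypothetical per-thread stack of group handles -- not a real CUDA API.
struct group_stack {
    static const int MAX_DEPTH = 8;
    cg::thread_group* groups[MAX_DEPTH];  // handles owned by the caller
    int depth = 0;

    __device__ void push(cg::thread_group* g) { groups[depth++] = g; }
    __device__ cg::thread_group* peek() const { return groups[depth - 1]; }
    __device__ void pop() { --depth; }
};
```

An operator overload could then peek the stack to find "the" current group, at the cost of the programmer keeping the stack in sync with every tiled_partition call.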

You could create a group on-the-fly, in the operator, perhaps.
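For example, the operator could call cg::coalesced_threads() (a real Cooperative Groups API) to form a group on the fly from the threads that happen to be converged at that point. A sketch with a hypothetical Vec type:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// 'Vec' is a hypothetical device-side vector type used for illustration.
struct Vec {
    float* data;
    int    n;

    __device__ Vec& operator+=(const Vec& other) {
        // Form a group on the fly from the currently converged threads.
        cg::coalesced_group g = cg::coalesced_threads();

        // Each member of the group handles a strided slice of the elements.
        for (int i = g.thread_rank(); i < n; i += g.size())
            data[i] += other.data[i];

        g.sync();
        return *this;
    }
};
```

The caveat is that the coalesced group only contains the threads that are converged at the call site, which may be fewer than the programmer intended if the operator is reached inside divergent control flow.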

An operator overload that was part of a user-defined class could also leverage group info contained in the class.
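A sketch of that idea, where a hypothetical GroupVec type stores a reference to the group it was constructed with, so the operator can parallelize over it without an extra argument:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Hypothetical: the vector object carries its thread group as a member,
// so operator+= keeps the standard two-operand signature.
struct GroupVec {
    float*            data;
    int               n;
    cg::thread_group& group;  // group info stored in the class

    __device__ GroupVec(float* d, int size, cg::thread_group& g)
        : data(d), n(size), group(g) {}

    __device__ GroupVec& operator+=(const GroupVec& other) {
        // Parallelize over the group the object was constructed with.
        for (int i = group.thread_rank(); i < n; i += group.size())
            data[i] += other.data[i];

        group.sync();
        return *this;
    }
};
```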

It’s not obvious to me that a vector add would benefit from a formal CG, but perhaps a dot product might.