In my program, I need a counter on the GPU that can be accessed by all threads.
The purpose of this counter is to track how many threads have finished their designated task. The logic would work like this pseudocode:
__global__ void test_function()
{
    int counter = total_threads_number;
    while (counter > 0)
    {
        if (not processed)
            do something for this thread id;
        if (criterion is met)
        {
            set this thread as 'processed';
            counter = counter - 1;
        }
        __threadfence();
    }
}
What is the best way to define such a counter? My understanding is that it must live in global memory, so I think I need to cudaMalloc an array with just one element to serve as the counter. This may work, but when all threads try to access the counter, will their execution be serialized? Is there a better way to define such a counter?
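
To make the single-element idea concrete, here is a minimal sketch of one common approach, not a definitive implementation. It assumes the counter is a single int allocated with cudaMalloc and that each thread decrements it exactly once with atomicSub when its work is done. The names remaining and remaining_after_run, the launch configuration, and the placeholder work are hypothetical. The kernel does not spin until the counter reaches zero; the host reads the counter after the kernel finishes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread does its (placeholder) work, then
// decrements the shared counter once with an atomic operation.
__global__ void test_function(int *remaining, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // ... per-thread work and the "criterion is met" check would go here ...

    // atomicSub makes the read-modify-write indivisible, so no update is lost
    // even when many threads hit the same address.
    atomicSub(remaining, 1);
}

// Hypothetical host helper: launches the kernel over n threads and returns
// the value left in the counter afterwards.
int remaining_after_run(int n)
{
    int *d_remaining = nullptr;
    cudaMalloc(&d_remaining, sizeof(int));                        // one-element counter
    cudaMemcpy(d_remaining, &n, sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    test_function<<<blocks, threads>>>(d_remaining, n);

    int h_remaining = -1;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_remaining, d_remaining, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_remaining);
    return h_remaining;
}

int main()
{
    printf("threads still unfinished: %d\n", remaining_after_run(1000));
    return 0;
}
```

Atomic updates to one address are serialized by the hardware, but only the update itself is serialized, not the rest of each thread's work. If contention on the counter becomes a bottleneck, a usual refinement is to count within each block first (for example in shared memory) and issue one atomic per block. Having every thread spin inside the kernel until the counter reaches zero is a separate concern: it can hang if not all blocks are resident on the GPU at the same time.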