Hi everyone,
I am interested in implementing a critical section in my GPU code. I realize that the performance for this will be terrible, that I should try to avoid it, and that there have been numerous posts on this in the past. However, I haven't seen any posts that talk about implementing behavior similar to pthread_cond_wait() or pthread_cond_broadcast(); pretty much all of the previous topics dealt with acquiring and releasing a lock in some form or fashion (there may be a reason for this that I don't know of).
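For what it's worth, I think I can manage the lock part by itself with atomicCAS. Here is a rough sketch of what I have in mind (lock is just a name I picked, and it would need to be zeroed from the host before launch):
[codebox]// a spin-lock built on atomicCAS, which compute capability 1.1 supports
// for 32-bit words in global memory; only one thread per block should
// call these, since divergent threads spinning within a warp can deadlock
__device__ void acquire_lock( int *lock )
{
    // spin until we flip the lock from 0 (free) to 1 (held)
    while ( atomicCAS( lock, 0, 1 ) != 0 )
        ;
}

__device__ void release_lock( int *lock )
{
    // make this thread's preceding writes visible device-wide
    // before freeing the lock
    __threadfence();
    atomicExch( lock, 0 );
}[/codebox]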
Anyway, I was wondering if anyone could offer some insight into how I might implement functionality similar to pthread_cond_wait() or pthread_cond_broadcast(). Is it even possible? Does it fall out implicitly from using atomic memory operations (for the locks)? For example, say I wanted all threads but one to wait for that one thread to set a variable for them (yes, I realize this is slow and will cause warp divergence). How might I go about that?
Example (Pseudo-code):
[codebox]if ( threadIdx.x != 0 )
{
    // wait for the value to be set (pthread_cond_wait)
}
else
{
    // acquire lock
    // do whatever I need to do before the other threads can do anything
    // set value
    // broadcast the new value to the other threads (pthread_cond_broadcast)
    // release lock
}
__syncthreads(); // or __threadfence(), depending on which is needed[/codebox]
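The closest I have gotten on my own is the single-block case, where it seems like a __shared__ variable plus __syncthreads() already gives the wait/broadcast pairing. A minimal sketch, assuming all the cooperating threads are in one block (value, out, and the constant 42 are just placeholders I made up):
[codebox]__global__ void broadcast_kernel( int *out )
{
    // the "condition" data, visible to every thread in the block
    __shared__ int value;

    if ( threadIdx.x == 0 )
    {
        // thread 0 does its work and sets the value
        value = 42;
    }

    // every other thread waits here until thread 0 arrives, at which point
    // the new value is visible block-wide; effectively the cond_wait plus
    // cond_broadcast pair, but only within a single block
    __syncthreads();

    out[threadIdx.x] = value;
}[/codebox]
If that is really all it takes, then my pseudo-code above collapses to this and the lock is not even needed when only thread 0 writes. But I would still like to know whether something like spinning on a volatile global flag is the right approach in situations where __syncthreads() cannot be used.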
I am using a Quadro NVS 295 (compute capability 1.1) with CUDA 2.2, if that helps.
Thanks,
Matt