Hello, I’m working on implementing a parallel linked data structure, and I know that atomic operations such as CAS can be used to update the links concurrently.
My question is: if I have multiple kernels executing concurrently, will atomic operations still be effective? My intuition says they shouldn’t be limited to working within a single kernel, but I need to confirm it.
Also, I’ll pose a second question in case anyone can point me in the right direction.
While implementing the same structure on the host, I noticed a performance improvement when I switched to a slab allocation approach for the nodes.
Has any similar work been done for a GPU application? Basically, I’m looking for an example that allocates a block of memory and assigns each thread a pointer to a portion of that block. Rather than deallocating memory, I’d rather have the threads recycle the pointers.
Thanks for helping