I use the PTX barrier.sync instruction. If I synchronize all the threads in a block with barrier.sync after a shared memory write, do I also need to call __threadfence(), or is that fence implied by barrier.sync?
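
For concreteness, here is a minimal sketch of the pattern being asked about, written in CUDA with inline PTX. The kernel name, the tile size of 256, and the launch configuration are hypothetical and chosen only for illustration. The comment marks the spot where the fence would go if one turns out to be required.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes one shared-memory slot, the block
// synchronizes with barrier.sync, then each thread reads a neighbour's slot.
// Assumes blockDim.x <= 256.
__global__ void neighbour_read(int *out)
{
    __shared__ int tile[256];
    const unsigned tid = threadIdx.x;

    // Shared memory write performed before the barrier.
    tile[tid] = static_cast<int>(tid);

    // The question: is a __threadfence() needed at this point,
    // or does the barrier below already make the write visible?

    // Barrier 0, with every thread in the block participating.
    asm volatile("barrier.sync 0;" ::: "memory");

    // Read a slot written by a different thread in the same block.
    out[blockIdx.x * blockDim.x + tid] = tile[(tid + 1) % blockDim.x];
}

int main()
{
    const int threads = 256;  // illustrative launch configuration
    int *d_out = nullptr;
    int h_out[threads];

    if (cudaMalloc(&d_out, threads * sizeof(int)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    neighbour_read<<<1, threads>>>(d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out, d_out, threads * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    std::printf("thread 0 read %d from its neighbour's slot\n", h_out[0]);
    cudaFree(d_out);
    return 0;
}
```

The inline assembly uses a "memory" clobber so that the compiler does not move the shared memory accesses across the barrier; that only constrains the compiler and says nothing about what the hardware barrier itself guarantees, which is the point of the question.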