CUDA analog to OpenCL barrier(CLK_GLOBAL_MEM_FENCE)

Hi all,

I know this question has been asked before, but I was wondering whether this functionality has been added in the latest CUDA release?

Is __threadfence_system() what you’re looking for?

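That said, if you only need ordering within a single device, __threadfence() should be enough; __threadfence_system() additionally extends the ordering guarantee to host threads and peer devices. Keep in mind that both are memory fences only, not execution barriers, so neither makes threads wait for one another.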
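For what it's worth, here is a minimal sketch of how __threadfence() is typically used for device-wide ordering, loosely following the common "last block done" pattern. Each block writes one value to global memory, fences, then bumps an atomic counter; whichever block sees the final ticket reads all the values. The names blocks_done, publish_then_sum and run_fence_demo are made up for illustration, and the launch configuration is arbitrary, so treat this as a sketch rather than a drop-in replacement for the OpenCL barrier.

```cuda
#include <cuda_runtime.h>

// Hypothetical counter of blocks that have published their value.
__device__ unsigned int blocks_done = 0;

// Each block publishes one value; the last block to finish sums them all.
__global__ void publish_then_sum(int *partial, int *total)
{
    __shared__ bool is_last;

    if (threadIdx.x == 0) {
        partial[blockIdx.x] = blockIdx.x + 1;  // write to global memory

        // Order the write above before the counter update below, as
        // observed by every thread on the device. This is a fence only;
        // no thread waits here.
        __threadfence();

        // atomicInc returns the old value; the last block sees gridDim.x - 1.
        unsigned int ticket = atomicInc(&blocks_done, gridDim.x);
        is_last = (ticket == gridDim.x - 1);
    }
    __syncthreads();  // block-level barrier so every thread sees is_last

    if (is_last && threadIdx.x == 0) {
        int sum = 0;
        // Read through a volatile pointer so the loads are not served
        // from stale cached or register copies.
        for (unsigned int b = 0; b < gridDim.x; ++b)
            sum += ((volatile int *)partial)[b];
        *total = sum;
        blocks_done = 0;  // reset so the kernel can be launched again
    }
}

// Host-side helper (hypothetical): returns true if the device result
// matches the expected sum 1 + 2 + ... + blocks.
bool run_fence_demo()
{
    const int blocks = 64;
    int *partial = nullptr;
    int *total = nullptr;

    if (cudaMalloc(&partial, blocks * sizeof(int)) != cudaSuccess)
        return false;
    if (cudaMalloc(&total, sizeof(int)) != cudaSuccess) {
        cudaFree(partial);
        return false;
    }

    publish_then_sum<<<blocks, 32>>>(partial, total);

    int host_total = 0;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&host_total, total, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(partial);
    cudaFree(total);

    return err == cudaSuccess && host_total == blocks * (blocks + 1) / 2;
}
```

Calling run_fence_demo() from main() should return true on a working device. Without the __threadfence(), the counter update could in principle become visible before the write to partial, and the last block might read a stale value. If a host thread or a peer GPU were the reader instead, __threadfence_system() would be the fence to use in the same spot.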