Hi all,
I know this question has been asked before, but I wondered whether this functionality has been added in the latest CUDA release?
Is __threadfence_system() what you’re looking for?
That said, for ordering within a device, __threadfence() should be enough.
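In case it helps, here is a minimal sketch of the device-wide case, loosely following the "last block done" pattern from the CUDA programming guide. The names sum_partials, blocks_done and run_fence_demo are made up for illustration, and error checking is kept to a bare minimum. If the data had to be observed by the host or by another device, I would expect __threadfence_system() to take the place of __threadfence() here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical counter: how many blocks have published their result.
__device__ unsigned int blocks_done = 0;

// Each block publishes one partial value; whichever block finishes last
// sums all of them. __threadfence() orders the data write before the
// counter increment as seen by other blocks on the same device.
__global__ void sum_partials(volatile int *partial, int *total)
{
    if (threadIdx.x == 0) {
        // 1. Publish this block's result to global memory.
        partial[blockIdx.x] = blockIdx.x + 1;

        // 2. Ensure the write above is visible device-wide before the
        //    increment below can be observed by any other block.
        __threadfence();

        // 3. Take a ticket. The block holding the last ticket knows that
        //    every other block has already passed its fence.
        unsigned int ticket = atomicAdd(&blocks_done, 1u);

        if (ticket == gridDim.x - 1) {
            int sum = 0;
            for (unsigned int i = 0; i < gridDim.x; ++i)
                sum += partial[i];  // volatile: read memory, not a cached copy
            *total = sum;
            blocks_done = 0;        // reset so the kernel can be launched again
        }
    }
}

// Host-side driver (hypothetical helper). Returns the sum computed on the
// device, or -1 if the copy back to the host fails.
int run_fence_demo()
{
    const int num_blocks = 8;
    int *partial = nullptr;
    int *total = nullptr;
    cudaMalloc(&partial, num_blocks * sizeof(int));
    cudaMalloc(&total, sizeof(int));
    cudaMemset(total, 0, sizeof(int));

    sum_partials<<<num_blocks, 32>>>(partial, total);

    int host_total = -1;
    cudaError_t err = cudaMemcpy(&host_total, total, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(partial);
    cudaFree(total);
    return (err == cudaSuccess) ? host_total : -1;
}

int main()
{
    printf("total = %d\n", run_fence_demo());
    return 0;
}
```

With 8 blocks each publishing blockIdx.x + 1, the total should come out as 36. Without the fence, the last block could in principle see the counter update before some of the partial values, which is the ordering problem __threadfence() is meant to rule out.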