CUDA math includes some load/store intrinsics with cache hints, for example __ldca, __ldcs, __ldcg, __ldcv, __stcs, __stcg, __stwb, and __stwt. When these intrinsics are used to load or store a value, is the access atomic? I didn’t find any description of this in the CUDA math documentation.
Thanks very much.