__pipeline_memcpy_async vs. cuda::memcpy_async

Would you please explain the differences in how to use __pipeline_memcpy_async and cuda::memcpy_async?
For example, my understanding is that if we want to synchronize flexibly with cooperative groups, we have to use cuda::memcpy_async.
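
To make the comparison concrete, here is a minimal sketch of the two usage patterns as I currently understand them. It is not a definitive implementation. The kernel names, the block size `kBlock`, and the "+ 1" computation are hypothetical and only for illustration, and I am assuming a build for compute capability 7.0 or newer so that `cuda::barrier` is available.

```cuda
#include <cstdio>
#include <cuda_pipeline.h>       // __pipeline_memcpy_async, __pipeline_commit, __pipeline_wait_prior
#include <cuda/barrier>          // cuda::barrier, cuda::memcpy_async
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr int kBlock = 256;      // hypothetical block size for this sketch

// Pattern A: primitives API. Each thread issues its own copy and waits
// on its own per-thread pipeline; no group object is involved.
__global__ void copy_with_primitives(const int* in, int* out) {
    __shared__ int tile[kBlock];
    const int i = blockIdx.x * kBlock + threadIdx.x;
    // The size argument must be 4, 8, or 16 bytes.
    __pipeline_memcpy_async(&tile[threadIdx.x], &in[i], sizeof(int));
    __pipeline_commit();
    __pipeline_wait_prior(0);    // wait for this thread's committed copies
    // Each thread reads only the element it copied itself.
    out[i] = tile[threadIdx.x] + 1;
}

// Pattern B: cuda::memcpy_async. The whole cooperative group issues one
// collective copy, and completion is tracked by a cuda::barrier.
__global__ void copy_with_group(const int* in, int* out) {
    __shared__ int tile[kBlock];
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;
    auto block = cg::this_thread_block();
    if (block.thread_rank() == 0) {
        init(&bar, block.size());   // one expected arrival per thread
    }
    block.sync();
    cuda::memcpy_async(block, tile, in + blockIdx.x * kBlock,
                       sizeof(int) * kBlock, bar);
    bar.arrive_and_wait();       // copied data is now visible to the block
    out[blockIdx.x * kBlock + threadIdx.x] = tile[threadIdx.x] + 1;
}

int main() {
    const int n = 4 * kBlock;
    int *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = i;

    copy_with_primitives<<<n / kBlock, kBlock>>>(in, out);
    cudaDeviceSynchronize();
    int bad_a = 0;
    for (int i = 0; i < n; ++i) bad_a += (out[i] != in[i] + 1);

    copy_with_group<<<n / kBlock, kBlock>>>(in, out);
    cudaDeviceSynchronize();
    int bad_b = 0;
    for (int i = 0; i < n; ++i) bad_b += (out[i] != in[i] + 1);

    printf("mismatches: primitives=%d group=%d\n", bad_a, bad_b);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

In this sketch, pattern A synchronizes per thread through the implicit pipeline, while pattern B takes the group as an argument and lets the barrier decide when the data is ready for every thread in that group. That difference is what I would like to have confirmed or corrected.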

__pipeline_memcpy_async
cuda::memcpy_async