Asynchronous Data Transfer With Devices Enabled for Peer-to-Peer Communication

If two devices are enabled for peer-to-peer data transfer and do not need to route data through the host, does the memory on each device need to be declared and allocated in a particular way for asynchronous transfers between the two devices to work?

For example, does allocating array as follows permit asynchronous transfer of it:
array = new CuArray(size);

or must cudaMalloc be used?

CuArray isn’t part of CUDA. It looks like it belongs to some add-on library, such as the CUDA package for Julia, or Kaldi.

It is impossible to say anything about CuArray unless you state where you are getting it from and how it is defined.

For transfer of device data from one GPU to another, cudaMalloc is sufficient.
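As a rough illustration of that last point, here is a minimal sketch of an asynchronous GPU-to-GPU copy between two plain cudaMalloc allocations. The device ordinals 0 and 1, the buffer size, and the helper names CHECK, enable_peer and run_peer_copy are assumptions made for the example, not anything from the original question. It uses only CUDA runtime calls (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeerAsync), and error paths return early without freeing resources to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: bail out of the enclosing function with -1 on any CUDA error.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s failed: %s\n", #call,                      \
                    cudaGetErrorString(err_));                             \
            return -1;                                                     \
        }                                                                  \
    } while (0)

// Enable access from the current device to `peer`, tolerating "already enabled".
static cudaError_t enable_peer(int peer) {
    cudaError_t e = cudaDeviceEnablePeerAccess(peer, 0);
    if (e == cudaErrorPeerAccessAlreadyEnabled) {
        cudaGetLastError();  // clear the sticky error state
        e = cudaSuccess;
    }
    return e;
}

// Returns 0 on success, 1 if skipped (no two peer-capable GPUs), -1 on error.
int run_peer_copy() {
    const int src = 0, dst = 1;                 // assumed device ordinals
    const size_t n = 1 << 20;                   // assumed element count
    const size_t bytes = n * sizeof(float);

    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return 1;

    int src_to_dst = 0, dst_to_src = 0;
    CHECK(cudaDeviceCanAccessPeer(&src_to_dst, src, dst));
    CHECK(cudaDeviceCanAccessPeer(&dst_to_src, dst, src));
    if (!src_to_dst || !dst_to_src) return 1;

    // Ordinary cudaMalloc allocations, one on each device.
    float *d_src = nullptr, *d_dst = nullptr;
    std::vector<float> h(n, 1.0f);

    CHECK(cudaSetDevice(src));
    CHECK(enable_peer(dst));
    CHECK(cudaMalloc(&d_src, bytes));
    CHECK(cudaMemcpy(d_src, h.data(), bytes, cudaMemcpyHostToDevice));

    CHECK(cudaSetDevice(dst));
    CHECK(enable_peer(src));
    CHECK(cudaMalloc(&d_dst, bytes));

    // Issue the device-to-device copy on a stream; the call returns
    // to the host without waiting for the copy to finish.
    CHECK(cudaSetDevice(src));
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaMemcpyPeerAsync(d_dst, dst, d_src, src, bytes, stream));
    CHECK(cudaStreamSynchronize(stream));       // wait before reading the result

    // Verify on the host that the data arrived on the destination GPU.
    std::vector<float> out(n, 0.0f);
    CHECK(cudaSetDevice(dst));
    CHECK(cudaMemcpy(out.data(), d_dst, bytes, cudaMemcpyDeviceToHost));
    for (size_t i = 0; i < n; ++i)
        if (out[i] != 1.0f) return -1;

    CHECK(cudaFree(d_dst));
    CHECK(cudaSetDevice(src));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_src));
    return 0;
}

int main() {
    int rc = run_peer_copy();
    printf("run_peer_copy returned %d\n", rc);
    return rc < 0 ? 1 : 0;
}
```

Note that the pinned-memory requirement people often associate with asynchronous copies concerns host buffers (cudaMallocHost / cudaHostAlloc); both buffers here live in device memory, so nothing beyond cudaMalloc is needed. Whatever a wrapper type such as CuArray does internally, the same should hold as long as it ultimately hands you a device pointer obtained from cudaMalloc, but that depends on the library in question.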