cudaMemcpy and DtoD transfers under UVA

Hi,

I’m trying to capture the calls to cudaMemcpy* made by an application running under a CUDA-aware MPI (MVAPICH2).
This MPI uses UVA for DtoD communication.
I have a simple ping-pong test, and in nvvp I can see some cudaMemcpyAsync DtoD transfers, as I expect (although nvvp does not show the IDs of the source and destination devices).
However, when I capture the calls to cudaMemcpy*, either with CUPTI or by overriding the cudaMemcpy function symbols (with dlsym), I don’t see any DtoD transfers.
The only memcpy kinds I capture are DtoH, AtoH and HtoA.
Does UVA not use cudaMemcpy to transfer the data between the GPUs?
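
A minimal sketch of the kind of dlsym interposition described above, for illustration only and not necessarily the exact code used. It assumes Linux, that the application (or the MPI library) links libcudart dynamically, and that the copy goes through the runtime entry point cudaMemcpyAsync; the typedef name and the log format are made up for the example.

```cpp
// interpose.cpp: log every cudaMemcpyAsync call, then forward it.
// Assumption: the caller resolves cudaMemcpyAsync through the dynamic
// linker. Calls made through a statically linked runtime or through the
// driver API (cuMemcpy*) never reach this wrapper.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <cstdio>
#include <cuda_runtime_api.h>

// Hypothetical typedef for the real function's signature.
using memcpy_async_fn = cudaError_t (*)(void*, const void*, size_t,
                                        cudaMemcpyKind, cudaStream_t);

extern "C" cudaError_t cudaMemcpyAsync(void* dst, const void* src,
                                       size_t count, cudaMemcpyKind kind,
                                       cudaStream_t stream) {
    // Look up the next definition of the symbol once (the real runtime).
    static memcpy_async_fn real = reinterpret_cast<memcpy_async_fn>(
        dlsym(RTLD_NEXT, "cudaMemcpyAsync"));

    // kind is printed as an integer: 0 HtoH, 1 HtoD, 2 DtoH, 3 DtoD,
    // 4 cudaMemcpyDefault (direction inferred from the pointers).
    std::fprintf(stderr, "cudaMemcpyAsync: dst=%p src=%p bytes=%zu kind=%d\n",
                 dst, src, count, static_cast<int>(kind));

    if (real == nullptr) {
        return cudaErrorUnknown;  // real symbol not found
    }
    return real(dst, src, count, kind, stream);
}
```

Built as a shared object (for example with g++ -shared -fPIC and the CUDA include path) and loaded with LD_PRELOAD, this prints one line per intercepted call.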

Moreover, for a DtoD transfer I would like to be able to identify the source and destination devices.
I have tried to use cudaPointerAttributes (filled in by cudaPointerGetAttributes), but I only get -1 in the device field.
How do I identify the device ID of the target device under UVA?
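
For reference, a minimal sketch of the pointer query mentioned above, assuming a buffer allocated with cudaMalloc on device 0. The helper name print_owner is hypothetical, and only the device field is read because the other fields differ between CUDA versions.

```cpp
// query_ptr.cpp: ask the runtime which device a pointer belongs to.
#include <cstdio>
#include <cuda_runtime_api.h>

// Hypothetical helper: prints the device field reported for ptr.
static void print_owner(const char* label, const void* ptr) {
    cudaPointerAttributes attr;
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        std::printf("%s: query failed: %s\n", label, cudaGetErrorString(err));
        cudaGetLastError();  // reset the error state before continuing
        return;
    }
    std::printf("%s: device=%d\n", label, attr.device);
}

int main() {
    void* buf = nullptr;

    // Assumption: device 0 exists; the buffer size is arbitrary.
    if (cudaSetDevice(0) != cudaSuccess ||
        cudaMalloc(&buf, 1 << 20) != cudaSuccess) {
        std::printf("setup failed: %s\n",
                    cudaGetErrorString(cudaGetLastError()));
        return 1;
    }

    print_owner("cudaMalloc buffer", buf);

    cudaFree(buf);
    return 0;
}
```

The same helper could be called on the dst and src arguments inside a memcpy wrapper to compare what the runtime reports for each side of a transfer.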

Thanks!
Mmx