Question about memory transfer

Is there any way to measure the amount of pinned memory transferred over PCIe with nvprof?
My guess is that the pcie_total_data_received/transmitted metrics are the right option, but I am still confused about them.
Thank you in advance.

Hi,

You can use the nvprof option "--print-gpu-trace" to see the size and throughput of each CUDA memcpy call. For example, run:

> /usr/local/cuda/bin/nvprof --print-gpu-trace /usr/local/cuda/samples/bin/x86_64/linux/release/bandwidthTest 

==4567== Profiling application: /usr/local/cuda-10.2/samples/bin/x86_64/linux/release/bandwidthTest
==4567== Profiling result:
   Start  Duration            Grid Size      Block Size     Regs*    SSMem*    DSMem*      Size  Throughput  SrcMemType  DstMemType           Device   Context    Stream  Name
510.15ms  2.5700ms                    -               -         -         -         -  30.518MB  11.596GB/s      Pinned      Device      TITAN V (0)         1         7  [CUDA memcpy HtoD]
512.72ms  2.5670ms                    -               -         -         -         -  30.518MB  11.610GB/s      Pinned      Device      TITAN V (0)         1         7  [CUDA memcpy HtoD]

Alternatively, you can use nvvp to view memcpy information. Select a specific memcpy instance on the timeline and check its details in the Properties view, as shown in the screenshot below.

[Screenshot: nvvp timeline with the Properties view for a selected memcpy instance]


Thanks

Hi @rameshgunjal.
Thank you for your help.

But I am not using cudaMemcpy().
According to the recent documentation, cudaHostAlloc() with cudaHostAllocDefault gives memory that is mapped into both the CPU and GPU address spaces; previously another flag was required for mapping, but now both flags provide it.

Therefore, I can access the pinned data directly from the GPU without any copy operation (a small check confirming the mapping is shown after the code below).
What I want is to measure the amount of data transferred while the simple kernel below runs.


#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

__global__ void test(int *arr, int arrSize);

int main() {
    const int arrSize = 1 << 20;   // element count (value just for illustration)
    int *array = nullptr;

    // Pinned allocation with the default flag only.
    cudaHostAlloc((void **)&array, arrSize * sizeof(int), cudaHostAllocDefault);

    cudaProfilerStart();
    test<<<1, 1>>>(array, arrSize);
    cudaDeviceSynchronize();
    cudaProfilerStop();

    cudaFreeHost(array);
    return 0;
}

// The kernel writes directly to the mapped pinned host buffer (no explicit copy).
__global__ void test(int *arr, int arrSize) {
    for (int i = 0; i < arrSize; i++) { arr[i] = i; }
}
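
Side note: to convince myself that the default-flag allocation is really device-accessible, I ran a small check along the following lines. This is only a sketch; the device-property fields and cudaHostGetDevicePointer() are from the runtime API, and my assumption (based on the documentation) is that on a UVA system the returned device pointer equals the host pointer.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Does the device support mapped pinned memory and unified virtual addressing?
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("canMapHostMemory = %d, unifiedAddressing = %d\n",
           prop.canMapHostMemory, prop.unifiedAddressing);

    // Allocate with the default flag only (no cudaHostAllocMapped).
    int *hostPtr = nullptr;
    cudaHostAlloc((void **)&hostPtr, 1024 * sizeof(int), cudaHostAllocDefault);

    // On a UVA system this should succeed and return the host pointer itself
    // (my assumption based on the runtime documentation).
    int *devPtr = nullptr;
    cudaError_t err = cudaHostGetDevicePointer((void **)&devPtr, hostPtr, 0);
    printf("cudaHostGetDevicePointer: %s, same pointer: %d\n",
           cudaGetErrorString(err), (int)(devPtr == hostPtr));

    cudaFreeHost(hostPtr);
    return 0;
}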

When I profile this program, the "--print-gpu-trace" option does not print any memcpy results.
That's why I am asking about the PCIe metrics.
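
For reference, this is the kind of invocation I have in mind (a sketch: the metric names are taken from the nvprof metrics reference, ./my_app is just a placeholder for the program above, and I am not sure whether these metrics capture zero-copy accesses from a kernel):

> /usr/local/cuda/bin/nvprof --metrics pcie_total_data_transmitted,pcie_total_data_received ./my_app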

The documentation explains each of these PCIe-related metrics with only a single sentence;
could you explain them in more detail?

Also, I am wondering whether this approach using the PCIe-related metrics is actually sufficient to measure the amount of pinned-memory traffic.

Thank you.