Is there a way to use nvprof to measure how much pinned-memory data is transferred?
I think the pcie_total_data_received/pcie_total_data_transmitted metrics should be the right option, but they are still confusing to me.
Thank you in advance.
Hi,
You can use the nvprof option "--print-gpu-trace" to measure the size and throughput of each CUDA memcpy call. For example, run:
> /usr/local/cuda/bin/nvprof --print-gpu-trace /usr/local/cuda/samples/bin/x86_64/linux/release/bandwidthTest
==4567== Profiling application: /usr/local/cuda-10.2/samples/bin/x86_64/linux/release/bandwidthTest
==4567== Profiling result:
Start     Duration  Grid Size  Block Size  Regs*  SSMem*  DSMem*  Size      Throughput  SrcMemType  DstMemType  Device       Context  Stream  Name
510.15ms  2.5700ms  -          -           -      -       -       30.518MB  11.596GB/s  Pinned      Device      TITAN V (0)  1        7       [CUDA memcpy HtoD]
512.72ms  2.5670ms  -          -           -      -       -       30.518MB  11.610GB/s  Pinned      Device      TITAN V (0)  1        7       [CUDA memcpy HtoD]
Alternatively, you can use nvvp (the NVIDIA Visual Profiler) to view memcpy information: select a specific memcpy instance in the timeline and check its details in the Properties view.
Thanks
Hi @rameshgunjal.
Thank you for your help.
But I am not using cudaMemcpy().
According to recent documentation, on systems with unified virtual addressing, cudaHostAlloc() with cudaHostAllocDefault already returns memory that is mapped into the GPU's address space; previously, a separate flag (cudaHostAllocMapped) was needed for the mapping, but now both flags support it.
Therefore, I can access the pinned data directly from the GPU without any copy operation.
What I want is to measure the amount of data transferred during the simple loop below.
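#include <cuda_runtime.h>
#include <cuda_profiler_api.h>  // cudaProfilerStart / cudaProfilerStop

// Each iteration writes one int directly to the mapped pinned host buffer
__global__ void test(int *arr, int arrSize) {
    for (int i = 0; i < arrSize; i++) { arr[i] = i; }
}

int main() {
    …
    // The size argument is in bytes; the kernel writes arrSize ints
    cudaHostAlloc((void **)(&array), arrSize * sizeof(int), cudaHostAllocDefault);
    …
    cudaProfilerStart();
    test<<<1, 1>>>(array, arrSize);
    cudaDeviceSynchronize();  // wait for the kernel to finish before stopping the profiler
    cudaProfilerStop();
    …
}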
In this case, the "--print-gpu-trace" option did not print any results.
That's why I am asking about the PCIe metrics.
The documentation describes each PCIe-related metric in only one sentence; could you explain them in more detail?
Also, is this approach of using the PCIe-related metrics enough to measure how much pinned-memory data is transferred?
Thank you.