A cudaMemcpy operation doesn’t have an inherent association to a kernel, and the bytes it moves are not allocated by cudaMemcpy. The difference is that one counter tracks bytes moved by memcpy operations, while the other tracks bytes read or written by the GPU itself. For example, when a CUDA kernel is reading and writing data by executing CUDA C++ device code, any resulting movement of data across the DRAM bus will be tracked by the DRAM metric, but not by that CUPTI memcpy counter.
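As a minimal sketch of the distinction (kernel and variable names are illustrative, error checking omitted): the two cudaMemcpy calls below would show up in the memcpy byte counters, while the loads and stores issued by the kernel’s device code are what the DRAM transaction metrics can capture, if those accesses actually reach DRAM.

```cuda
// Illustrative kernel: its reads of `in` and writes of `out` are device-side
// memory traffic, counted (when they miss the caches) by
// dram_read_transactions / dram_write_transactions, not by memcpy counters.
__global__ void scale(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * in[i];
}

// Host side (hypothetical sizes/pointers):
//   cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);   // memcpy counter
//   scale<<<(n + 255) / 256, 256>>>(d_out, d_in, n);                     // DRAM metrics
//   cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost); // memcpy counter
```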
Thanks for the answer! So, for example, if I copy two vectors of 1000 floats to the GPU and bring one back using cudaMemcpy, I will move 1000 x 3 x 4 bytes = 12,000 bytes.
If instead, after moving that data, I allocate and use two other temporary vectors of 1000 floats on the GPU (which I will not copy back to main memory), will the data movement reported by (dram_write_transactions + dram_read_transactions) * 32 be 1000 x 5 x 4 bytes = 20,000 bytes?
I’m not sure what that means. “Allocate two other vectors” on the GPU? Normally I would expect allocations to be done with e.g. cudaMalloc.
I don’t use CUPTI much. You can ask detailed profiler questions on the forums dedicated to them.
This question may be of interest for general understanding of what the profiler metrics (e.g. dram_read_transactions) measure.
Thanks, I found the link really useful. I am also wondering if there is a command similar to `--track-memory-allocations` to see the memory allocated with cudaMalloc, but using ncu from the command line instead of nvprof. Any chance? Thanks
Probably best to ask that on the profiler forum. I would recommend checking Nsight Systems (nsys), not ncu.