Track memory allocation per kernel

HPC725 · November 9, 2022, 10:20am

Hi, following this answer on the forum about reading the bytes of memory allocations with nvprof (my GPU does not support ncu), I can successfully read the bytes moved to the GPU by cudaMemcpy: I run the analysis with the “--track-memory-allocations on” option and read the values in the ‘CUPTI_ACTIVITY_KIND_MEMCPY’ table.
I therefore wonder whether it is possible to read these values for a specific kernel only, as the “--kernels name” option does for other types of events.
Could you explain the difference between reading the bytes allocated by cudaMemcpy from the CUPTI_ACTIVITY_KIND_MEMCPY table and reading the metrics (dram_write_transactions + dram_read_transactions)*32?

Thanks.
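
For readers who want to total the memcpy bytes programmatically, here is a minimal Python sketch. It assumes the profile was exported to an SQLite file (nvprof can write one with `--export-profile`), and that the `CUPTI_ACTIVITY_KIND_MEMCPY` table has `bytes` and `copyKind` columns, with 1 meaning host-to-device and 2 meaning device-to-host; these are assumptions to check against your own file. To stay runnable without a GPU, it builds a stand-in in-memory table with made-up rows instead of opening a real profile, and the helper name `memcpy_bytes_by_kind` is hypothetical.

```python
import sqlite3

# Assumed CUPTI copy-kind codes (verify against your CUPTI headers).
COPY_KIND_NAMES = {1: "HtoD", 2: "DtoH"}


def memcpy_bytes_by_kind(conn):
    """Sum the bytes column of CUPTI_ACTIVITY_KIND_MEMCPY per copy kind."""
    rows = conn.execute(
        "SELECT copyKind, SUM(bytes) FROM CUPTI_ACTIVITY_KIND_MEMCPY "
        "GROUP BY copyKind"
    ).fetchall()
    return {COPY_KIND_NAMES.get(kind, str(kind)): total for kind, total in rows}


# Stand-in for sqlite3.connect("profile.sqlite"): a table with the assumed
# columns and illustrative rows (two uploads and one download of 4000 bytes).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUPTI_ACTIVITY_KIND_MEMCPY (copyKind INTEGER, bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO CUPTI_ACTIVITY_KIND_MEMCPY VALUES (?, ?)",
    [(1, 4000), (1, 4000), (2, 4000)],
)

totals = memcpy_bytes_by_kind(conn)
print(totals)
print(sum(totals.values()))
```

With a real export, only the `sqlite3.connect(...)` line would change; the query itself does not depend on any kernel, which is consistent with memcpy records not being tied to one.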

Robert_Crovella · November 9, 2022, 2:23pm

A cudaMemcpy operation doesn’t have an inherent association with a kernel.

Bytes are not allocated by cudaMemcpy. The difference is that one tracks bytes moved by memcpy operations, and the other tracks bytes read or written by the GPU. For example, when a CUDA kernel reads and writes data by executing CUDA C++ device code, and that causes movement of data across the DRAM bus, the movement will be tracked by the metric, but not by that CUPTI operation.
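
To make the distinction concrete, here is a small Python sketch with hypothetical numbers. The 32-byte transaction size is the factor used in the question above; the helper name `dram_bytes` and the sample counts are illustrative, not profiler output.

```python
BYTES_PER_DRAM_TRANSACTION = 32  # factor used in the question's formula


def dram_bytes(read_transactions, write_transactions):
    """Convert DRAM transaction counts for a kernel into bytes."""
    return (read_transactions + write_transactions) * BYTES_PER_DRAM_TRANSACTION


# Hypothetical values: one 4000-byte upload, then a kernel launched
# three times whose counters report these (read, write) transactions.
memcpy_bytes = 4000
per_launch_counts = [(125, 125), (125, 125), (125, 125)]

kernel_dram_bytes = sum(dram_bytes(r, w) for r, w in per_launch_counts)

# The memcpy total is unaffected by how often kernels touch the data.
print(memcpy_bytes)
print(kernel_dram_bytes)
```

The two totals answer different questions: the first counts bytes transferred by copy operations, the second counts bytes that crossed the DRAM bus while device code ran.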

HPC725 · November 9, 2022, 2:33pm

Thanks for the answer! So, for example, if I copy two vectors of 1000 floats to the GPU and bring one back using cudaMemcpy, I will move 1000 x 3 x 4 bytes = 12,000 bytes.
If instead, after moving them, I allocate and use two other temporary vectors of 1000 floats on the GPU (which I will not move back to main memory), will the data movement read by (dram_write_transactions + dram_read_transactions)*32 be 1000 x 5 x 4 bytes = 20,000 bytes?
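
The arithmetic in this post can be checked with a few lines of Python. This only reproduces the byte counts as stated; it says nothing about what the DRAM counters would actually report, since caching and access patterns can push the measured traffic above or below the footprint. The variable names are illustrative.

```python
FLOAT_BYTES = 4  # size of a single-precision float
N = 1000         # elements per vector

# Two vectors copied to the GPU and one copied back with cudaMemcpy.
memcpy_bytes = (2 + 1) * N * FLOAT_BYTES

# Footprint if five vectors in total are each touched once on the GPU
# (the three above plus two temporaries); this is the poster's hypothesis.
touched_once_bytes = 5 * N * FLOAT_BYTES

# Number of 32-byte DRAM transactions that footprint corresponds to.
equivalent_transactions = touched_once_bytes // 32

print(memcpy_bytes, touched_once_bytes, equivalent_transactions)
```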

Robert_Crovella · November 9, 2022, 2:48pm

I’m not sure what that means. “Allocate two other vectors” on the GPU? Normally I expect allocations to be done with e.g. cudaMalloc.

I don’t use CUPTI much. You can ask detailed profiler questions on the forums dedicated to them.

This question may be of interest for general understanding of what the profiler metrics (e.g. nvprof) track.

HPC725 · November 17, 2022, 11:29am

Thanks, I found the link really useful. I am also wondering whether there is a command similar to “--track-memory-allocations” to see the memory allocated with cudaMalloc, but using ncu from the command line instead of nvprof. Any chance? Thanks

Robert_Crovella · November 17, 2022, 9:10pm

probably best to ask that on the profiler forum. I would recommend checking nsight-systems (nsys) not ncu.
