Total device memory allocated in an application.

Is there a utility that we can use to find out the total amount of device memory allocated in an application?

nvidia-smi will report the current amount of device memory allocated, on a sampled basis.
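For what it's worth, here is a rough sketch of what "on a sampled basis" could look like in practice: a small Python poller that repeatedly asks nvidia-smi for memory.used and keeps the highest reading. The function names (parse_memory_used, sample_memory_used, peak_while_polling) and the polling interval are hypothetical choices for illustration. The reading is device-wide (all processes plus context overhead), and the parser assumes nvidia-smi prints a plain integer in MiB per GPU.

```python
import subprocess
import time


def parse_memory_used(csv_text):
    """Parse nvidia-smi CSV output (one line per GPU) into a list of MiB values."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def sample_memory_used():
    """Ask nvidia-smi for the memory currently in use on each GPU, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_used(out)


def peak_while_polling(duration_s=5.0, interval_s=0.1):
    """Poll for duration_s seconds and return the highest reading seen per GPU."""
    peak = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        sample = sample_memory_used()
        # Keep the per-GPU maximum of everything sampled so far.
        peak = sample if not peak else [max(p, s) for p, s in zip(peak, sample)]
        time.sleep(interval_s)
    return peak


if __name__ == "__main__":
    # Start this in one terminal, then launch the CUDA application in another.
    print(peak_while_polling())
```

Any allocation that is made and freed between two samples will not show up, so this only gives a lower bound for short-running programs.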

nvprof has a --track-memory-allocations option; I’m not sure if it will meet your needs.

Thank you, Robert. I am working with micro-kernels whose execution time is quite short, so nvidia-smi is not suitable for my test case. I am also using an older version of the toolkit, which does not support the --track-memory-allocations option in nvprof.

However, I have tried the --track-memory-allocations option with CUDA 9.0 and it doesn’t seem to alter the output of nvprof. Any idea what might be the problem?

Sorry, this may not have been a very useful suggestion. That option tracks memory allocations, but nvprof does not display the data. The data can be extracted from a profiler output file.

Since the visual profiler can import these nvprof output files, one way to use the data is to import it into the visual profiler. However, the visual profiler only uses/displays allocation data pertaining to unified memory allocations.

To see all the allocation data, it’s necessary to use a sqlite tool (e.g. sqlite browser) to view the records directly.

As a simplistic example, create a CUDA code that does some device memory allocation (e.g. cudaMalloc). Compile and run that code under nvprof like this:

nvprof --track-memory-allocations on -f -o test.profout  ./my_cuda_executable

After that, for a quick view, drop the test.profout into the drag-and-drop box on this sqlite online browser web page:

There will then be a drop-down box that allows you to select from different record categories in the database. Select CUPTI_ACTIVITY_KIND_MEMORY.

You might then see output like this (looking at just one row):

_id_	memoryKind	address	bytes	start	end	allocPC	freePC	processId	deviceId	contextId	name
2	3	139721858613248	32	1568671913508821800	1568671913513367000	4207042	4207134	13890	0	1	0

memoryKind 3 refers to ordinary cudaMalloc device memory (CUPTI_ACTIVITY_MEMORY_KIND_DEVICE in the CUPTI CUpti_ActivityMemoryKind enum). The following fields are, in order: the address of the allocation, the total bytes allocated (32 in this case), the starting (cudaMalloc) timestamp in nanoseconds, the ending (cudaFree) timestamp in nanoseconds, the program counter corresponding to the allocation (cudaMalloc), the program counter corresponding to the free (cudaFree), the process ID (13890), the device ID, and the context ID.
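If you would rather not click through a browser, the same records can probably be read with Python's built-in sqlite3 module. The following is only a sketch: it assumes the table and column names are exactly as in the row shown above, the function name summarize_allocations is made up for illustration, and treating a zero or NULL end timestamp as "never freed" is my assumption rather than documented behavior.

```python
import sqlite3

DEVICE_MEMORY_KIND = 3  # memoryKind value for ordinary cudaMalloc allocations


def summarize_allocations(conn, memory_kind=DEVICE_MEMORY_KIND):
    """Return (total_bytes, peak_live_bytes, allocation_count) for one memoryKind."""
    rows = conn.execute(
        'SELECT bytes, "start", "end" FROM CUPTI_ACTIVITY_KIND_MEMORY '
        "WHERE memoryKind = ?",
        (memory_kind,),
    ).fetchall()

    # Total bytes ever requested, regardless of when they were freed.
    total = sum(nbytes for nbytes, _, _ in rows)

    # Build +bytes / -bytes events to find the peak amount live at one time.
    events = []
    for nbytes, start, end in rows:
        events.append((start, nbytes))
        if end:  # assumption: 0 or NULL means the allocation was never freed
            events.append((end, -nbytes))
    events.sort()  # at equal timestamps, frees (negative) sort before allocations

    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return total, peak, len(rows)


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py test.profout
    with sqlite3.connect(sys.argv[1]) as conn:
        total, peak, count = summarize_allocations(conn)
        print(f"{count} allocations, {total} bytes total, {peak} bytes peak")
```

The total answers "how much was allocated over the whole run", while the peak is closer to what nvidia-smi would show if it could sample fast enough (minus context overhead).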

Yes, it’s not a very polished interface.

Thanks, I will look into it.