Help understanding the default value of CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT

Hi, I see that the default value of CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT is 100 and that CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE is 8 MB.
Does this mean that up to 800 MB of profiling data can be stored on the device before it is flushed back to the user, unless a flush is explicitly requested?

Is there more detailed documentation explaining these attributes, especially how the attributes that control CUPTI relate to one another?

Thank you,
Sujan

Hi Sujan,

When a CUDA context is created, CUPTI allocates a single device buffer of size CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE. The default is 8 MB, which can hold tracing information for about 0.25 million kernels in concurrent kernel mode. This attribute is configurable, so you can choose any value that fits your requirements. CUPTI doesn’t allocate more buffers unless they are required: once a device buffer is exhausted, CUPTI allocates another device buffer of the same size. Note that the memory footprint does not scale with the kernel count, because CUPTI reuses a buffer after processing all the records in it.
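To make the numbers above concrete, here is a rough back-of-the-envelope sketch in Python. It does not call CUPTI. It only reproduces the arithmetic from this thread, and the function names and the `buffers_in_use` model are hypothetical illustrations. It assumes the pool limit counts device buffers of CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE each, that buffers are allocated one at a time as earlier ones fill up, and that the defaults are the ones quoted here (they may differ between CUPTI releases).

```python
import math

MIB = 1024 * 1024

# Figures quoted in this thread; defaults may differ between CUPTI releases.
DEVICE_BUFFER_SIZE = 8 * MIB      # CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE
DEVICE_BUFFER_POOL_LIMIT = 100    # CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT
KERNELS_PER_BUFFER = 250_000      # "~0.25 M kernels" per 8 MB buffer


def worst_case_bytes(buffer_size=DEVICE_BUFFER_SIZE,
                     pool_limit=DEVICE_BUFFER_POOL_LIMIT):
    """Upper bound on device memory if every buffer in the pool is allocated."""
    return buffer_size * pool_limit


def buffers_in_use(unprocessed_kernels,
                   kernels_per_buffer=KERNELS_PER_BUFFER,
                   pool_limit=DEVICE_BUFFER_POOL_LIMIT):
    """Hypothetical model of on-demand allocation.

    One buffer always exists (allocated at context creation). Another is
    added only when the records not yet processed no longer fit, and the
    count never exceeds the pool limit.
    """
    needed = max(1, math.ceil(unprocessed_kernels / kernels_per_buffer))
    return min(needed, pool_limit)


# Record size implied by the figures above; an estimate, not a documented value.
approx_bytes_per_kernel = DEVICE_BUFFER_SIZE / KERNELS_PER_BUFFER

if __name__ == "__main__":
    print("worst case (MiB):", worst_case_bytes() // MIB)
    print("approx bytes per kernel record:", round(approx_bytes_per_kernel, 1))
    for backlog in (0, 250_000, 250_001, 1_000_000):
        print(backlog, "unprocessed kernels ->", buffers_in_use(backlog), "buffer(s)")
```

Under these assumptions, 800 MB is only a ceiling. The footprint follows the backlog of records that have not been processed yet, not the total number of kernels launched, which is why it normally stays far below that bound.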

In general, activity buffer flushing should be independent of the device buffer size. Due to an optimization, it currently has some dependency on the buffer size CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE, but it is independent of the pool limit CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT. This behavior will be improved in a future CUDA release, where we will decouple flushing from the device buffer size so that activity buffers are delivered as soon as they are ready to be consumed.

Refer to the Memory Overhead section of the CUPTI guide: https://docs.nvidia.com/cupti/Cupti/r_main.html#unique_1148016283

Thank you,

I had not looked at the Memory Overhead section of the guide. It is very informative.