I’m investigating CUPTI memory overhead and have noticed that memory usage increases when CUPTI is enabled (in my case, for the callback API). The increase also seems to scale with the number of CPU cores.
Part of the memory reservation happens during
Is there a way to avoid this memory reservation, or to specify the number of cores that the profiled process actually executes on? My process runs on only a subset of the CPUs, so allocating buffers for every CPU core is unnecessary for it.
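
To make the mismatch concrete, here is a minimal sketch (Python, Linux-only) that compares the CPUs my process is allowed to run on with the total number of CPUs the system reports. It does not call CUPTI at all, and the helper names `allowed_cpus` and `total_cpus` are made up for illustration. Whether CUPTI consults the affinity mask when sizing its allocations is exactly what I don't know.

```python
import os


def allowed_cpus():
    """Return the sorted CPU ids this process may be scheduled on (Linux only)."""
    # pid 0 means "the calling process"
    return sorted(os.sched_getaffinity(0))


def total_cpus():
    """Return the CPU count reported by the OS, or None if it is unknown."""
    return os.cpu_count()


if __name__ == "__main__":
    allowed = allowed_cpus()
    print(f"CPUs reported by the system: {total_cpus()}")
    print(f"CPUs this process may use:   {len(allowed)} {allowed}")
```

Launching the script under something like `taskset -c 0-3` should show the second number drop to 4 while the first stays at the machine's core count, which is the situation described above.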