CUPTI initialization and CUDA API calls

The CUPTI documentation states that “CUPTI initialization occurs lazily the first time you invoke any CUPTI function. For the Activity, Event, Metric, and Callback APIs there are no requirements on when this initialization must occur (i.e. you can invoke the first CUPTI function at any point).”

However, I noticed that when CUDA API calls are made before the first CUPTI call, CUPTI activities don’t work, at least for the PCIe and NVLink activity kinds. I tried enabling the activity kinds and registering the callbacks right after cudaGetDeviceCount, then calling cuptiActivityFlushAll, but I don’t get any activity records.

According to the documentation, shouldn’t this work even when CUPTI is first called after CUDA API calls? I think this is a bug!

I am using CUPTI from the 10.1.105 release, with Driver Version 418.39 and CUDA Version 10.1.

cuda_err = cudaGetDeviceCount(&dev_count);
CUPTI_CALL(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_NVLINK));
CUPTI_CALL(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_DEVICE));
CUPTI_CALL(cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted));
//some stuff
CUPTI_CALL(cuptiActivityFlushAll(0));

This results in an empty buffer.

It appears to be a bug in CUPTI. We will evaluate whether this can be fixed. Thanks for reporting it.

In the CUDA 11.7 release, a new API, cuptiActivityEnableAndDump, was introduced to provide a snapshot of certain activities, such as device, context, stream, NVLINK and PCIE, at any point during the profiling session.
For NVLINK and PCIE records, the user can call cuptiActivityEnableAndDump(CUPTI_ACTIVITY_KIND_NVLINK) and cuptiActivityEnableAndDump(CUPTI_ACTIVITY_KIND_PCIE) respectively to retrieve the records at any point after CUDA initialization.