CUDA memory activity failed to detect cudaFree

Hi,

I’m currently using CUPTI_ACTIVITY_KIND_MEMORY2 to detect memory operations in the application, such as cudaMalloc and cudaFree. However, when I profile a very simple PyTorch application, the callback API shows that there are some cudaFree calls, but I never get a CUpti_ActivityMemory3 record with CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_RELEASE. Why is that?

To add more context: when I profile a CUDA program that explicitly calls cudaFree, CUPTI successfully reports a CUPTI_ACTIVITY_KIND_MEMORY2 activity with CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_RELEASE.

Best

Hi, @frankchen8508

Can you provide us with a simple repro for better analysis?

Thank you for your response. We’ve taken a deeper look into the issue, and it turns out that PyTorch calls cudaFree on nullptr, which appears to be a no-op. I found that cudaFree(nullptr) is called within a function named initializeCudaContext.

My question is:

Does cudaFree(nullptr) actually initialize the CUDA context?

If so, what is the difference between using cudaFree(nullptr) and cuCtxCreate?

Thanks in advance!

Hi @frankchen8508,

cudaFree(nullptr) or cudaFree(0) is commonly used to initialize the primary CUDA context, usually as the first CUDA API call in the user application. It does not free any memory, which is why we don’t see a CUpti_ActivityMemory3 record associated with it.

cuCtxCreate is a driver API function that creates a new CUDA context and associates it with the calling thread. It also takes flag parameters that let the user specify the type of context as well as the amount of resources allocated to it. Note that before using driver APIs, you must initialize the driver API with cuInit(), or alternatively make a CUDA Runtime API call that performs this initialization implicitly; cudaFree(nullptr) is typically used for this purpose because it also initializes the CUDA context.