Problem with profiling my program with dynamic parallelism on Ampere architecture

hsa · January 26, 2022, 7:20am

Hi

I’m trying to profile my program on a system with an Ampere GPU, but I can’t get any kernel timings out of the profiler. From what I’ve read, the problem seems to occur on architectures with compute capability (c.c.) >= 7.0.

Is there any way to get around this? I don’t care about the dynamic part; what I’m after is the total time of my kernel and the overview.
Or is this a known issue that is being fixed?

BW
Henrik

Robert_Crovella · January 26, 2022, 3:02pm

Which profiler?

hsa · January 26, 2022, 7:27pm

I’m using Nsight Systems 2022.1.1, but 2021.3.3 also fails.

I compile the code with CUDA 9.2.148.

Robert_Crovella · January 26, 2022, 7:36pm

The recommended CUDA version for Ampere is 11.0 or newer for cc 8.0, and 11.1 or newer for cc 8.6.
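
One way to see whether a given build meets that recommendation is to compare the runtime version the program was built against with the compute capability of the GPU it runs on. The following is a minimal sketch, not a definitive check: the helper `minRuntimeFor` is a hypothetical name, it only encodes the two Ampere cases mentioned above, and it assumes device 0 is the GPU of interest.

```cuda
// check_toolkit.cu: build with `nvcc check_toolkit.cu -o check_toolkit`
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: minimum CUDA runtime version for a compute capability,
// in the encoding used by cudaRuntimeGetVersion (1000 * major + 10 * minor).
// Only the two Ampere cases from the post above are covered.
int minRuntimeFor(int major, int minor) {
    if (major == 8 && minor == 0) return 11000;  // cc 8.0 -> CUDA 11.0
    if (major == 8 && minor == 6) return 11010;  // cc 8.6 -> CUDA 11.1
    return 0;                                    // not covered by this sketch
}

int main() {
    int runtimeVersion = 0;
    if (cudaRuntimeGetVersion(&runtimeVersion) != cudaSuccess) {
        std::printf("could not query the CUDA runtime version\n");
        return 1;
    }

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {  // assumes device 0
        std::printf("could not query device 0\n");
        return 1;
    }

    int required = minRuntimeFor(prop.major, prop.minor);
    std::printf("%s: cc %d.%d, runtime version %d, recommended minimum %d\n",
                prop.name, prop.major, prop.minor, runtimeVersion, required);
    if (runtimeVersion < required) {
        std::printf("runtime is older than recommended for this GPU\n");
    }
    return 0;
}
```

A program built with CUDA 9.2 reports a runtime version of 9020, which is below both thresholds.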

I’m not aware of any current limitations on CDP (CUDA Dynamic Parallelism) profiling in Nsight Systems, but it’s possible there may be. You might want to ask your question on the Nsight Systems forum.
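
If only the total kernel time is needed, one possible workaround that does not depend on the profiler is to time the parent launch with CUDA events. A parent grid is not considered complete until all of its child grids have completed, so the elapsed time between the two events includes the dynamically launched work. This is a minimal sketch under stated assumptions: the kernels `parent` and `child` and the helper `timeParentMs` are hypothetical stand-ins for the real program, and the build line assumes a toolkit that supports `sm_80` (CUDA 11.0 or newer).

```cuda
// time_cdp.cu: build with, for example,
//   nvcc -rdc=true -arch=sm_80 time_cdp.cu -o time_cdp -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel: doubles each element.
__global__ void child(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical parent kernel: a single thread launches the child grid.
__global__ void parent(float *data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        child<<<(n + 255) / 256, 256>>>(data, n);
    }
}

// Returns the elapsed time of the parent launch (children included) in
// milliseconds, or -1.0f if the launch or the synchronization failed.
float timeParentMs(float *d_data, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    parent<<<1, 32>>>(d_data, n);
    cudaError_t launchErr = cudaGetLastError();  // catches launch failures
    cudaEventRecord(stop);
    cudaError_t syncErr = cudaEventSynchronize(stop);  // waits for all work

    float ms = -1.0f;
    if (launchErr == cudaSuccess && syncErr == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    float ms = timeParentMs(d_data, n);
    std::printf("parent kernel (including child grids): %f ms\n", ms);

    cudaFree(d_data);
    return ms < 0.0f ? 1 : 0;
}
```

The first launch of a kernel usually includes one-time setup cost, so timing a second call to `timeParentMs` tends to give a more representative number.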