nvvp too slow for an 80 MB dataset

I ran nvprof on a 60-second run of my program, and it produced an 80 MB dataset.

It takes about 4 minutes to open this dataset, and the interface is unresponsive during this time. Opening a dataset corresponding to my entire run is impossible.

My machine has 8 physical Intel cores (dual socket), but my CPU usage never exceeds 200%. This suggests that only 2 cores are being utilized.

Is there a way, or are there plans, to make nvvp faster, for example by making it more parallel or by not loading all the data at once?

Take a look at the “Preparing An Application For Profiling” section of the Profiler User’s Guide at docs.nvidia.com. It discusses using cudaProfilerStart() and cudaProfilerStop() to limit the region of your application over which profiling is performed. This will make it easier for both you and the tool to process the resulting profile information.
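
As a rough illustration of that approach, here is a minimal sketch that profiles only a short region of interest. The kernel `scale`, the buffer size, and the iteration counts are placeholders made up for this example, not anything from your program; only `cudaProfilerStart()` and `cudaProfilerStop()` (declared in `cuda_profiler_api.h`) are the calls the guide describes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>  // cudaProfilerStart / cudaProfilerStop

// Hypothetical kernel standing in for the application's real work.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    const int n = 1 << 20;  // placeholder problem size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // Bulk of the run: not recorded when profiling starts disabled.
    for (int iter = 0; iter < 1000; ++iter) {
        scale<<<blocks, threads>>>(d_data, n, 1.0f);
    }
    cudaDeviceSynchronize();

    // Region of interest: only these launches end up in the profile.
    cudaProfilerStart();
    for (int iter = 0; iter < 10; ++iter) {
        scale<<<blocks, threads>>>(d_data, n, 1.0f);
    }
    cudaDeviceSynchronize();  // make sure the work finishes inside the region
    cudaProfilerStop();

    cudaFree(d_data);
    return 0;
}
```

For the start/stop calls to have any effect, profiling has to be disabled at launch, for example with `nvprof --profile-from-start off -o region.nvprof ./app` (the output file and executable names here are placeholders). The resulting file should be much smaller than a full-run profile and correspondingly quicker to open in nvvp.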