Hi All,
I’m trying to run the Visual Profiler (nvvp) on a CUDA workload on a Windows multi-GPU system (4x Titan V). The problem is that nvvp shows compute streams for only one of the four Titan Vs, the one that owns the D3D device; the other three don’t appear at all. There are also gaps in the CPU-side runtime API timeline where the calls to the missing devices should be.
The behavior is the same regardless of which mode the drivers are in (TCC or WDDM), and I made sure to call cudaDeviceSynchronize() followed by cudaProfilerStop() before exiting the profiling session.
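For reference, one thing that may be worth ruling out: cudaDeviceSynchronize() only waits on the current device, so in a multi-GPU process a single call before cudaProfilerStop() can leave work on the other GPUs unfinished. Below is a minimal sketch of a shutdown sequence that drains every device before stopping the profiler, assuming all four GPUs are visible to the process. flushAllDevicesThenStopProfiler is a hypothetical helper name, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

// Hypothetical helper (not a CUDA API): synchronize every visible GPU,
// then stop profiling. Returns the number of devices synchronized,
// or -1 if any CUDA call fails.
int flushAllDevicesThenStopProfiler() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return -1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        // cudaDeviceSynchronize() only blocks on the *current* device,
        // so make each GPU current in turn before synchronizing it.
        err = cudaSetDevice(dev);
        if (err == cudaSuccess) {
            err = cudaDeviceSynchronize();
        }
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            return -1;
        }
    }

    // Stop profile collection only after all devices are idle.
    err = cudaProfilerStop();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaProfilerStop: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return deviceCount;
}
```

The helper would be called once, right before the process exits.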
The missing kernel launches are a mix of my own kernels and cuDNN’s.
Any idea what could be causing this?