Support for MPS

nunduniel · December 5, 2021, 5:32pm

Hi,
I used to use the NVIDIA Visual Profiler to profile 3 applications started under the MPS server, to see how their execution overlaps.
Its CLI had --profile-all-processes, which would output a report per PID and let me import all 3 into the same timeline.

Now the NVIDIA Visual Profiler has been deprecated and doesn’t support Ampere (SM80+).

The transition guide shows that Nsight Systems is supposed to support MPS, but doesn’t say how.

But on Linux x86 I see no option to do so. I can’t even profile all processes on the system (that is available on Tegra only).
All I can do is launch a single process and profile that.

Does anyone know how to profile multiple apps under MPS?
Would we do it with the brand-new feature in 2021.5 for importing multiple reports into the same timeline?

Thanks for any help

hwilper · December 6, 2021, 4:46pm

Simplest way would probably be to launch them all out of a script (or launch MPS out of a script) and profile the script.

Alternatively you could, as you suggested, profile them separately and then combine them using the multiple reports, but I think profiling the script (and therefore the process tree containing all of them) would be your best bet.
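A minimal sketch of the "profile one launcher" idea, assuming the applications can be started from a single parent process and that the MPS daemon is already running. The file name launch_all.py and the application names below are hypothetical; only the standard library is used.

```python
import shlex
import subprocess
import sys


def launch_all(commands):
    """Start every command concurrently, wait for all of them, return exit codes.

    Each command is a list of arguments, e.g. ["./app_a", "--frames", "100"].
    All children share this process as their parent, so a profiler that
    follows the process tree of the launcher sees every one of them.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Each command-line argument is one full command, quoted as a single string.
    commands = [shlex.split(arg) for arg in sys.argv[1:]]
    codes = launch_all(commands)
    if any(code != 0 for code in codes):
        sys.exit(1)
```

It would then be profiled as one process tree, for example: nsys profile --trace=cuda -o mps_run python3 launch_all.py "./app_a" "./app_b" "./app_c" (app_a, app_b, and app_c are placeholders for the real MPS clients).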

nunduniel · December 6, 2021, 4:51pm

Does the process tree approach also work if the other applications run in Docker containers?
I would think it looks like separate devices, and they wouldn’t be in the same process tree.

I have my main app in one process, and the consuming applications in other processes in other Docker containers (one for all the encoding dependencies, the other for AI / TensorRT).

Adding multiple reports worked (I do a regular graphical profile of the main app and a command-line profile in the Docker container for the other apps, then add the other apps’ reports to the main app’s), but it’s very manual and cumbersome.

hwilper · December 6, 2021, 4:55pm

@rknight can you comment on this?

rknight · December 6, 2021, 7:30pm

What data do you want to collect in this profile? For example, are you trying to collect CUDA trace data, CPU sampling data, etc.?

nunduniel · December 6, 2021, 9:01pm

CUDA traces.
I am trying to see the amount of overlap I get with MPS,
and when it cannot run the applications in parallel.
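To make "amount of overlap" concrete, here is a rough sketch of how it could be computed once the kernel start/end timestamps of each process are available. It assumes the timestamps from all reports are on a common clock. The function names and the sample numbers are made up for illustration, and nothing here is Nsight Systems API.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals belonging to one process."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlapped_time(intervals_by_process):
    """Total time during which kernels of at least two processes are active.

    intervals_by_process maps a process id to a list of (start, end) kernel
    timestamps. Intervals are merged per process first, so concurrent streams
    inside one process are not counted as overlap between processes.
    """
    events = []
    for intervals in intervals_by_process.values():
        for start, end in merge(intervals):
            events.append((start, 1))   # this process becomes busy
            events.append((end, -1))    # this process becomes idle
    events.sort()

    active = 0    # number of processes with a kernel running right now
    overlap = 0
    prev = None
    for timestamp, delta in events:
        if active >= 2:
            overlap += timestamp - prev
        active += delta
        prev = timestamp
    return overlap


# Hypothetical sample data (arbitrary time units), not real measurements.
sample = {
    "app_a": [(0, 10), (5, 12)],
    "app_b": [(8, 20)],
    "app_c": [(30, 40)],
}
print(overlapped_time(sample))
```

Dividing the result by the total busy time of the GPU would give an overlap ratio; periods where the ratio drops to zero are the ones where MPS could not run the applications in parallel.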

rknight · December 6, 2021, 9:50pm

Nsight Systems CUDA tracing works by injecting additional libraries into an application at process startup. When a process is injected, any child processes it launches are also injected. This injection mechanism will not work when the child process runs inside a Docker container.
