I wanted to know what the expected behavior is on the Orin AGX when two different CPU processes launch kernels to the (default) stream. Is it possible for these to run in parallel on the GPU (space sharing)?
From the profiling results shown in Nsight Systems, they appear to run simultaneously.
One of them is a DNN training workload (MobileNetV3) and the other is an inference workload (ResNet-50). Both are launched via a wrapper script and profiled with Nsight Systems.
Thanks for trying it. By similar behavior, do you mean kernels from both processes running simultaneously?
Is there a way to confirm which SMs the kernels are running on using Nsight?
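One way to check SM placement directly, if you can modify the workload's kernels, is to have each block read the `%smid` PTX special register and write it out. A minimal sketch (kernel and variable names are illustrative, not from the original workloads):

```cuda
#include <cstdio>

// Each block records which SM it executed on by reading the
// %smid special register via inline PTX.
__global__ void report_smid(int *smids)
{
    unsigned int smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid));
    if (threadIdx.x == 0)
        smids[blockIdx.x] = (int)smid;
}

int main()
{
    const int nblocks = 8;
    int *d_smids, h_smids[nblocks];
    cudaMalloc(&d_smids, nblocks * sizeof(int));
    report_smid<<<nblocks, 128>>>(d_smids);
    cudaMemcpy(h_smids, d_smids, nblocks * sizeof(int),
               cudaMemcpyDeviceToHost);
    for (int b = 0; b < nblocks; ++b)
        printf("block %d ran on SM %d\n", b, h_smids[b]);
    cudaFree(d_smids);
    return 0;
}
```

Running this in each of the two processes and comparing the reported SM ids over time would show whether blocks from both processes ever occupy SMs concurrently.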
All the active CUDA contexts use the GPU in a time-sharing manner.
Our guess is that the sampling resolution of the Nsight tool is coarser than the timescale at which the time sharing happens, so the tool gives the impression that everything is running in parallel.
Thanks for the clarification. Is there any detailed documentation on the GPU time sharing? For instance, how long is the default time slice per process, etc.?