Multiple kernels running concurrently

Hi,

I would like to know the expected behavior on Orin AGX when two different CPU processes launch kernels on the default stream. Can these kernels run in parallel on the GPU (space sharing)?

The profiling results from Nsight Systems suggest that they are running simultaneously.

However, as I understand it, they are supposed to run in a time-shared fashion. Could you please help me understand this better?

Hi,

GPU resources are expected to be time-shared between the two processes.
Could you share a reproducible sample so we can check this with our internal team?

Thanks.

Sure, thank you. The scripts we use are here: conc_folder_from_orin

One of them is a DNN training workload (MobileNetv3) and the other is an inference workload (ResNet50). Both of them are run using a wrapper script and profiled using Nsight Systems.

Instructions:

  1. Download the datasets (I was unable to upload these because of their size):
    GLD23k: cvdfoundation/google-landmark on GitHub
    ImageNet: ImageNet Object Localization Challenge on Kaggle

  2. Replace the dataset path in exp_script.sh

  3. Run the script: bash exp_script.sh
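
For readers who cannot open the shared folder, here is a minimal sketch of what such a wrapper might look like. It is written in Python rather than bash and is not the actual exp_script.sh. The script names train_mobilenetv3.py and infer_resnet50.py and the report name are hypothetical placeholders. The idea is to start both workloads as separate processes and run the wrapper itself under `nsys profile`, which, as far as I know, also traces the child processes.

```python
import subprocess
import sys

# Hypothetical entry points; substitute the real training and inference scripts.
WORKLOADS = [
    [sys.executable, "train_mobilenetv3.py"],
    [sys.executable, "infer_resnet50.py"],
]


def launch_concurrently(commands):
    """Start every command as its own process, then wait for all of them."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # Each workload gets its own CPU process, and therefore its own CUDA context.
    return [p.wait() for p in procs]


def nsys_command(report_name, wrapper_cmd):
    """Build an `nsys profile` command line that wraps the given command."""
    return ["nsys", "profile", "-o", report_name] + list(wrapper_cmd)


# Example: the command that would profile a wrapper saved as wrapper.py
# (hypothetical file name) into a report called conc_report.
PROFILE_CMD = nsys_command("conc_report", [sys.executable, "wrapper.py"])
```

Here wrapper.py would simply call launch_concurrently(WORKLOADS), and PROFILE_CMD is what you would run from the shell on the Orin with nsys on the PATH.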

Hi,

Thanks for sharing the code.

We will try to reproduce this and check the details with our internal team.
We will let you know what we find.

Thanks.

Hi,

We also observe similar behavior in our environment.
We need to check with our internal team and will get back to you later.

Thanks.

Thanks for trying it. By similar behavior, do you mean kernels from both processes running simultaneously?
Is there a way to confirm which SMs the kernels are running on using Nsight?

Hi, any updates on this?

Hi,

Sorry, we are still checking the details with our internal team.
We will share more information with you later.

Thanks.

Hi,

All active CUDA contexts use the GPU in a time-shared manner.

Our guess is that the time resolution of the Nsight tool is coarser than the granularity at which the time sharing happens.
As a result, the tool gives the impression that everything is running in parallel.
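
To make this guess concrete, here is a toy model in Python. It is only an illustration of the reasoning above, not a description of how Nsight Systems or the GPU scheduler actually works. The slice length and display resolutions are made-up numbers, since the real time-slice length is not public. Two contexts alternate in strict round-robin slices, and a timeline view reports which contexts were active at any point inside each display bin.

```python
def make_time_slices(total_us, slice_us, n_contexts=2):
    """Strict round-robin schedule: list of (start_us, end_us, context_id)."""
    slices = []
    t, ctx = 0, 0
    while t < total_us:
        end = min(t + slice_us, total_us)
        slices.append((t, end, ctx))
        ctx = (ctx + 1) % n_contexts
        t = end
    return slices


def contexts_per_bin(slices, bin_us, total_us):
    """For each display bin, the set of contexts active at any point inside it."""
    bins = []
    for b0 in range(0, total_us, bin_us):
        b1 = b0 + bin_us
        # A slice contributes to a bin if the two half-open intervals intersect.
        bins.append({ctx for (s, e, ctx) in slices if s < b1 and e > b0})
    return bins


# Assumed numbers for illustration only: 1000 us slices over an 8000 us window.
slices = make_time_slices(total_us=8000, slice_us=1000)

# Coarser than the slice length: every bin contains both contexts.
coarse = contexts_per_bin(slices, bin_us=4000, total_us=8000)

# Finer than the slice length: every bin contains exactly one context.
fine = contexts_per_bin(slices, bin_us=500, total_us=8000)
```

In this model the two contexts never execute at the same instant, yet the coarse view shows both as active in every bin, which would look like parallel execution.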

Thanks.

Thanks for the clarification. Is there any detailed documentation on GPU time sharing? For instance, how long is the default time slice for a process?

Hi,

Sorry, low-level GPU details are not publicly shared.
Thanks.
