Multithreading has a lock problem

I found that there is a lock contended between threads while launching kernels to the GPU’s execution engine queue.

However, in the figure, while “max_pool_forward_nchw” and “generateWinogradTilesKernel” were waiting to obtain the lock, the checked kernel (the sky-blue one) obtained the lock first and completed its launch.

How is this possible?

Doesn’t the mutex queue follow FIFO order?

I don’t think details of kernel launch processing are specified anywhere.

Wouldn’t a strict FIFO guarantee be a bad idea if you wanted to enable out-of-order processing, such as what you might have with CUDA streams?

I am using a Quadro RTX 6000 and using multithreading to run Torchvision’s Densenet201, Resnet152, Alexnet, and vgg16 simultaneously, one model per thread.
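For reference, here is a minimal sketch of the kind of setup I mean (not my exact code; the input shape and iteration count are just placeholders): four Python threads, each running forward passes of a different Torchvision model on the same GPU.

```python
# Minimal sketch of the setup described above (input shape / iteration count are placeholders).
import threading
import torch
import torchvision.models as models

def run_model(name, net, iterations=10):
    net = net.cuda().eval()
    x = torch.randn(1, 3, 224, 224, device="cuda")
    with torch.no_grad():
        for _ in range(iterations):
            net(x)            # each forward pass launches many kernels from this thread
    torch.cuda.synchronize()
    print(name, "finished")

nets = {
    "densenet201": models.densenet201(),
    "resnet152": models.resnet152(),
    "alexnet": models.alexnet(),
    "vgg16": models.vgg16(),
}

threads = [threading.Thread(target=run_model, args=(n, m)) for n, m in nets.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
```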

In addition, the figure is a snapshot from one point in time while my application was being profiled with Nsight Systems.

Only the kernel launch processing is shown in the figure.

Regardless of kernel launch details, only one kernel can be launched at a time per device.

With multithreading, if one kernel is being launched and another thread requests a launch, the second kernel has to wait until its thread obtains the ‘mutex lock’.

Am I right?

What I’m curious about is the order in which the ‘mutex lock’ is obtained.

In the profiler, the longest-waiting kernel does not seem to be the first to get the ‘mutex lock’; instead, the lock appears to be granted in a seemingly random order, and then that kernel launches.

Shouldn’t this be first-in, first-out?

That looks like “kernel launch detail” to me. I’m sure you can study and reverse-engineer some of it if you want. I’m not able to comment on it directly. As far as I know it is not specified.

If you observe that the longest-waiting kernel is somehow not the next one to be processed, doesn’t that mean that FIFO is not a very good description of the process?
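As a general illustration only (a CPU-only Python sketch, with the assumption, purely for illustration, that the launch path behaves like an ordinary mutex, which as stated above is not specified anywhere): a plain mutex makes no FIFO guarantee, so the order in which threads begin waiting need not match the order in which they acquire the lock.

```python
# Illustration of ordinary mutex semantics, NOT of CUDA's actual internal launch lock.
import threading
import time

launch_lock = threading.Lock()   # stand-in for a hypothetical per-device launch lock
wait_order, acquire_order = [], []
record = threading.Lock()        # protects the two bookkeeping lists

def worker(tid):
    with record:
        wait_order.append(tid)   # this thread is now about to wait for the launch lock
    with launch_lock:            # which waiter wins next is up to the OS scheduler
        with record:
            acquire_order.append(tid)
        time.sleep(0.001)        # hold the lock briefly, like one launch being processed

launch_lock.acquire()            # hold the lock so every worker queues up behind it
threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
time.sleep(0.1)                  # give all workers time to reach the wait
launch_lock.release()
for t in threads:
    t.join()

print("began waiting :", wait_order)
print("acquired lock :", acquire_order)   # may differ: no FIFO order is guaranteed
```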

If I use MPS instead of multithreading, can I avoid these delays?
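To clarify what I mean by that (a sketch only; whether this actually removes the launch-side waiting is exactly my question): with MPS, each model would run in its own process instead of its own thread, with the MPS control daemon started beforehand (e.g. `nvidia-cuda-mps-control -d`), so the processes share the GPU through the MPS server.

```python
# Sketch of the multi-process variant that MPS would serve (structure is illustrative).
import torch
import torch.multiprocessing as mp
import torchvision.models as models

def run_model(name, iterations=10):
    net = getattr(models, name)().cuda().eval()
    x = torch.randn(1, 3, 224, 224, device="cuda")
    with torch.no_grad():
        for _ in range(iterations):
            net(x)
    torch.cuda.synchronize()

if __name__ == "__main__":
    mp.set_start_method("spawn")   # each process must initialize CUDA on its own
    procs = [mp.Process(target=run_model, args=(n,))
             for n in ("densenet201", "resnet152", "alexnet", "vgg16")]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```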