TensorRT 6.0 has been released with some fixes for multi-threading multiple engines. Please try the newest release and let us know if the issue has been resolved.
The expected usage by the application, however, is one ExecutionContext per CPU thread.
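For reference, a minimal sketch of that pattern (not your actual code; it assumes an already-deserialized nvinfer1::ICudaEngine and pre-allocated device bindings, and uses the TensorRT 5/6 C++ API):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <thread>
#include <vector>

// One engine shared by all threads; each CPU thread owns its own
// IExecutionContext and CUDA stream.
void inferOnThread(nvinfer1::ICudaEngine* engine, void** bindings, int batchSize)
{
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Asynchronous inference on this thread's private stream.
    context->enqueue(batchSize, bindings, stream, nullptr);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    context->destroy(); // TensorRT 5/6 style cleanup
}

void runConcurrently(nvinfer1::ICudaEngine* engine,
                     const std::vector<void**>& perThreadBindings, int batchSize)
{
    std::vector<std::thread> workers;
    for (void** bindings : perThreadBindings)
        workers.emplace_back(inferOnThread, engine, bindings, batchSize);
    for (auto& w : workers)
        w.join();
}
```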
It seems that with multi-threading, some layers in one engine (especially convolutions) are blocking the other engine. Is this related to MPS or something else?
Can you provide the tactic IDs of the layers you believe are misbehaving, from the verbose log? We can have an engineer look into this in more detail.
Hi, I just profiled with nvvp and saw the phenomenon. It happens for almost all convolutions.
Here are more details of my program:
BTW, I am using TensorRT 5.1.5. I don't know how to get the tactic IDs from nvvp.
There are issues known to have occurred in 5.1 that were fixed in 6.0; we just need to verify that you are no longer seeing those issues, otherwise we need to investigate the specific problem you are seeing. You can get the information by adding
--verbose --exportTimes=timing.txt --exportProfile=profile.txt > verbose_output.txt
to your trtexec command with TensorRT 6.0.
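For example, with TensorRT 6.0's trtexec (the --onnx model path here is just a placeholder for whatever network you are testing):

```
trtexec --onnx=model.onnx \
        --verbose \
        --exportTimes=timing.txt \
        --exportProfile=profile.txt > verbose_output.txt
```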
I attach screenshots of the profiling results for processes running with TensorRT 5 and TensorRT 6. Each row is the process executed by a thread. You can see that there are gaps between kernel functions in each thread; this wouldn't happen if I just ran one thread at a time. Regarding exporting the profiling data, I don't know how to use the arguments you provided. When I add the flags, it shows "unknown command line flag 'exportTimes'" and "ERROR: unknown command line flag 'verbose'".
(Somehow I can't upload images, so links are provided below.)
Hello,
Engineering ran tests with an engine consisting of a single simpleTopK kernel. Two instances of the engine were run, each on a different thread.
High occupancy → low amount of overlap
When the SM occupancy is high, the amount of overlap between kernels is low. The image [nvprof_b256_total_duration.JPG] shows that out of 24.158 ms, only 58 us are overlapped, meaning only 0.24% of the execution is overlapped.
The overlap basically happens only in the tail of the kernel execution.
Looking back at the visual profiler images, we see that the kernel has a grid size of 131,072 thread blocks. During execution, every SM will take n thread blocks per wave.
number_of_waves * kernel_wave_time = total_time
ceil(grid_size/(SMs * n)) * kernel_wave_time = total_time
ceil(131,072/(80*n)) * kernel_wave_time = total_time
(kernel_wave_time = 0.058 ms and total_time = 24.158 ms)
<=> 131,072/(80*n) ≈ 24.158/0.058 ≈ 416.5
==> n ≈ 3.9, i.e. n = 4
This means we have 131072/(4*80) = 409.6 waves, or 409 full waves and 1 tail, where the overlap happens.
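The same arithmetic can be written out explicitly. A small sketch (the values are just the ones quoted above: 80 SMs and the measured timings; nothing here comes from the actual profile files):

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    const double gridSize    = 131072.0; // thread blocks in the kernel grid
    const double sms         = 80.0;     // number of SMs assumed above
    const double totalTimeMs = 24.158;   // measured total kernel time
    const double waveTimeMs  = 0.058;    // measured tail/overlap duration (one wave)

    const double measuredWaves = totalTimeMs / waveTimeMs;             // ~416.5
    const double n     = std::round(gridSize / (sms * measuredWaves)); // ~3.9 -> 4
    const double waves = gridSize / (sms * n);                         // 409.6

    std::printf("measured waves ~= %.1f, n = %.0f, grid/(SMs*n) = %.1f waves\n",
                measuredWaves, n, waves);
    return 0;
}
```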
Low occupancy → high amount of overlap
Extreme case
The next example shows the extreme case where the SMs are extremely under-occupied. The overlap can happen anywhere, depending on whether other dependencies are met to launch the kernel (e.g., after the necessary H→D memcopies); see nvprof_b1_overlap.JPG.
Partial overlap
In the image below we can see that the grid size is 512. Using n = 4 as calculated in the first example, and #SMs = 80, we can compute that there are 512/(4*80) = 1.6 waves.
This means we can expect overlap roughly halfway through the execution. Indeed, calculating the overlap percentage we see 67 us of overlap out of a total kernel execution time of 136 us, which is roughly a 50% ratio.
(Note further that, as can be observed from the image, the second kernel seems to take a little longer. This is simply because during the overlap period the second kernel could only use 40% of the SMs.)
See nvprof_b16_total_duration.JPG, nvprof_b16_overlap.JPG, and nvprof_b16_second_kernel.JPG.
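As a quick check of the partial-overlap numbers with the same wave formula (again using only the figures quoted above):

```cpp
#include <cstdio>

int main()
{
    const double waves   = 512.0 / (80.0 * 4.0); // grid / (SMs * n) = 1.6 waves
    const double overlap = 67.0 / 136.0;         // overlapped time / total time ~= 0.49

    std::printf("waves = %.1f, overlap fraction = %.0f%%\n", waves, overlap * 100.0);
    return 0;
}
```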