Deepstream parallel inference query

I have two TensorRT engines whose kernels all have 100% GPU occupancy, and I want them to run in parallel. Will DeepStream's parallel inference implement some resource sharing so that kernel execution is actually parallel, or will it merely make the engines run concurrently (i.e., one kernel after the other)?

Yes. It is TensorRT that makes the two engines work in parallel.

When I use these two engines in separate threads with separate streams and memory buffers, I get no parallelism. Since TensorRT is the one making things run in parallel, is there anything I'm missing to make them share resources? I see 100% occupancy for all kernels. Thank you.
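For reference, here is roughly what my setup looks like (a minimal sketch, not my exact code: the engine file names, buffer sizes, iteration count, and the single input/output binding pair are placeholders, and I am using the enqueueV2 path):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <thread>
#include <vector>

using namespace nvinfer1;

// Minimal logger required by the TensorRT runtime.
class Logger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

// Deserialize an engine from a plan file (path is a placeholder).
static ICudaEngine* loadEngine(IRuntime* runtime, const char* path) {
    std::ifstream f(path, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(f)),
                           std::istreambuf_iterator<char>());
    return runtime->deserializeCudaEngine(blob.data(), blob.size());
}

// Each thread gets its own execution context, CUDA stream and device buffers.
static void runEngine(ICudaEngine* engine, int iterations) {
    IExecutionContext* ctx = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // One input and one output binding; sizes are placeholders.
    void* bindings[2];
    cudaMalloc(&bindings[0], 1 << 20);
    cudaMalloc(&bindings[1], 1 << 20);

    for (int i = 0; i < iterations; ++i) {
        // Asynchronous launch on this thread's own stream.
        ctx->enqueueV2(bindings, stream, nullptr);
    }
    cudaStreamSynchronize(stream);

    cudaFree(bindings[0]);
    cudaFree(bindings[1]);
    cudaStreamDestroy(stream);
}

int main() {
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engineA = loadEngine(runtime, "engine_a.plan");  // placeholder
    ICudaEngine* engineB = loadEngine(runtime, "engine_b.plan");  // placeholder

    // One thread per engine: the launches are concurrent, but the kernels can
    // only overlap on the GPU if free SMs are available.
    std::thread ta(runEngine, engineA, 100);
    std::thread tb(runEngine, engineB, 100);
    ta.join();
    tb.join();
    return 0;
}
```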

What does this mean?

If I have understood correctly, in the DeepStream parallel inference example the multiple engines running in parallel are handled by TensorRT, i.e., kernel-level parallel execution is ensured, but when using the TensorRT API directly this parallelism doesn't happen.

DeepStream also uses TensorRT APIs.

So I have to find which part of TensorRT does the resource allocation that ensures these engines' kernels are executed in parallel in DeepStream, in order to get the same effect when using the TensorRT API directly?

The DeepStream parallel sample only helps to batch the videos. The inferencing parts are implemented with TensorRT APIs. Nothing special.

Oh OK, thanks for this, I will keep it in mind. I'm closing with the assumption that even the DeepStream example would not ensure kernel-level parallel execution of the engines if GPU resources aren't available, just as in normal TensorRT execution.

Yes. You are right.
