How to use both DLA engines at the same time


The Release Notes of JetPack 4.2 mention: “Note that performance from running two Deep Learning Accelerators at the same time sums up as expected”.

When I checked the TensorRT documentation, there is only a setDLACore function, which takes an integer core ID, so I don’t see how I can use both DLA engines at the same time. Is there a function that is not mentioned in the documentation?

Edit: I know we can assign different DLA engines to different layers, but that won’t make them run in parallel AFAIK.


You would need to create two ICudaEngine instances, one for each DLA core. Here is pseudocode:

nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);

// create engine for DLA_0
runtime->setDLACore(0);
nvinfer1::ICudaEngine* engine_DLA_0 = runtime->deserializeCudaEngine(...);

// create engine for DLA_1
runtime->setDLACore(1);
nvinfer1::ICudaEngine* engine_DLA_1 = runtime->deserializeCudaEngine(...);
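To actually run the two engines concurrently, you would then give each one its own execution context and its own CUDA stream, and enqueue both inferences before synchronizing. In the same pseudocode style (buffer allocation and binding setup elided; the enqueue call name varies by TensorRT version, e.g. enqueue in TensorRT 5 vs. enqueueV2 later):

```
// one execution context and one CUDA stream per engine
nvinfer1::IExecutionContext* ctx0 = engine_DLA_0->createExecutionContext();
nvinfer1::IExecutionContext* ctx1 = engine_DLA_1->createExecutionContext();

cudaStream_t stream0, stream1;
cudaStreamCreate(&stream0);
cudaStreamCreate(&stream1);

// both enqueues return immediately; the two inferences run
// concurrently, one on each DLA core
ctx0->enqueueV2(bindings0, stream0, nullptr);
ctx1->enqueueV2(bindings1, stream1, nullptr);

cudaStreamSynchronize(stream0);
cudaStreamSynchronize(stream1);
```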

Thank you for your answer. In my understanding, creating two engines allows the two DLAs to be used for different processing, by mapping each layer to a specific DLA core. But if we have a classical CNN where every stage has one type of layer (say, convolution), the two DLA cores won’t parallelize the processing of that layer, and the layer will be processed on only one DLA core.

Can we parallelize the same layer across the two DLA cores, with each core doing half of the convolutions of that layer? Is there lower-level control of the DLA?
And will future releases add automatic, efficient mapping and synchronization of the same layer across both DLAs, like what happens between GPU cores?


If you can’t figure out how to use both cores for a single workload, you can at least pipeline them (double-buffering style), so that your throughput doubles even if your latency stays the same.