I want to run inference on 3 networks concurrently and independently of each other: one on DLA0, one on DLA1, and the third on the GPU, using Python on a Jetson Xavier NX.
I would appreciate it if anyone could tell me whether this is possible and how to implement it, and share some sample code.
DLA is a hardware-based accelerator, so it places some constraints on the layers it supports:
You can find a Python inference sample below:
Setting the DeviceType to DLA creates a DLA engine:
Thank you for your response.
In the TRTInference class, when I create an object to run inference with my_model.engin on DLA, does it matter which device I pass to Device in this line:
“self.cfx = cuda.Device(0).make_context()”
(this line is in the TRTInference class, not in the multi-threading code)
If so, how can I configure that? I passed “dla:0” to cuda.Device and got an error saying the argument was wrong.
On the other hand, I think that since I built and saved the engine to run on DLA, I should not need any other settings when I run inference on DLA.
I am completely confused. :(
I have another question about the multi-threading code: when I use thread1 and thread2 and call start and join on them, will both threads run at the same time?
You can create a DLA engine with the trtexec command and run it with Python.
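As a rough illustration of the trtexec route, the sketch below only assembles the command line for one DLA core; it does not run anything. The helper name `build_trtexec_command` and the file names are hypothetical. The flags are standard trtexec options; `--fp16` is included on the assumption that the model can run at reduced precision, since DLA does not run FP32.

```python
def build_trtexec_command(onnx_path, engine_path, dla_core):
    """Return the trtexec argument list for building an engine on one DLA core.

    Hypothetical helper for illustration; paths are supplied by the caller.
    """
    if dla_core not in (0, 1):  # the thread refers to two cores: DLA0 and DLA1
        raise ValueError("dla_core must be 0 or 1")
    return [
        "trtexec",
        "--onnx={}".format(onnx_path),          # input model (assumed ONNX)
        "--saveEngine={}".format(engine_path),  # where to write the serialized engine
        "--useDLACore={}".format(dla_core),     # target DLA core
        "--allowGPUFallback",                   # layers DLA cannot run fall back to the GPU
        "--fp16",                               # reduced precision (assumption, see above)
    ]


# One engine per DLA core; file names are placeholders.
for core in (0, 1):
    cmd = build_trtexec_command("model.onnx", "model_dla{}.engine".format(core), core)
    print(" ".join(cmd))
    # To actually build on the Jetson, something like:
    #   import subprocess; subprocess.run(cmd, check=True)
```

On a Jetson the trtexec binary is usually not on PATH by default, so the first element may need to be the full path to it.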
In general, the following configuration needs to be set:
config.dla_core = 0
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
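On the multi-threading question: threads overlap only if every thread is started before any of them is joined. The sketch below uses placeholder workers rather than real inference (the worker body and the names "dla0", "dla1", "gpu" are hypothetical). Each worker waits at a barrier that opens only when all three are alive at once, so the run succeeds only if the threads really were running together. Whether real inference calls overlap in practice also depends on them releasing Python's GIL while they wait, which is assumed here and not shown.

```python
import threading


def run_concurrently(names, timeout=5.0):
    """Start one thread per name and report whether they were alive together."""
    barrier = threading.Barrier(len(names))
    results = {}

    def worker(name):
        # Placeholder for: create the CUDA context, load the engine, run inference.
        try:
            barrier.wait(timeout=timeout)  # passes only if all workers are running
            results[name] = "overlapped"
        except threading.BrokenBarrierError:
            results[name] = "ran alone"

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()  # start everything first ...
    for t in threads:
        t.join()   # ... and only then wait for completion
    return results


print(run_concurrently(["dla0", "dla1", "gpu"]))
```

Calling start and then join on thread1 before starting thread2 would instead run the two one after the other.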
Below is our DLA tutorial for your reference: