DLA0 and DLA1 run in sequence instead of in parallel

TensorRT Version: 8.6
Platform: Orin

In my process, DLA is used in Hybrid Mode.
When I try to run ModelA on DLA0 and ModelB on DLA1, ModelB does not start running until ModelA has finished, even though I call setDLACore(0) for ModelA and setDLACore(0) for ModelB. Setting export CUDA_DEVICE_MAX_CONNECTIONS=32 does not fix the problem. Are there any tips for debugging this?
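To make the "in sequence" observation measurable, one option is to time each model alone, then both launched from separate host threads, and compare the combined wall time against the two bounds: roughly max(A, B) if the cores overlap, roughly A + B if they are serialized. The following is only a minimal sketch under stated assumptions: `run_model_a` and `run_model_b` are hypothetical placeholders (here simulated with `time.sleep`) that would need to be replaced by a blocking "enqueue on this model's stream, then synchronize" call, and the 25% tolerance is an arbitrary choice.

```python
import threading
import time


def wall_time(fn):
    """Return the wall-clock seconds taken by one blocking call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_concurrently(fns):
    """Start every callable in its own thread and return the total wall time."""
    threads = [threading.Thread(target=fn) for fn in fns]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def classify(solo_a, solo_b, together, tolerance=0.25):
    """Label the combined run by comparing it with the parallel and serial bounds.

    Most informative when both models take a similar amount of time alone.
    """
    parallel_bound = max(solo_a, solo_b)
    serial_bound = solo_a + solo_b
    if together <= parallel_bound * (1 + tolerance):
        return "parallel"
    if together >= serial_bound * (1 - tolerance):
        return "serialized"
    return "partial"


# Hypothetical stand-ins for the real inference calls (assumption, not TensorRT API).
def run_model_a():
    time.sleep(0.2)


def run_model_b():
    time.sleep(0.2)


if __name__ == "__main__":
    solo_a = wall_time(run_model_a)
    solo_b = wall_time(run_model_b)
    together = run_concurrently([run_model_a, run_model_b])
    print(f"A alone: {solo_a:.3f}s, B alone: {solo_b:.3f}s, together: {together:.3f}s")
    print("verdict:", classify(solo_a, solo_b, together))
```

A "serialized" verdict from host-side timing only confirms the symptom; it does not say whether the serialization happens in the application, the CUDA stream setup, or the DLA runtime, so a profiler timeline would still be needed to locate the cause.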