I create several streams with priorities, and then run inference of a model on the DLA. But it doesn't execute in the expected order. How can I set priorities for DLA operators to ensure that a certain inference is completed first?
Hi,
Could you share more about your use case?
For example, two applications run on DLA0 and DLA1.
Do you want the DLA0 job, with the higher priority, to finish first?
Or two applications run on the DLA and the GPU.
Do you want the DLA job to finish before the GPU task?
Thanks.
Like two applications running on the DLA: I want the first application to finish, and then the other application to run.
Like in this picture (there are four applications):
It runs app 1 layer 1, app 2 layer 1, and then app 1 layer 2, app 2 layer 2.
I want it to run app 1 layer 1, app 1 layer 2, and then app 2 layer 1, app 2 layer 2.
I also want to ask why my subgraph running on the DLA is split into four parts. Since there is no GPUFallback, it should be one layer running on the DLA.
Hi,
If you want to run two applications sequentially, please attach them to the same CUDA stream.
Based on the screenshot, it seems that many streams are being used.
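For example, a minimal sketch with the TensorRT C++ API (the context names are placeholders; it assumes both engines were built for DLA and the I/O tensor addresses have already been bound with setTensorAddress):

```cpp
#include "NvInfer.h"
#include <cuda_runtime_api.h>

// Enqueue two inferences on the SAME CUDA stream so that app 2
// only starts after app 1 has completed.
void runSequentially(nvinfer1::IExecutionContext* ctxApp1,
                     nvinfer1::IExecutionContext* ctxApp2)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    ctxApp1->enqueueV3(stream);  // app 1: all layers
    ctxApp2->enqueueV3(stream);  // app 2: runs only after app 1 finishes

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}
```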
Thanks.
Yeah, we have 4 apps and also 4 streams.
But every app has its own stream.
What I want to say is that even within one stream on the DLA, the inference is interrupted by another stream. I don't want this.
Hi,
You will need to use the same CUDA stream and submit the tasks in order to avoid this.
Tasks attached to different streams can be executed in parallel.
Thanks.
But within one stream, there is no priority.
How can I set priorities between different streams that affect tasks on the DLA?
Even when I set stream priorities, I found they don't seem to work on the DLA.
Hi,
Tasks pushed to the same stream will execute in order.
You don't need to set the priority; just launch the kernels in the order you want.
Priority is not a hard limitation.
It isn't guaranteed that the high-priority task will always finish first.
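For reference, a minimal sketch of how stream priorities are created on the CUDA side (variable names are placeholders; priority is only a hint to the GPU scheduler and does not control how the DLA schedules its jobs):

```cpp
#include <cuda_runtime_api.h>
#include <cstdio>

int main()
{
    // Query the priority range supported by the device.
    // A lower numerical value means a higher priority.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    cudaStream_t highPrio, lowPrio;
    cudaStreamCreateWithPriority(&highPrio, cudaStreamNonBlocking, greatestPriority);
    cudaStreamCreateWithPriority(&lowPrio, cudaStreamNonBlocking, leastPriority);

    std::printf("Stream priority range: least=%d, greatest=%d\n",
                leastPriority, greatestPriority);

    // Work on highPrio is preferred by the GPU scheduler, but this is a hint,
    // not a guarantee, and it does not affect DLA job ordering.

    cudaStreamDestroy(lowPrio);
    cudaStreamDestroy(highPrio);
    return 0;
}
```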
Thanks.