I have two kernels (A and B) that can be executed concurrently. I need kernel A to finish as soon as possible.
Is it possible to execute A and B concurrently, with A having the higher priority, on the Tegra X1 platform?
I want to make sure that the kernel with the highest priority always runs first.
We have tried the stream priority APIs, but priority setting seems to be unsupported on the TX1: both the highest and the lowest priority returned by cudaDeviceGetStreamPriorityRange are 0.
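For reference, here is a minimal sketch of how the priority range is usually queried and applied. It is not our actual code: the kernel bodies, buffer size, and launch configuration (kernelA, kernelB, N, the 256-thread blocks) are placeholders. Note that in CUDA a numerically lower value means a higher priority, and that priorities are scheduling hints rather than hard guarantees.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernels; the real A and B would go here.
__global__ void kernelA(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

__global__ void kernelB(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int N = 1 << 20;  // hypothetical problem size

    // Query the supported range. A numerically lower value is a higher priority.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
    printf("least = %d, greatest = %d\n", leastPriority, greatestPriority);

    // If both values are equal (0 and 0 in our case), the device exposes
    // only one priority level, so the priorities below have no effect.
    if (leastPriority == greatestPriority)
        printf("stream priorities are not available on this device\n");

    // One stream per kernel: A gets the highest priority, B the lowest.
    cudaStream_t streamA, streamB;
    cudaStreamCreateWithPriority(&streamA, cudaStreamNonBlocking, greatestPriority);
    cudaStreamCreateWithPriority(&streamB, cudaStreamNonBlocking, leastPriority);

    float *bufA = nullptr, *bufB = nullptr;
    cudaMalloc(&bufA, N * sizeof(float));
    cudaMalloc(&bufB, N * sizeof(float));
    cudaMemset(bufA, 0, N * sizeof(float));
    cudaMemset(bufB, 0, N * sizeof(float));

    // Launch order is arbitrary; the streams let the two kernels overlap.
    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    kernelB<<<blocks, threads, 0, streamB>>>(bufB, N);
    kernelA<<<blocks, threads, 0, streamA>>>(bufA, N);

    cudaStreamSynchronize(streamA);
    cudaStreamSynchronize(streamB);

    cudaFree(bufA);
    cudaFree(bufB);
    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    return 0;
}
```

On the TX1 the first printf reports 0 for both values, which is why the two streams end up with the same priority.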
If stream priorities are indeed not supported on ARM, is there any other efficient way to guarantee that kernel A gets the highest priority? The two kernels should still be executed concurrently, because they are launched in no particular order.