Priority of concurrent CUDA kernel execution on TX1

I have two kernels (A and B) that can be executed concurrently. I need kernel A to finish as soon as possible.

Is it possible to execute A and B concurrently, with A having higher priority, on the Tegra X1 platform?
I want to make sure the kernel with the higher priority always runs first.

We have tried the following APIs:

cudaStreamCreateWithPriority(..., priority)

and

cudaDeviceGetStreamPriorityRange

but it seems that priority setting is unsupported on TX1: both the highest and the lowest priority returned by cudaDeviceGetStreamPriorityRange are 0.
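
For reference, the check described above can be reproduced with a short program along these lines. This is only a minimal sketch, assuming device 0 is the TX1's integrated GPU; when both returned values are equal, the runtime is exposing a single priority level.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Assumption: device 0 is the GPU of interest (the integrated GPU on TX1).
    cudaError_t err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int leastPriority = 0;
    int greatestPriority = 0;
    err = cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetStreamPriorityRange failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Numerically lower values denote higher priority.
    printf("least priority = %d, greatest priority = %d\n",
           leastPriority, greatestPriority);

    if (leastPriority == greatestPriority) {
        printf("Only one priority level is available on this device/toolkit.\n");
    }
    return 0;
}
```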

If stream priorities are indeed not supported on ARM, is there any other efficient way to guarantee that kernel A gets the highest priority? The two kernels should still be executed concurrently, because they are launched in no particular order.

Thanks!

Hi rownine, thank you for reporting this. We are currently investigating the issue.

Hi rownine,

The stream priority functions are not supported in the current CUDA v7.0 release for TX1; the feature was added in CUDA v7.5.

We plan to support a newer version of the CUDA toolkit with this feature in an upcoming TX1 release.

I will update you once we have more definite information.

Thanks

Does the new CUDA v8.0 for TX1 support the stream priority functions?

Yes, it should. Please refer to the section below for usage:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
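
As a rough illustration of the usage described in that section, the sketch below creates one high-priority and one low-priority stream and launches a kernel on each. The kernels `kernelA` and `kernelB`, the buffer size, and the `CHECK` macro are hypothetical placeholders, not code from this thread. Note that stream priority is a scheduling hint: as far as I know, it gives pending blocks of the high-priority stream preference when resources free up, but it does not preempt blocks that are already running.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernels standing in for A and B.
__global__ void kernelA(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

__global__ void kernelB(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: report a CUDA error and exit main with status 1.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e));                           \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    const int n = 1 << 20;  // illustrative size only
    float* dA = nullptr;
    float* dB = nullptr;
    CHECK(cudaMalloc(&dA, n * sizeof(float)));
    CHECK(cudaMalloc(&dB, n * sizeof(float)));
    CHECK(cudaMemset(dA, 0, n * sizeof(float)));
    CHECK(cudaMemset(dB, 0, n * sizeof(float)));

    // Query the supported range; numerically lower means higher priority.
    int leastPriority = 0;
    int greatestPriority = 0;
    CHECK(cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority));

    // Kernel A goes on the highest-priority stream, kernel B on the lowest.
    cudaStream_t streamA, streamB;
    CHECK(cudaStreamCreateWithPriority(&streamA, cudaStreamNonBlocking,
                                       greatestPriority));
    CHECK(cudaStreamCreateWithPriority(&streamB, cudaStreamNonBlocking,
                                       leastPriority));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Launch order is arbitrary here; separate streams allow overlap.
    kernelB<<<blocks, threads, 0, streamB>>>(dB, n);
    kernelA<<<blocks, threads, 0, streamA>>>(dA, n);
    CHECK(cudaGetLastError());

    CHECK(cudaDeviceSynchronize());

    CHECK(cudaStreamDestroy(streamA));
    CHECK(cudaStreamDestroy(streamB));
    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    return 0;
}
```

If the queried range collapses to a single value, both streams end up with the same priority and the code still runs, just without any prioritization.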

Thanks