Hi, chijen
Sorry for the late feedback.
I’ve found the cause of this issue. The kernel function (A) mentioned in my post is executed in a sub-thread, but there is another kernel function (B) executed in the main thread. After I removed B, A no longer showed the unexpectedly long execution time. So, considering thread scheduling priority, I think all kernel functions should be executed in sub-threads.
BTW, if there are any other factors that may affect kernel function execution time, please let me know.