CUDA kernel function takes about the same amount of time on Orin and Xavier

I execute the same CUDA kernel function on Orin and Xavier, but the time spent on the two platforms is similar. Does anyone know what the possible reason is?