Why is the kernel launch time on an RTX 3090 about 50% longer than on an RTX 2080?

I trained a TensorFlow model on both an RTX 3090 and an RTX 2080. In TensorBoard, I found that the kernel launch time on the RTX 3090 was much longer than on the RTX 2080, by roughly 50%. What is the reason?