PyTorch JIT vs TensorRT

PyTorch JIT (TorchScript) also claims to optimize CUDA kernels by fusing smaller ones into larger ones. Has anybody compared the throughput gains of PyTorch -> JIT against PyTorch -> TensorRT?
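For anyone who wants to measure the JIT side themselves, here's a rough sketch of a throughput benchmark: eager vs. `torch.jit.script` on a toy MLP. The model, sizes, and iteration counts are all placeholders I picked for illustration; the TensorRT side is environment-specific (e.g. via Torch-TensorRT or ONNX export), so it's not shown here.

```python
import time
import torch

def bench(fn, x, iters=100):
    # warm up so one-time compilation/allocation cost doesn't skew timing
    for _ in range(10):
        fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    # throughput in forward passes per second
    return iters / (time.perf_counter() - start)

# toy model just for illustration -- substitute your own
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 256),
).eval()

# TorchScript compilation; fusion passes run on subsequent calls
scripted = torch.jit.script(model)

x = torch.randn(32, 256)

with torch.no_grad():
    eager_tput = bench(model, x)
    jit_tput = bench(scripted, x)

print(f"eager: {eager_tput:.1f} it/s, jit: {jit_tput:.1f} it/s")
```

On CPU with a model this small the difference is usually negligible; the fusion wins the docs talk about show up mainly with CUDA tensors and elementwise-heavy models, so run it on GPU (`model.cuda()`, `x.cuda()`, and `torch.cuda.synchronize()` around the timers) for a meaningful number.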

I have the same question.
Does anyone know?