PyTorch JIT vs TensorRT

PyTorch JIT also claims to optimize CUDA kernels by fusing smaller ones into larger ones. Has anybody done a comparison of the throughput gains from PyTorch → JIT versus PyTorch → TensorRT?
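For reference, here is a minimal sketch of how one could measure the eager vs. JIT side of such a comparison (the TensorRT side would follow the same timing pattern via a converted engine, which is not shown here). The function `pointwise_chain` and the benchmark loop are illustrative assumptions, not from any particular benchmark suite; it assumes CUDA is available and falls back to CPU timing otherwise:

```python
# Micro-benchmark sketch: eager PyTorch vs. torch.jit.script.
# Chains of small pointwise ops are the typical candidates for JIT kernel fusion.
import time
import torch

def pointwise_chain(x):
    # Several small pointwise ops that the JIT fuser can combine into one kernel.
    return torch.relu(x) * torch.sigmoid(x) + torch.tanh(x)

scripted = torch.jit.script(pointwise_chain)

def bench(fn, x, iters=100):
    # Warm-up iterations let the JIT profile and fuse before we start timing.
    for _ in range(10):
        fn(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
print(f"eager:    {bench(pointwise_chain, x) * 1e3:.3f} ms/iter")
print(f"scripted: {bench(scripted, x) * 1e3:.3f} ms/iter")
```

Note that `torch.cuda.synchronize()` is needed around the timed region because CUDA kernel launches are asynchronous; without it the measured time only reflects launch overhead, not actual kernel execution.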

I have the same question.
Does anyone know?