Latency Gains using TensorRT

Recently I have been trying N:M pruning methods to see what latency gains they give on models such as Torchvision ResNet50 and ViT, using the TensorRT Python library. My current setup is:

GPU - A100/H100

Batch Size - 128/256

Pruning Method - ASP

Precision - INT8/FP16, with ImageNet-1k as calibration data
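To make clear what I mean by 2:4 sparsity, here is a minimal NumPy sketch of the pattern: in every group of 4 consecutive weights, at most 2 are non-zero. This is not the ASP implementation. The function names `is_2_4_sparse` and `prune_2_4` are hypothetical, the mask is plain magnitude-based, and grouping along the last axis is an assumption that fits a linear layer's (out, in) weight layout; conv weights may need a different grouping.

```python
import numpy as np


def is_2_4_sparse(weight: np.ndarray) -> bool:
    """True if every group of 4 consecutive values along the last axis
    contains at most 2 non-zero entries (assumed grouping, see lead-in)."""
    if weight.shape[-1] % 4 != 0:
        return False
    groups = weight.reshape(-1, 4)
    return bool(np.all(np.count_nonzero(groups, axis=1) <= 2))


def prune_2_4(weight: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude values in each group of 4.
    Illustrative magnitude-based mask only, not ASP itself."""
    groups = weight.reshape(-1, 4).copy()
    # Indices of the two smallest |w| in each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(weight.shape)


if __name__ == "__main__":
    w = np.random.randn(8, 16).astype(np.float32)
    print("dense satisfies 2:4: ", is_2_4_sparse(w))
    print("pruned satisfies 2:4:", is_2_4_sparse(prune_2_4(w)))
```

Running a check like this on the weights that actually end up in the exported model would confirm that the pattern survived export before the engine is built.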

With this setup, the gains I am getting for sparse TRT versus dense TRT are:

For ResNet50

FINAL COMPARISON (mean latency, TRT unless --skip-tensorrt)

Method     Mean (ms)   p99 (ms)   Throughput   Speedup vs dense
Dense      36.787      37.120     6959.0       1.000x
ASP 2:4    36.520      36.790     7009.8       1.007x

For ViT-B/16

FINAL COMPARISON (mean latency, TRT unless --skip-tensorrt)

Method     Mean (ms)   p99 (ms)   Throughput   Speedup vs dense
Dense      66.534      67.475     3847.6       1.000x
ASP 2:4    55.680      56.360     4597.7       1.195x
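For reference, the speedup column is the dense mean latency divided by the sparse mean latency. The short sketch below recomputes it from the numbers above. The `implied_batch` helper is my own back-calculation and assumes the throughput column is in images per second; all names here are just for illustration.

```python
# (mean latency in ms, throughput) copied from the tables above.
results = {
    "ResNet50": {"dense": (36.787, 6959.0), "asp": (36.520, 7009.8)},
    "ViT-B/16": {"dense": (66.534, 3847.6), "asp": (55.680, 4597.7)},
}


def speedup(dense_ms: float, sparse_ms: float) -> float:
    """Speedup vs dense = dense latency / sparse latency."""
    return dense_ms / sparse_ms


def implied_batch(mean_ms: float, throughput: float) -> float:
    """Batch size implied by throughput * latency.
    Assumes throughput is in images per second."""
    return throughput * mean_ms / 1000.0


for model, r in results.items():
    dense_ms, dense_tp = r["dense"]
    asp_ms, _ = r["asp"]
    s = speedup(dense_ms, asp_ms)
    saved = 100.0 * (1.0 - asp_ms / dense_ms)
    batch = implied_batch(dense_ms, dense_tp)
    print(f"{model}: {s:.3f}x speedup, {saved:.1f}% lower latency, "
          f"implied batch ~{batch:.0f}")
```

Under that assumption, both tables work out to a batch size of about 256.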

I am not sure whether these numbers are expected or whether the gains should be higher. I have also tried plain N:M sparsity (without ASP) just to see if there are any latency gains, and the numbers are almost the same.

Please help me understand whether these results make sense.

Regards