Converting int4 model to TensorRT engine for inference

Hi,

I am using the NVIDIA Model Optimizer ( https://nvidia.github.io/Model-Optimizer/guides/_pytorch_quantization.html ) to convert the official YOLOv7 object detection model to the quantization configs below:

AVAILABLE_CONFIGS = {
    "int8_default": mtq.INT8_DEFAULT_CFG,
    "int8_smoothquant": mtq.INT8_SMOOTHQUANT_CFG,
    "fp8_default": mtq.FP8_DEFAULT_CFG,  # For H100 and newer GPUs
    "w4a8_awq": mtq.W4A8_AWQ_BETA_CFG,   # 4-bit weights, 8-bit activations
}

The conversion itself succeeds and the accuracy looks fine (though I am not sure whether the quantization is actually taking effect). The problem is that when I run inference, the int4 model is very slow.

Here is my code

model_q = mtq.quantize(model_c, config, forward_loop)
checkpoint = {
    'state_dict': model_q.state_dict(),
}
model_name = 'weights/yolov7_' + config_name + '.pth'
torch.save(checkpoint, model_name)

How do I convert the int4 model to a TensorRT engine so that it runs faster?

Also, what does this mean?

================================================================================
Testing quantization config: w4a8_awq
Config details: {'quant_cfg': {'*weight_quantizer': [{'num_bits': 4, 'block_sizes': {-1: 128, 'type': 'static'}, 'enable': True}, {'num_bits': (4, 3), 'axis': None, 'enable': True}], '*input_quantizer': {'num_bits': (4, 3), 'axis': None, 'enable': True}, 'nn.BatchNorm1d': {'*': {'enable': False}}, 'nn.BatchNorm2d': {'*': {'enable': False}}, 'nn.BatchNorm3d': {'*': {'enable': False}}, 'nn.LeakyReLU': {'*': {'enable': False}}, '*lm_head*': {'enable': False}, '*proj_out.*': {'enable': False}, '*block_sparse_moe.gate*': {'enable': False}, '*router*': {'enable': False}, '*mlp.gate.*': {'enable': False}, '*mlp.shared_expert_gate.*': {'enable': False}, '*output_layer*': {'enable': False}, 'output.*': {'enable': False}, 'default': {'enable': False}}, 'algorithm': 'awq_lite'}
================================================================================
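For reference on the dump above: the '*weight_quantizer' list describes two passes over the weights, 4-bit quantization with one static scale per block of 128 values along the last axis ('block_sizes': {-1: 128}), followed by an FP8 pass ('num_bits': (4, 3) denotes the E4M3 floating-point format), while '*input_quantizer' quantizes activations to FP8, which is what "W4A8" stands for. Below is a minimal plain-Python sketch of what block-wise INT4 fake-quantization does to a weight tensor; it illustrates the idea only and is not ModelOpt's implementation:

```python
# Illustrative sketch (NOT the ModelOpt implementation): with
# 'num_bits': 4 and block_sizes {-1: 128}, each group of 128
# consecutive weights along the last axis shares one scale, and
# each weight is rounded to a signed 4-bit integer in [-8, 7].

def quantize_int4_blockwise(weights, block_size=128):
    """Fake-quantize a flat list of floats with one scale per block."""
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block) or 1.0  # calibration: per-block absolute max
        scale = amax / 7.0                        # map amax onto the INT4 max (7)
        for w in block:
            q = max(-8, min(7, round(w / scale))) # clamp to the signed 4-bit range
            out.append(q * scale)                 # dequantize back to float
    return out

weights = [0.05 * i for i in range(-64, 64)]      # 128 values -> exactly one block
deq = quantize_int4_blockwise(weights)
print(len(set(round(v, 6) for v in deq)))         # only a handful of distinct levels survive
```

The per-block amax here plays the role of the calibrated "amax" mentioned in the export error later in this thread: without calibration there is no scale to quantize with.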

Thanks

Hi,

You can build a TensorRT engine with INT4 precision using trtexec directly:

$ /usr/src/tensorrt/bin/trtexec --help
&&&& RUNNING TensorRT.trtexec [TensorRT v101303] [b9] # /usr/src/tensorrt/bin/trtexec --help
...
  --noTF32                           Disable tf32 precision (default is to enable tf32, in addition to fp32)
  --fp16                             Enable fp16 precision, in addition to fp32 (default = disabled)
  --bf16                             Enable bf16 precision, in addition to fp32 (default = disabled)
  --int8                             Enable int8 precision, in addition to fp32 (default = disabled)
  --fp8                              Enable fp8 precision, in addition to fp32 (default = disabled)
  --int4                             Enable int4 precision, in addition to fp32 (default = disabled)
  --best                             Enable all precisions to achieve the best performance (default = disabled)
...
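For example, an engine could be built from the quantized ONNX export like this (the filenames below are placeholders, and --int4/--fp8 are the precision flags from the help output above):

```shell
# Placeholder paths -- substitute your own ONNX export and output engine name.
/usr/src/tensorrt/bin/trtexec \
    --onnx=weights/yolov7_w4a8.onnx \
    --int4 --fp8 \
    --saveEngine=weights/yolov7_w4a8.trt
```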

Thanks.

Hi, this is the error I am getting while exporting the int4 model to ONNX, before converting it to a TensorRT engine.

Failed to export ONNX for w4a8_awq: Quantizer has not been calibrated. ONNX export requires the quantizer to be calibrated. Calibrate and load amax before exporting to ONNX

Code snippet:

model_q = mtq.quantize(model_c, config, forward_loop)
checkpoint = {
    'state_dict': model_q.state_dict(),
    # 'config_name': config_name,
    # 'quantization_config': str(config)
}

# Save PyTorch checkpoint
model_name = 'weights/yolov7_' + config_name + '.pth'
torch.save(checkpoint, model_name)
print(f"Saved quantized PyTorch model to {model_name}")

# Export to ONNX - disable quantizers for ONNX export
onnx_model_name = 'weights/yolov7_' + config_name + '.onnx'
model_q.eval()

try:
    print(f"Exporting to ONNX: {onnx_model_name}")

    # Temporarily disable all quantizers for ONNX export
    # The "*" wildcard disables all quantizers in the model
    # mtq.disable_quantizer(model_q, "*")

    torch.onnx.export(
        model_q,
        dummy_input,
        onnx_model_name,
        export_params=True,
        opset_version=17,
        do_constant_folding=True,
        input_names=['images'],
        output_names=['output'],
        dynamic_axes={
            'images': {0: 'batch_size'},
            'output': {0: 'batch_size'}
        }
    )

    # Re-enable all quantizers after export
    # mtq.enable_quantizer(model_q, "*")

    print(f"Successfully exported ONNX model to {onnx_model_name}")
    print(f"Note: ONNX export contains FP32 model structure (quantizers disabled)")
except Exception as e:
    print(f"Failed to export ONNX for {config_name}: {str(e)}")
    print(f"Note: Some quantization configs may not support ONNX export directly")

When I use

mtq.disable_quantizer(model_q, "*") and mtq.enable_quantizer(model_q, "*")

I am able to export to ONNX and then build an int4 TensorRT engine, but the inference time is ~7 ms, which is too slow compared with the ~3 ms I get from the int8 TensorRT engine of YOLOv7.

Without the disable/enable quantizer calls, I am unable to export to ONNX only for int4; int8 works fine.

Hi,

Could you try running the INT4 model via trtexec with the --dumpProfile flag and share the output with us?

$ /usr/src/tensorrt/bin/trtexec --loadEngine=[file]  --dumpProfile

Thanks.

Here it is:

trtexec --loadEngine=weights/yolov7_w4a8.trt  --dumpProfile
&&&& RUNNING TensorRT.trtexec [TensorRT v101302] [b6] # trtexec --loadEngine=weights/yolov7_w4a8.trt --dumpProfile
[01/14/2026-07:11:59] [I] TF32 is enabled by default. Add --noTF32 flag to further improve accuracy with some performance cost.
[01/14/2026-07:11:59] [I] === Model Options ===
[01/14/2026-07:11:59] [I] Format: *
[01/14/2026-07:11:59] [I] Model:
[01/14/2026-07:11:59] [I] Output:
[01/14/2026-07:11:59] [I]
[01/14/2026-07:11:59] [I] === System Options ===
[01/14/2026-07:11:59] [I] Device: 0
[01/14/2026-07:11:59] [I] DLACore:
[01/14/2026-07:11:59] [I] Plugins:
[01/14/2026-07:11:59] [I] setPluginsToSerialize:
[01/14/2026-07:11:59] [I] dynamicPlugins:
[01/14/2026-07:11:59] [I] ignoreParsedPluginLibs: 0
[01/14/2026-07:11:59] [I]
[01/14/2026-07:11:59] [I] === Inference Options ===
[01/14/2026-07:11:59] [I] Batch: Explicit
[01/14/2026-07:11:59] [I] Input inference shapes: model
[01/14/2026-07:11:59] [I] Iterations: 10
[01/14/2026-07:11:59] [I] Duration: 3s (+ 200ms warm up)
[01/14/2026-07:11:59] [I] Sleep time: 0ms
[01/14/2026-07:11:59] [I] Idle time: 0ms
[01/14/2026-07:11:59] [I] Inference Streams: 1
[01/14/2026-07:11:59] [I] ExposeDMA: Disabled
[01/14/2026-07:11:59] [I] Data transfers: Enabled
[01/14/2026-07:11:59] [I] Spin-wait: Disabled
[01/14/2026-07:11:59] [I] Multithreading: Disabled
[01/14/2026-07:11:59] [I] CUDA Graph: Disabled
[01/14/2026-07:11:59] [I] Separate profiling: Disabled
[01/14/2026-07:11:59] [I] Time Deserialize: Disabled
[01/14/2026-07:11:59] [I] Time Refit: Disabled
[01/14/2026-07:11:59] [I] NVTX verbosity: 0
[01/14/2026-07:11:59] [I] Persistent Cache Ratio: 0
[01/14/2026-07:11:59] [I] Optimization Profile Index: 0
[01/14/2026-07:11:59] [I] Weight Streaming Budget: 100.000000%
[01/14/2026-07:11:59] [I] Inputs:
[01/14/2026-07:11:59] [I] Debug Tensor Save Destinations:
[01/14/2026-07:11:59] [I] Dump All Debug Tensor in Formats:
[01/14/2026-07:11:59] [I] === Reporting Options ===
[01/14/2026-07:11:59] [I] Verbose: Disabled
[01/14/2026-07:11:59] [I] Averages: 10 inferences
[01/14/2026-07:11:59] [I] Percentiles: 90,95,99
[01/14/2026-07:11:59] [I] Dump refittable layers:Disabled
[01/14/2026-07:11:59] [I] Dump output: Disabled
[01/14/2026-07:11:59] [I] Profile: Enabled
[01/14/2026-07:11:59] [I] Export timing to JSON file:
[01/14/2026-07:11:59] [I] Export output to JSON file:
[01/14/2026-07:11:59] [I] Export profile to JSON file:
[01/14/2026-07:11:59] [I]
[01/14/2026-07:11:59] [I] === Device Information ===
[01/14/2026-07:11:59] [I] Available Devices:
[01/14/2026-07:11:59] [I]   Device 0: "NVIDIA Thor" UUID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600
[01/14/2026-07:11:59] [I] Selected Device: NVIDIA Thor
[01/14/2026-07:11:59] [I] Selected Device ID: 0
[01/14/2026-07:11:59] [I] Selected Device UUID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600
[01/14/2026-07:11:59] [I] Compute Capability: 11.0
[01/14/2026-07:11:59] [I] SMs: 20
[01/14/2026-07:11:59] [I] Device Global Memory: 125772 MiB
[01/14/2026-07:11:59] [I] Shared Memory per SM: 228 KiB
[01/14/2026-07:11:59] [I] Memory Bus Width: 0 bits (ECC disabled)
[01/14/2026-07:11:59] [I] Application Compute Clock Rate: 1.04858 GHz
[01/14/2026-07:11:59] [I] Application Memory Clock Rate: 0 GHz
[01/14/2026-07:11:59] [I]
[01/14/2026-07:11:59] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/14/2026-07:11:59] [I]
[01/14/2026-07:11:59] [I] TensorRT version: 10.13.2
[01/14/2026-07:11:59] [I] Loading standard plugins
[01/14/2026-07:11:59] [I] [TRT] Loaded engine size: 75 MiB
[01/14/2026-07:11:59] [I] Engine deserialized in 0.0303246 sec.
[01/14/2026-07:11:59] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +44, now: CPU 0, GPU 118 (MiB)
[01/14/2026-07:11:59] [I] Setting persistentCacheLimit to 0 bytes.
[01/14/2026-07:11:59] [I] Created execution context with device memory size: 43.75 MiB
[01/14/2026-07:11:59] [I] Using random values for input images
[01/14/2026-07:11:59] [I] Input binding for images with dimensions 1x3x640x640 is created.
[01/14/2026-07:11:59] [I] Output binding for output with dimensions 1x25200x85 is created.
[01/14/2026-07:11:59] [I] Output binding for onnx::Sigmoid_709 with dimensions 1x3x80x80x85 is created.
[01/14/2026-07:11:59] [I] Output binding for onnx::Sigmoid_800 with dimensions 1x3x40x40x85 is created.
[01/14/2026-07:11:59] [I] Output binding for onnx::Sigmoid_889 with dimensions 1x3x20x20x85 is created.
[01/14/2026-07:11:59] [I] Starting inference


[01/14/2026-07:12:03] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[01/14/2026-07:12:03] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
[01/14/2026-07:12:03] [I]
[01/14/2026-07:12:03] [I] === Profile (870 iterations ) ===
[01/14/2026-07:12:03] [I]    Time(ms)     Avg.(ms)   Median(ms)   Time(%)   Layer
[01/14/2026-07:12:03] [I]       49.58       0.0570       0.0556       1.7   __myl_CastConc_myl0_0
[01/14/2026-07:12:03] [I]      509.15       0.5852       0.5774      17.6   /model_0/conv/Conv_myl0_1
[01/14/2026-07:12:03] [I]      132.19       0.1519       0.1496       4.6   /model_1/conv/Conv_myl0_2
[01/14/2026-07:12:03] [I]       83.78       0.0963       0.0953       2.9   /model_2/conv/Conv_myl0_3
[01/14/2026-07:12:03] [I]       33.65       0.0387       0.0380       1.2   /model_3/conv/Conv_myl0_4
[01/14/2026-07:12:03] [I]       30.92       0.0355       0.0349       1.1   /model_5/conv/Conv+/model_4/conv/Conv_myl0_5
[01/14/2026-07:12:03] [I]       28.28       0.0325       0.0320       1.0   /model_6/conv/Conv_myl0_6
[01/14/2026-07:12:03] [I]       28.34       0.0326       0.0319       1.0   /model_7/conv/Conv_myl0_7
[01/14/2026-07:12:03] [I]       28.42       0.0327       0.0328       1.0   /model_8/conv/Conv_myl0_8
[01/14/2026-07:12:03] [I]       28.23       0.0324       0.0326       1.0   /model_9/conv/Conv_myl0_9
[01/14/2026-07:12:03] [I]       46.42       0.0534       0.0523       1.6   __myl_Conc_myl0_10
[01/14/2026-07:12:03] [I]       60.91       0.0700       0.0686       2.1   /model_11/conv/Conv_myl0_11
[01/14/2026-07:12:03] [I]       31.28       0.0360       0.0351       1.1   /model_14/conv/Conv_myl0_12
[01/14/2026-07:12:03] [I]       20.94       0.0241       0.0240       0.7   /model_15/conv/Conv_myl0_13
[01/14/2026-07:12:03] [I]       28.34       0.0326       0.0318       1.0   /model_12/m/MaxPool_myl0_14
[01/14/2026-07:12:03] [I]       14.09       0.0162       0.0164       0.5   /model_13/conv/Conv_myl0_15
[01/14/2026-07:12:03] [I]       19.90       0.0229       0.0225       0.7   /model_18/conv/Conv+/model_17/conv/Conv_myl0_16
[01/14/2026-07:12:03] [I]       19.72       0.0227       0.0225       0.7   /model_19/conv/Conv_myl0_17
[01/14/2026-07:12:03] [I]       19.85       0.0228       0.0226       0.7   /model_20/conv/Conv_myl0_18
[01/14/2026-07:12:03] [I]       19.84       0.0228       0.0225       0.7   /model_21/conv/Conv_myl0_19
[01/14/2026-07:12:03] [I]       19.86       0.0228       0.0225       0.7   /model_22/conv/Conv_myl0_20
[01/14/2026-07:12:03] [I]       24.54       0.0282       0.0277       0.8   __myl_Conc_myl0_21
[01/14/2026-07:12:03] [I]       35.77       0.0411       0.0408       1.2   /model_24/conv/Conv_myl0_22
[01/14/2026-07:12:03] [I]       28.19       0.0324       0.0327       1.0   /model_66/conv/Conv+/model_27/conv/Conv_myl0_23
[01/14/2026-07:12:03] [I]       20.31       0.0233       0.0226       0.7   /model_28/conv/Conv_myl0_24
[01/14/2026-07:12:03] [I]       15.82       0.0182       0.0177       0.5   /model_25/m/MaxPool_myl0_25
[01/14/2026-07:12:03] [I]       14.54       0.0167       0.0164       0.5   /model_26/conv/Conv_myl0_26
[01/14/2026-07:12:03] [I]       16.21       0.0186       0.0184       0.6   /model_31/conv/Conv+/model_30/conv/Conv_myl0_27
[01/14/2026-07:12:03] [I]       23.65       0.0272       0.0267       0.8   /model_32/conv/Conv_myl0_28
[01/14/2026-07:12:03] [I]       23.59       0.0271       0.0266       0.8   /model_33/conv/Conv_myl0_29
[01/14/2026-07:12:03] [I]       23.56       0.0271       0.0266       0.8   /model_34/conv/Conv_myl0_30
[01/14/2026-07:12:03] [I]       23.46       0.0270       0.0266       0.8   /model_35/conv/Conv_myl0_31
[01/14/2026-07:12:03] [I]        8.99       0.0103       0.0102       0.3   __myl_Conc_myl0_32
[01/14/2026-07:12:03] [I]       27.18       0.0312       0.0307       0.9   /model_37/conv/Conv_myl0_33
[01/14/2026-07:12:03] [I]       25.21       0.0290       0.0287       0.9   /model_54/conv/Conv+/model_40/conv/Conv_myl0_34
[01/14/2026-07:12:03] [I]       37.97       0.0436       0.0430       1.3   /model_41/conv/Conv_myl0_35
[01/14/2026-07:12:03] [I]       10.45       0.0120       0.0123       0.4   /model_38/m/MaxPool_myl0_36
[01/14/2026-07:12:03] [I]       13.00       0.0149       0.0144       0.4   /model_39/conv/Conv_myl0_37
[01/14/2026-07:12:03] [I]       14.33       0.0165       0.0164       0.5   /model_44/conv/Conv+/model_43/conv/Conv_myl0_38
[01/14/2026-07:12:03] [I]       12.61       0.0145       0.0143       0.4   /model_45/conv/Conv_myl0_39
[01/14/2026-07:12:03] [I]       12.70       0.0146       0.0143       0.4   /model_46/conv/Conv_myl0_40
[01/14/2026-07:12:03] [I]       12.69       0.0146       0.0143       0.4   /model_47/conv/Conv_myl0_41
[01/14/2026-07:12:03] [I]       12.70       0.0146       0.0143       0.4   /model_48/conv/Conv_myl0_42
[01/14/2026-07:12:03] [I]        5.10       0.0059       0.0052       0.2   __myl_Conc_myl0_43
[01/14/2026-07:12:03] [I]       16.80       0.0193       0.0192       0.6   /model_50/conv/Conv_myl0_44
[01/14/2026-07:12:03] [I]       16.88       0.0194       0.0195       0.6   /model_51/cv2/conv/Conv+/model_51/cv1/conv/Conv_myl0_45
[01/14/2026-07:12:03] [I]       41.36       0.0475       0.0471       1.4   /model_51/cv3/conv/Conv_myl0_46
[01/14/2026-07:12:03] [I]       11.03       0.0127       0.0123       0.4   /model_51/cv4/conv/Conv_myl0_47
[01/14/2026-07:12:03] [I]        5.26       0.0060       0.0060       0.2   __myl_Move_myl0_48
[01/14/2026-07:12:03] [I]       68.17       0.0784       0.0776       2.4   /model_51/m_2/MaxPool_myl0_49
[01/14/2026-07:12:03] [I]        5.18       0.0059       0.0060       0.2   __myl_Move_myl0_50
[01/14/2026-07:12:03] [I]       35.80       0.0412       0.0407       1.2   /model_51/m_1/MaxPool_myl0_51
[01/14/2026-07:12:03] [I]        4.52       0.0052       0.0052       0.2   __myl_Move_myl0_52
[01/14/2026-07:12:03] [I]       16.02       0.0184       0.0176       0.6   /model_51/m_0/MaxPool_myl0_53
[01/14/2026-07:12:03] [I]        4.31       0.0050       0.0052       0.1   __myl_Move_myl0_54
[01/14/2026-07:12:03] [I]       17.47       0.0201       0.0203       0.6   /model_51/cv5/conv/Conv_myl0_55
[01/14/2026-07:12:03] [I]       37.92       0.0436       0.0430       1.3   /model_51/cv6/conv/Conv_myl0_56
[01/14/2026-07:12:03] [I]        4.89       0.0056       0.0052       0.2   __myl_Conc_myl0_57
[01/14/2026-07:12:03] [I]       13.24       0.0152       0.0144       0.5   /model_51/cv7/conv/Conv_myl0_58
[01/14/2026-07:12:03] [I]        8.99       0.0103       0.0102       0.3   /model_52/conv/Conv_myl0_59
[01/14/2026-07:12:03] [I]        6.69       0.0077       0.0072       0.2   __myl_ResiConc_myl0_60
[01/14/2026-07:12:03] [I]       14.98       0.0172       0.0165       0.5   /model_57/conv/Conv+/model_56/conv/Conv_myl0_61
[01/14/2026-07:12:03] [I]       14.72       0.0169       0.0164       0.5   /model_58/conv/Conv_myl0_62
[01/14/2026-07:12:03] [I]       11.01       0.0127       0.0123       0.4   /model_59/conv/Conv_myl0_63
[01/14/2026-07:12:03] [I]       11.05       0.0127       0.0123       0.4   /model_60/conv/Conv_myl0_64
[01/14/2026-07:12:03] [I]       11.08       0.0127       0.0123       0.4   /model_61/conv/Conv_myl0_65
[01/14/2026-07:12:03] [I]        9.51       0.0109       0.0103       0.3   __myl_Conc_myl0_66
[01/14/2026-07:12:03] [I]       12.88       0.0148       0.0144       0.4   /model_63/conv/Conv_myl0_67
[01/14/2026-07:12:03] [I]        9.04       0.0104       0.0102       0.3   /model_64/conv/Conv_myl0_68
[01/14/2026-07:12:03] [I]       11.65       0.0134       0.0133       0.4   __myl_ResiConc_myl0_69
[01/14/2026-07:12:03] [I]       19.05       0.0219       0.0225       0.7   /model_69/conv/Conv+/model_68/conv/Conv_myl0_70
[01/14/2026-07:12:03] [I]       17.99       0.0207       0.0205       0.6   /model_70/conv/Conv_myl0_71
[01/14/2026-07:12:03] [I]       12.24       0.0141       0.0143       0.4   /model_71/conv/Conv_myl0_72
[01/14/2026-07:12:03] [I]       12.05       0.0139       0.0142       0.4   /model_72/conv/Conv_myl0_73
[01/14/2026-07:12:03] [I]       11.83       0.0136       0.0141       0.4   /model_73/conv/Conv_myl0_74
[01/14/2026-07:12:03] [I]       22.86       0.0263       0.0257       0.8   __myl_Conc_myl0_75
[01/14/2026-07:12:03] [I]       16.41       0.0189       0.0185       0.6   /model_75/conv/Conv_myl0_76
[01/14/2026-07:12:03] [I]       27.00       0.0310       0.0308       0.9   /model_102/rbr_reparam/Conv_myl0_77
[01/14/2026-07:12:03] [I]       17.92       0.0206       0.0204       0.6   /model_105/m_0/Conv_myl0_78
[01/14/2026-07:12:03] [I]       63.95       0.0735       0.0728       2.2   __myl_SlicMove_myl0_79
[01/14/2026-07:12:03] [I]       34.31       0.0394       0.0389       1.2   __myl_ReshTranCast_myl0_80
[01/14/2026-07:12:03] [I]       12.61       0.0145       0.0143       0.4   /model_78/conv/Conv_myl0_81
[01/14/2026-07:12:03] [I]       12.57       0.0144       0.0143       0.4   /model_79/conv/Conv_myl0_82
[01/14/2026-07:12:03] [I]        7.84       0.0090       0.0092       0.3   /model_76/m/MaxPool_myl0_83
[01/14/2026-07:12:03] [I]        9.01       0.0104       0.0102       0.3   /model_77/conv/Conv_myl0_84
[01/14/2026-07:12:03] [I]       16.18       0.0186       0.0184       0.6   /model_82/conv/Conv+/model_81/conv/Conv_myl0_85
[01/14/2026-07:12:03] [I]       14.86       0.0171       0.0164       0.5   /model_83/conv/Conv_myl0_86
[01/14/2026-07:12:03] [I]       11.07       0.0127       0.0123       0.4   /model_84/conv/Conv_myl0_87
[01/14/2026-07:12:03] [I]       11.07       0.0127       0.0123       0.4   /model_85/conv/Conv_myl0_88
[01/14/2026-07:12:03] [I]       11.17       0.0128       0.0123       0.4   /model_86/conv/Conv_myl0_89
[01/14/2026-07:12:03] [I]        9.22       0.0106       0.0102       0.3   __myl_Conc_myl0_90
[01/14/2026-07:12:03] [I]       14.65       0.0168       0.0164       0.5   /model_88/conv/Conv_myl0_91
[01/14/2026-07:12:03] [I]       32.19       0.0370       0.0368       1.1   /model_103/rbr_reparam/Conv_myl0_92
[01/14/2026-07:12:03] [I]       10.87       0.0125       0.0123       0.4   /model_105/m_1/Conv_myl0_93
[01/14/2026-07:12:03] [I]       19.61       0.0225       0.0225       0.7   __myl_SlicMove_myl0_94
[01/14/2026-07:12:03] [I]       11.59       0.0133       0.0133       0.4   __myl_ReshTranCast_myl0_95
[01/14/2026-07:12:03] [I]       10.83       0.0124       0.0123       0.4   /model_91/conv/Conv_myl0_96
[01/14/2026-07:12:03] [I]       12.88       0.0148       0.0144       0.4   /model_92/conv/Conv_myl0_97
[01/14/2026-07:12:03] [I]        5.56       0.0064       0.0061       0.2   /model_89/m/MaxPool_myl0_98
[01/14/2026-07:12:03] [I]        7.40       0.0085       0.0082       0.3   /model_90/conv/Conv_myl0_99
[01/14/2026-07:12:03] [I]       17.95       0.0206       0.0205       0.6   /model_95/conv/Conv+/model_94/conv/Conv_myl0_100
[01/14/2026-07:12:03] [I]       21.34       0.0245       0.0246       0.7   /model_96/conv/Conv_myl0_101
[01/14/2026-07:12:03] [I]       12.47       0.0143       0.0143       0.4   /model_97/conv/Conv_myl0_102
[01/14/2026-07:12:03] [I]       12.72       0.0146       0.0143       0.4   /model_98/conv/Conv_myl0_103
[01/14/2026-07:12:03] [I]       12.58       0.0145       0.0143       0.4   /model_99/conv/Conv_myl0_104
[01/14/2026-07:12:03] [I]        6.75       0.0078       0.0074       0.2   __myl_Conc_myl0_105
[01/14/2026-07:12:03] [I]       18.39       0.0211       0.0205       0.6   /model_101/conv/Conv_myl0_106
[01/14/2026-07:12:03] [I]       82.12       0.0944       0.0942       2.8   /model_104/rbr_reparam/Conv_myl0_107
[01/14/2026-07:12:03] [I]       13.32       0.0153       0.0154       0.5   /model_105/m_2/Conv_myl0_108
[01/14/2026-07:12:03] [I]       87.73       0.1008       0.0999       3.0   __myl_CastSigmSlicSlicSlicCastSigmSlicSlicSlicSlicMoveReshTranCastCastSigmSlicSlicSlicMulMulMulEtc_myl0_109
[01/14/2026-07:12:03] [I]     2898.90       3.3321       3.2907     100.0   Total
[01/14/2026-07:12:03] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v101302] [b6] # trtexec --loadEngine=weights/yolov7_w4a8.trt --dumpProfile

How can I tell whether the loaded model is actually int4-quantized?

Hi,

Sorry, could you try running the model again with --profilingVerbosity=detailed --dumpLayerInfo --dumpProfile and share the layer info with us?
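That is, something like the following (using the engine file from the earlier run):

```shell
/usr/src/tensorrt/bin/trtexec --loadEngine=weights/yolov7_w4a8.trt \
    --profilingVerbosity=detailed --dumpLayerInfo --dumpProfile
```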

Thanks.