INT8 Quantization Fails on Nvidia Jetson Xavier AGX (32GB) for ACT Model

Description

Hello,

I attempted to perform INT8 quantization with TensorRT on an Action Chunking with Transformers (ACT) neural network model on an NVIDIA Jetson Xavier AGX (32GB), but the engine build fails with the following error:

[E] 2: [weightConvertors.cpp::quantizeBiasCommon::337] Error Code 2: Internal Error (Assertion getter(i) != 0 failed. )
[E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
polygraphy.exception.exception.PolygraphyException: Invalid Engine. Please ensure the engine was built correctly

Interestingly, the same model builds successfully with INT8 quantization in my desktop environment with the following specs (the only change I made was lowering the ONNX opset version from 20 to 16 during model export):

GPU: RTX 5070

CUDA: 12.8

TensorRT: 10.12

Inference works fine and performance improves as expected.

However, on the Jetson Xavier AGX, the engine build only works with FP32 — INT8 quantization fails with the error above.

Environment

TensorRT Version: 8.5.2.2

NVIDIA GPU: NVIDIA Jetson Xavier AGX 32GB

NVIDIA Driver Version: Jetpack 5.1.3

CUDA Version: 11.4

CUDNN Version: 8.6.0

Python Version: 3.8

Relevant Files

act/act.onnx: ONNX model (opset 16)
act/act.cache: calibration cache file
act/act_origin.engine: FP32 engine

ONNX export code

    batch_size = 1
    state_dim = 6
    num_cameras = 3
    channels = len(camera_names)
    height = 480
    width = 640

    dummy_qpos = torch.randn(batch_size, state_dim).cuda()
    dummy_image = torch.randn(batch_size, num_cameras, channels, height, width).cuda()

    # Export to ONNX
    onnx_path = os.path.join(save_dir, "act.onnx")
    torch.onnx.export(
        policy,                                
        (dummy_qpos, dummy_image),             
        onnx_path,                            
        export_params=True,                    
        opset_version=16,              
        do_constant_folding=True,           
        input_names=['qpos', 'image'],         
        output_names=['action'],                
    )

Data loader for the INT8 calibrator (Polygraphy)

def data_generator(norm_stats, representative_episode_ids):
    for episode_id in representative_episode_ids:
        dataset_path = os.path.join(dataset_dir, f"episode_{episode_id}.hdf5")
        with h5py.File(dataset_path, 'r') as root:
            qpos_data = root['/observations/qpos'][()] # shape : (episode_len, qpos)
            image_dict = dict()
            for cam_name in camera_names:
                image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()] # shape : (episode_len, 480, 640, 3)

            # new axis for different cameras
            all_cam_images = []
            for cam_name in camera_names: 
                all_cam_images.append(image_dict[cam_name])
            all_cam_images = np.stack(all_cam_images, axis=0) # shape : (# of camera, episode_len, 480, 640 ,3)

            # rearrange to channel-first layout
            image_data = torch.from_numpy(all_cam_images)
            qpos_data = torch.from_numpy(qpos_data)
            image_data = torch.einsum('b k h w c -> k b c h w', image_data) # shape : (episode_len, # of camera, 3, 480, 640)

            # Change tensor to numpy
            image_data = image_data.numpy()
            qpos_data = qpos_data.numpy()

            # normalize image and change dtype to float
            image_data = image_data / np.float32(255.0)
            qpos_data = (qpos_data - norm_stats["qpos_mean"]) / norm_stats["qpos_std"]

        for qpos, image in zip(qpos_data, image_data):
            # Dict keys must match the ONNX input names
            qpos = np.expand_dims(qpos, axis=0)
            image = np.expand_dims(image, axis=0)
            yield {"qpos" : qpos, "image" : image}
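One thing worth double-checking in a generator like this is the dtype of what it yields: h5py typically returns `qpos` as float64 and images as uint8, and normalizing with float64 stats keeps `qpos` in float64 even though the ONNX input is float32. A self-contained sketch with synthetic data (shapes assumed from the loader above, with a short episode to keep it small):

```python
import numpy as np

# Synthetic stand-ins for the HDF5 episode data (assumption: qpos stored as
# float64, images as uint8 — what h5py commonly returns).
episode_len, state_dim, num_cameras = 4, 6, 3
qpos_data = np.random.randn(episode_len, state_dim)  # float64
images = np.random.randint(
    0, 256, (num_cameras, episode_len, 480, 640, 3), dtype=np.uint8)

norm_stats = {"qpos_mean": qpos_data.mean(axis=0),
              "qpos_std": qpos_data.std(axis=0) + 1e-8}

# Same layout change as the loader: (cam, len, H, W, C) -> (len, cam, C, H, W);
# cast explicitly so the image batch is unambiguously float32.
image_data = np.transpose(images, (1, 0, 4, 2, 3)).astype(np.float32) / 255.0

# float64 stats promote the normalized qpos to float64...
qpos_norm = (qpos_data - norm_stats["qpos_mean"]) / norm_stats["qpos_std"]
print(qpos_norm.dtype)  # float64

# ...but the ONNX input is float32, so cast before yielding to the calibrator.
qpos_norm = qpos_norm.astype(np.float32)

batch_qpos = np.expand_dims(qpos_norm[0], axis=0)
batch_image = np.expand_dims(image_data[0], axis=0)
print(batch_qpos.shape, batch_qpos.dtype)    # (1, 6) float32
print(batch_image.shape, batch_image.dtype)  # (1, 3, 3, 480, 640) float32
```

Feeding float64 batches to an engine whose inputs are float32 is an easy mismatch to miss, so an explicit `astype(np.float32)` in the generator is a cheap safeguard.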
Engine build code (Polygraphy)

builder, network, parser = poly_trt.network_from_onnx_path(path=onnx_model_path)

if fp32:
    builder_config = poly_trt.create_config(builder=builder,
                                            network=network)
else:
    if calibration:
        calibrator = poly_trt.Calibrator(data_loader=data_generator(norm_stats, episode_ids),
                                         cache="act.cache")
    else:
        calibrator = poly_trt.Calibrator(data_loader=random_data_generator(norm_stats))

    # Each precision flag must be enabled explicitly.
    builder_config = poly_trt.create_config(builder=builder, network=network,
                                            int8=True, fp16=fp16, tf32=tf32,
                                            calibrator=calibrator)

engine = poly_trt.engine_from_network(network=(builder, network, parser),
                                      config=builder_config)
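The `quantizeBiasCommon ... getter(i) != 0` assertion often points at a tensor whose calibration scale came out as exactly zero (bias quantization reportedly multiplies the input scale by the weight scale, so a zero anywhere trips the assert). Since a calibration cache already exists, scanning it for zero scales is a cheap first check. A sketch, assuming the plain-text cache format TensorRT typically writes (a `TRT-...` header line, then one `name: <hex float32>` line per tensor):

```python
import struct

def zero_scale_tensors(cache_text):
    """Return the names whose calibration scale decodes to exactly 0.0.

    Assumes the text format TensorRT typically emits: a 'TRT-...' header,
    then one 'tensor_name: <8 hex chars>' line per tensor, where the hex
    is the big-endian IEEE-754 float32 scale.
    """
    bad = []
    for line in cache_text.splitlines():
        if ":" not in line or line.startswith("TRT-"):
            continue
        name, _, hexval = line.rpartition(":")
        try:
            (scale,) = struct.unpack("!f", bytes.fromhex(hexval.strip()))
        except (ValueError, struct.error):
            continue  # skip anything that is not an 8-hex-char float
        if scale == 0.0:
            bad.append(name.strip())
    return bad

# Hypothetical cache contents for illustration; run this on act.cache instead.
sample = """TRT-8502-EntropyCalibration2
qpos: 3c010a14
image: 00000000
"""
print(zero_scale_tensors(sample))  # ['image']
```

If any tensor in the real cache decodes to zero, regenerating the cache on the Jetson (rather than reusing the desktop one, which was built against a different TensorRT version) would be the next thing to try.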

Any insights into why this might be happening or how to resolve it would be greatly appreciated.

Thanks in advance!

Dear @pst120899 ,
Could you quickly check the model with trtexec using the INT8 flag to see whether the same issue occurs? Please share the verbose build logs as well.
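For reference, a trtexec invocation along those lines might look like the following sketch (paths are placeholders; on JetPack 5.x, trtexec typically lives under /usr/src/tensorrt/bin):

```shell
# Build with INT8, reuse the existing calibration cache, and capture verbose logs.
/usr/src/tensorrt/bin/trtexec \
    --onnx=act/act.onnx \
    --int8 \
    --calib=act/act.cache \
    --saveEngine=act/act_int8.engine \
    --verbose 2>&1 | tee trtexec_int8_build.log
```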

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.