Description
Hello,
I attempted INT8 quantization with TensorRT of an Action Chunking with Transformers (ACT) model on an NVIDIA Jetson Xavier AGX 32GB, but the engine build fails with the following error:
[E] 2: [weightConvertors.cpp::quantizeBiasCommon::337] Error Code 2: Internal Error (Assertion getter(i) != 0 failed. )
[E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
polygraphy.exception.exception.PolygraphyException: Invalid Engine. Please ensure the engine was built correctly
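The quantizeBiasCommon assertion typically fires when a quantization scale comes out as exactly zero. As a first check, it may be worth scanning the calibration cache for zero activation scales. This is a minimal, stdlib-only sketch that assumes the usual text layout of a TensorRT calibration cache (a `TRT-...` header line followed by `tensor_name: <hex-encoded float32>` entries):

```python
import struct

def read_calib_scales(path):
    """Parse a TensorRT calibration cache into {tensor_name: scale}.

    Assumes the usual text layout: a 'TRT-...' header line, then one
    'name: hexfloat' entry per tensor (big-endian IEEE-754 float32).
    """
    scales = {}
    with open(path) as f:
        for line in f:
            if line.startswith("TRT-") or ":" not in line:
                continue  # skip the header / malformed lines
            name, hexval = line.rsplit(":", 1)
            try:
                scale = struct.unpack("!f", bytes.fromhex(hexval.strip()))[0]
            except ValueError:
                continue
            scales[name.strip()] = scale
    return scales

if __name__ == "__main__":
    scales = read_calib_scales("act/act.cache")
    zeros = [n for n, s in scales.items() if s == 0.0]
    print(f"{len(zeros)} tensors with a zero scale: {zeros[:10]}")
```

If any tensor reports a zero scale (i.e. its activations were all zero over the calibration set), deleting the cache and recalibrating with more representative data, or forcing the affected layers to FP16/FP32, is a common workaround.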
Interestingly, the same model builds successfully with INT8 quantization in my desktop environment with the following specs (the only change I made was lowering the ONNX opset version from 20 to 16 during model export):
GPU: RTX 5070
CUDA: 12.8
TensorRT: 10.12
Inference works fine and performance improves as expected.
However, on the Jetson Xavier AGX the engine only builds in FP32; INT8 quantization fails with the error above.
Environment
TensorRT Version: 8.5.2.2
NVIDIA GPU: Jetson Xavier AGX 32GB
NVIDIA Driver Version: Jetpack 5.1.3
CUDA Version: 11.4
CUDNN Version: 8.6.0
Python Version: 3.8
Relevant Files
act/act.onnx : ONNX model (opset 16)
act/act.cache : calibrator cache file
act/act_origin.engine : FP32 engine
ONNX export code
batch_size = 1
state_dim = 6
num_cameras = 3
channels = len(camera_names)
height = 480
width = 640
dummy_qpos = torch.randn(batch_size, state_dim).cuda()
dummy_image = torch.randn(batch_size, num_cameras, channels, height, width).cuda()
# export to ONNX
onnx_path = os.path.join(save_dir, "act.onnx")
torch.onnx.export(
    policy,
    (dummy_qpos, dummy_image),
    onnx_path,
    export_params=True,
    opset_version=16,
    do_constant_folding=True,
    input_names=['qpos', 'image'],
    output_names=['action'],
)
Data loader for the INT8 calibrator (Polygraphy)
def data_generator(norm_stats, representative_episode_ids):
    for episode_id in representative_episode_ids:
        dataset_path = os.path.join(dataset_dir, f"episode_{episode_id}.hdf5")
        with h5py.File(dataset_path, 'r') as root:
            qpos_data = root['/observations/qpos'][()]  # shape: (episode_len, state_dim)
            image_dict = dict()
            for cam_name in camera_names:
                image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]  # shape: (episode_len, 480, 640, 3)
        # stack along a new axis for the different cameras
        all_cam_images = []
        for cam_name in camera_names:
            all_cam_images.append(image_dict[cam_name])
        all_cam_images = np.stack(all_cam_images, axis=0)  # shape: (num_cameras, episode_len, 480, 640, 3)
        image_data = torch.from_numpy(all_cam_images)
        qpos_data = torch.from_numpy(qpos_data)
        # channel first: (episode_len, num_cameras, 3, 480, 640)
        image_data = torch.einsum('b k h w c -> k b c h w', image_data)
        # back to numpy
        image_data = image_data.numpy()
        qpos_data = qpos_data.numpy()
        # normalize images and qpos, cast to float
        image_data = image_data / np.float32(255.0)
        qpos_data = (qpos_data - norm_stats["qpos_mean"]) / norm_stats["qpos_std"]
        for qpos, image in zip(qpos_data, image_data):
            # dict keys must match the ONNX input names
            qpos = np.expand_dims(qpos, axis=0)
            image = np.expand_dims(image, axis=0)
            yield {"qpos": qpos, "image": image}
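Before calibrating on the Jetson, it can help to smoke-test what the generator yields: the INT8 calibrator expects float32 buffers whose shapes match the ONNX inputs exactly, and NaN/Inf values or an accidental float64 (e.g. from normalizing with float64 stats) can cause bad or zero calibration scales. A minimal numpy-only sketch; the expected shapes below are taken from the export code above:

```python
import numpy as np

def check_calibration_feed(feed, expected_shapes):
    """Validate one feed dict against the ONNX input spec (name -> shape)."""
    for name, shape in expected_shapes.items():
        arr = feed[name]
        assert arr.shape == shape, f"{name}: shape {arr.shape} != {shape}"
        assert arr.dtype == np.float32, f"{name}: dtype {arr.dtype}, calibrator expects float32"
        assert np.isfinite(arr).all(), f"{name}: NaN/Inf would poison the calibration scales"

# shapes from the export code: qpos (1, 6), image (1, num_cameras, 3, 480, 640)
expected = {"qpos": (1, 6), "image": (1, 3, 3, 480, 640)}
feed = {"qpos": np.zeros((1, 6), np.float32),
        "image": np.zeros((1, 3, 3, 480, 640), np.float32)}
check_calibration_feed(feed, expected)  # run this on real generator output
```

In particular, `(qpos_data - norm_stats["qpos_mean"]) / norm_stats["qpos_std"]` silently promotes to float64 if the stats are stored as float64, so an explicit `.astype(np.float32)` on the yielded arrays is a cheap safeguard.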
builder, network, parser = poly_trt.network_from_onnx_path(path=onnx_model_path)
if fp32:
    builder_config = poly_trt.create_config(builder=builder, network=network)
else:
    if calibration:
        calibrator = poly_trt.Calibrator(data_loader=data_generator(norm_stats, episode_ids),
                                         cache="act.cache")
    else:
        calibrator = poly_trt.Calibrator(data_loader=random_data_generator(norm_stats))
    # each precision flag must be enabled explicitly
    builder_config = poly_trt.create_config(builder=builder, network=network,
                                            int8=True, fp16=fp16, tf32=tf32,
                                            calibrator=calibrator)
engine = poly_trt.engine_from_network(network=(builder, network, parser),
                                      config=builder_config)
Any insights into why this might be happening or how to resolve it would be greatly appreciated.
Thanks in advance!