I can't get the results I want from a TensorRT model

Description

I tried to convert a GPT model from PyTorch to ONNX and then to TensorRT. The conversion to a TensorRT engine succeeds, but I can't get the results I want during the inference phase; I can guarantee that the ONNX model is correct. The two warnings below appeared while converting the ONNX model to the TensorRT engine, and I don't know whether they affect the engine conversion.

[05/29/2022-19:08:00] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[05/29/2022-19:08:01] [TRT] [W] ShapedWeights.cpp:173: Weights transformer.h.8.attn.c_attn.weight has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.

The code that converts the ONNX model to a TensorRT engine:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# ONNX models require an explicit-batch network.
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

success = parser.parse_from_file('model.onnx')
if not success:
    for idx in range(parser.num_errors):
        print(parser.get_error(idx))
    raise RuntimeError("Failed to parse model.onnx")

config = builder.create_builder_config()
# config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 20)  # 1 MiB
config.max_workspace_size = 1 << 31  # 2 GiB

# Dynamic-shape inputs need an optimization profile with (min, opt, max) shapes.
profile = builder.create_optimization_profile()
profile.set_shape("input_ids", (1, 1), (1, 20), (1, 300))
profile.set_shape("token_type_ids", (1, 1), (1, 20), (1, 300))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open("sample4.engine", "wb") as f:
    f.write(serialized_engine)
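
Not shown above is how the engine and context used in the inference code below are obtained; a minimal sketch, assuming the engine is deserialized from the sample4.engine file written above:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the saved engine and create an execution context for inference.
with open("sample4.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()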

The main inference code; input_ids and token_type_ids are the two inputs to the model.

# Select the optimization profile and set the dynamic input shapes.
context.active_optimization_profile = 0
origin_inputshape = context.get_binding_shape(0)
origin_inputshape[0], origin_inputshape[1] = input_ids.shape
context.set_binding_shape(0, origin_inputshape)
context.set_binding_shape(1, origin_inputshape)

inputs, outputs, bindings, stream = common.allocate_buffers(engine)
inputs[1].host = input_ids
inputs[0].host = token_type_ids

logits, *_ = common.do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
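
Note that do_inference_v2 from the TensorRT samples returns flat host buffers, so the logits generally need to be reshaped to the resolved output shape; a minimal sketch, assuming binding index 2 is the logits output:

import numpy as np

# Binding index 2 as the output is an assumption; adjust it to your engine's binding order.
out_shape = tuple(context.get_binding_shape(2))
logits = np.asarray(logits).reshape(out_shape)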

The model I want to convert is OpenAIGPTLMHeadModel; I can only put one link, but you can check it on Hugging Face.

Environment

TensorRT Version: 8.2.5.1
GPU Type: RTX 3060
Nvidia Driver Version: 497.38
CUDA Version: 11.5.1
CUDNN Version: 8.2.1.32
Operating System + Version: Windows 11
Python Version (if applicable): 3.8.13
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11
Baremetal or Container (if container which image + tag):

Relevant Files

GitHub link to my code
RuntensorRT is the inference-phase script.

Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the below snippet:

check_model.py

import onnx

filename = "model.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, request you to share the trtexec "--verbose" log for further debugging.
Thanks!

1. Validation results


2. I tried running trtexec with './trtexec --onnx=D:\Subject\dialogue\CDial-GPT\model.onnx --saveEngine=D:\Subject\dialogue\CDial-GPT\sample.engine --fp16 --workspace=10000 --minShapes=input_ids:1x1,token_type_ids:1x1 --optShapes=input_ids:1x300,token_type_ids:1x300 --maxShapes=input_ids:1x300,token_type_ids:1x300 --device=0 --verbose --exportTimes=trace.json'.
Here are all the logs I could get:
logs.txt (683.1 KB)

The ONNX file is too big to upload, so I am putting the model on Google Drive. Could I have your email so I can share it with you, or do you have a more convenient way?

Here is the trace.json:
trace.json (169.2 KB)

I have replied below my question, please check.

The result is no longer 0, but the dimension is still wrong; the correct number of dimensions is 3.

The problem seems to be in allocate_buffers(engine). I had previously hard-coded the size to a fixed value, because the size obtained from trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size is a negative number, so host_mem = cuda.pagelocked_empty(size, dtype) fails with 'pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory'. How can I solve this problem?

def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        print(engine.get_binding_shape(binding))
        # With dynamic shapes, engine.get_binding_shape() still contains -1, so this volume can be negative.
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

Hi,

The above error is related to dimensions; you may not be handling the dynamic shapes correctly.
For an engine with dynamic shapes, could you make sure you query the resolved shapes with context.get_binding_shape (after setting the input binding shapes) rather than engine.get_binding_shape? See the sketch below.
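
A minimal sketch of a dynamic-shape-aware buffer allocation, assuming the input binding shapes have already been set on the execution context (for example via context.set_binding_shape as in your inference code); the HostDeviceMem class mirrors the one used in your snippet, and this is only an illustration, not the official sample code:

import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context
import tensorrt as trt

class HostDeviceMem:
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

def allocate_buffers_dynamic(engine, context):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for idx, binding in enumerate(engine):
        # Query the context, not the engine: after set_binding_shape the
        # resolved dims contain no -1, so the volume is never negative.
        shape = context.get_binding_shape(idx)
        size = trt.volume(shape)
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate page-locked host memory and matching device memory.
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream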

Please share the issue repro script and model with us so we can try it from our end, if you still face this issue.

Thank you.