CUBLAS Error During ONNX -> TRT conversion


I’m using the Polygraphy Python API to convert an ONNX model exported from PyTorch to a TensorRT engine. The code consists of the following two lines:

build_engine = EngineFromNetwork(NetworkFromOnnxPath(onnx_file))
engine = build_engine()

Parsing succeeds, but building the engine produces many warnings:

[02/02/2022-18:28:37] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[02/02/2022-18:28:37] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[02/02/2022-18:30:36] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs

(The warning "Tensor DataType is determined at build time for tensors not marked as input or output." appears about 600 times.) The build eventually fails with the following error:

[I] Building engine with configuration:
    Workspace            | 16777216 bytes (16.00 MiB)
    Precision            | TF32: False, FP16: False, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted    | False
    Profiles             | 1 profile(s)
[02/02/2022-18:39:05] [TRT] [E] 2: [ltWrapper.cpp::nvinfer1::rt::CublasLtWrapper::setupHeuristic::327] Error Code 2: Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed. )
[02/02/2022-18:39:05] [TRT] [E] 2: [builder.cpp::nvinfer1::builder::Builder::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )

I have two questions:

  1. Are all those warnings an actual problem potentially resulting in a slow-down of the inference? In that case, how can I get rid of them?

  2. How can I solve the error? I read that it could be caused by missing CUDA 10.2 patches, but installing them didn’t solve the problem.


TensorRT Version:
NVIDIA GPU: GeForce GTX 1650 Ti
NVIDIA Driver Version:
CUDA Version: 10.2
CUDNN Version:
Operating System: Windows 10
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.10.1
Baremetal or Container (if so, version): Baremetal

Relevant Files

pcr.onnx (45.8 MB)


Could you please let us know which cuBLAS version you’re using? We will get back to you on your queries.

Thank you.

Hello, thanks for replying.
How do I check that? I should have the version that comes with CUDA 10.2. I compiled the first snippet on this page with nvcc test_cublas.c -lcublas -o test_cublas to check whether I had a working CUBLAS installation, and it compiled correctly.

EDIT: the code I compiled has an #include "cublas_v2.h" so I guess I have CUBLAS 2 (point something?).
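
A note on the version question: the `cublas_v2.h` include names the API generation, not the library release, so it does not say which cuBLAS build is installed. As a rough sketch, and assuming the toolkit's `cublas_api.h` header defines `CUBLAS_VER_MAJOR`, `CUBLAS_VER_MINOR` and `CUBLAS_VER_PATCH` (true of recent toolkits as far as I know, but worth verifying on your install), the version can be read from the header text. The helper name `parse_cublas_version` and the sample values are made up for illustration.

```python
import re


def parse_cublas_version(header_text):
    """Return (major, minor, patch) from CUBLAS_VER_* macros, or None if any is missing."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCH"):
        pattern = r"^\s*#\s*define\s+CUBLAS_VER_%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Illustrative header excerpt; the numbers are placeholders, not a real install.
sample = """
#define CUBLAS_VER_MAJOR 10
#define CUBLAS_VER_MINOR 2
#define CUBLAS_VER_PATCH 2
"""
print(parse_cublas_version(sample))
```

On a real machine you would pass in the contents of `cublas_api.h` from the CUDA toolkit's include directory (its location depends on the install) instead of `sample`.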

I actually solved the engine build problem by installing CUDA 11.4 (it doesn’t work with 10.2; someone should look into it).
I still get all the warnings I posted above. Could someone help me get rid of them?
Right now the TensorRT engine is slower than the PyTorch model, and I hope resolving those warnings might help.


The INT64 and INT32 warnings are there to notify the user that we are down casting any INT64 indices to INT32. Generally, this has no effect on the resulting ONNX model and inference.
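
To make the downcast concrete, here is a minimal NumPy sketch of what clamping INT64 values into the INT32 range looks like. It is not TensorRT's actual code path; the function name `int64_cast_report` and the example tensor are assumptions for illustration. The example includes INT64_MAX because exporters often use it as a "slice to the end" sentinel, which is one plausible (unconfirmed for this model) source of the "weights outside the range of INT32 was clamped" warning.

```python
import numpy as np

INT32_MIN = np.iinfo(np.int32).min
INT32_MAX = np.iinfo(np.int32).max


def int64_cast_report(weights):
    """Count values outside the INT32 range and return the clamped INT32 copy."""
    weights = np.asarray(weights, dtype=np.int64)
    out_of_range = int(np.count_nonzero((weights < INT32_MIN) | (weights > INT32_MAX)))
    clamped = np.clip(weights, INT32_MIN, INT32_MAX).astype(np.int32)
    return out_of_range, clamped


# Hypothetical index tensor containing an INT64_MAX sentinel.
example = np.array([0, 1, -1, np.iinfo(np.int64).max], dtype=np.int64)
count, clamped = int64_cast_report(example)
print(count, clamped.tolist())  # 1 [0, 1, -1, 2147483647]
```

If the only out-of-range values are sentinels like this, clamping does not change the result, because an index of INT32_MAX still means "to the end" for any realistic tensor size.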

If the TRT engine is still slower than the PyTorch model, could you please share a minimal repro script and the model so we can try it on our end for better debugging?

Thank you.

What about this?

Tensor DataType is determined at build time for tensors not marked as input or output.

It appears about 600 times.

Anyway, I moved from Polygraphy to creating the engine and running inference directly with the TensorRT API, and now it runs at 30 fps, so I guess my problem is solved. Thanks for the assistance.

If you want to recreate the slower version of the engine you can create it with

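from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath

build_engine = EngineFromNetwork(NetworkFromOnnxPath(onnx_file))
engine = build_engine()

# The body of this block was cut off in the post; writing the serialized engine is assumed.
with open('assets/production/pcr.engine', 'wb') as f:
    f.write(engine.serialize())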

and execute it with

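import numpy as np
import tqdm
from polygraphy.backend.trt import EngineFromBytes, TrtRunner
from polygraphy.common import TensorMetadata
from polygraphy.comparator import DataLoader

# The input_metadata argument and f.read() below were cut off in the post and are assumed.
data_loader = DataLoader(iterations=1,
                         val_range=(-0.5, 0.5),
                         input_metadata=TensorMetadata.from_feed_dict(
                             {'input': np.zeros([1, 2024, 3], dtype=np.float32)}))

with open('assets/production/pcr.engine', 'rb') as f:
    serialized_engine = f.read()
    engine = EngineFromBytes(serialized_engine)

for x in tqdm.tqdm(data_loader):
    with TrtRunner(engine) as runner:
        outputs = runner.infer(feed_dict=x)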
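
When comparing speeds, it is worth ruling out that one-time setup cost is being counted as inference time. The following is a minimal, framework-independent timing sketch with warm-up calls excluded; the helper name `mean_latency` and its defaults are assumptions, and whether setup overhead explains the slowdown in this thread is not confirmed.

```python
import time


def mean_latency(fn, warmup=3, iterations=10):
    """Average seconds per call of fn(), measured after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


# Stand-in workload so the sketch runs anywhere; replace with a real inference call.
calls = []
latency = mean_latency(lambda: calls.append(1), warmup=2, iterations=5)
print(len(calls), latency >= 0.0)  # 7 True
```

With Polygraphy, the `with TrtRunner(engine) as runner:` block would wrap the call to `mean_latency`, with `fn` set to `lambda: runner.infer(feed_dict=x)`, so that activating the runner is not part of the timed region.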

The faster one is created directly with the TensorRT Python API, as described in the guide.
That way I achieve around 30 fps, but the results are wrong.
I’m using polygraphy run to debug it right now.


Normally we can ignore this warning. It just says that setting a tensor's dtype has no effect unless the tensor is a network input or output.

Regarding the wrong output results, were you able to correct them using Polygraphy? If you’re still facing the issue, please share the repro model and script so we can try it on our end and help you.
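
For a quick check outside of polygraphy run, the outputs of the two engines (or of the engine and the PyTorch model) can be compared directly once they are available as NumPy arrays. This is only a sketch: the function name `compare_outputs`, the tolerances, and the example arrays are assumptions, not values taken from this thread.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-5, atol=1e-5):
    """Map each shared output name to (max_abs_diff, within_tolerance)."""
    report = {}
    for name in reference.keys() & candidate.keys():
        ref = np.asarray(reference[name], dtype=np.float64)
        out = np.asarray(candidate[name], dtype=np.float64)
        if ref.shape != out.shape:
            # A shape mismatch is reported as an infinite difference.
            report[name] = (float("inf"), False)
            continue
        max_diff = float(np.max(np.abs(ref - out))) if ref.size else 0.0
        report[name] = (max_diff, bool(np.allclose(out, ref, rtol=rtol, atol=atol)))
    return report


# Made-up outputs standing in for the results of two backends.
reference = {"out": np.array([1.0, 2.0])}
candidate = {"out": np.array([1.0, 2.5])}
report = compare_outputs(reference, candidate)
print(report)
```

A large difference on the very first output usually points at preprocessing or input layout rather than at the engine itself, so it helps to feed both backends exactly the same array.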

Thank you.