CUBLAS Error During ONNX -> TRT conversion

Description

I’m using the Polygraphy Python API to convert an ONNX model exported from PyTorch to a TensorRT engine. The conversion boils down to the following code:

from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath

build_engine = EngineFromNetwork(NetworkFromOnnxPath(onnx_file))
engine = build_engine()

Parsing succeeds, but building the engine produces a lot of warnings:

[02/02/2022-18:28:37] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[02/02/2022-18:28:37] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[02/02/2022-18:30:36] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs
[02/02/2022-18:30:48] [TRT] [W] Output type must be INT32 for shape outputs

(the warning "Tensor DataType is determined at build time for tensors not marked as input or output." appears about 600 times) before eventually failing with the following error:

[I] Building engine with configuration:
    Workspace            | 16777216 bytes (16.00 MiB)
    Precision            | TF32: False, FP16: False, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted    | False
    Profiles             | 1 profile(s)
[02/02/2022-18:39:05] [TRT] [E] 2: [ltWrapper.cpp::nvinfer1::rt::CublasLtWrapper::setupHeuristic::327] Error Code 2: Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed. )
[02/02/2022-18:39:05] [TRT] [E] 2: [builder.cpp::nvinfer1::builder::Builder::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )

I have two questions:

  1. Are all those warnings an actual problem that could slow down inference? If so, how can I get rid of them?

  2. How can I solve the error? I read it could be caused by missing CUDA 10.2 patches, but installing them didn’t solve the problem.

Environment

TensorRT Version: 8.2.3.0
NVIDIA GPU: GeForce GTX 1650 Ti
NVIDIA Driver Version: 30.0.14.9649
CUDA Version: 10.2
CUDNN Version: 8.2.1.32
Operating System: Windows 10
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.10.1
Baremetal or Container (if so, version): Baremetal

Relevant Files

pcr.onnx (45.8 MB)

Hi,

Could you please let us know which cuBLAS version you’re using? We will get back to you on your queries.

Thank you.

Hello, thanks for replying.
How do I check that? I should have the version that comes with CUDA 10.2. I compiled the first snippet on this page with nvcc test_cublas.c -lcublas -o test_cublas to check whether I had a working cuBLAS install, and it compiled correctly.

EDIT: the code I compiled has an #include "cublas_v2.h" so I guess I have CUBLAS 2 (point something?).

I actually solved the engine-build error by installing CUDA 11.4 (with 10.2 it doesn’t work; someone should look into it).
I still have all the warnings I posted above. Could someone help me get rid of them?
Right now the TRT engine is slower than the PyTorch model, and I’m hoping that resolving those warnings might help.

Hi,

The INT64 and INT32 warnings are there to notify the user that we are down-casting any INT64 weights and indices to INT32. Generally, this has no effect on the converted model or on inference accuracy.
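If you would like to see which weights this applies to, the following rough sketch (using the onnx Python package and the model file shared in this thread) lists the initializers stored as INT64:

# Rough sketch: list the initializers in the ONNX model that are stored as INT64
# and would therefore be cast down to INT32 by the TensorRT ONNX parser.
import onnx

model = onnx.load('pcr.onnx')
int64_inits = [init.name for init in model.graph.initializer
               if init.data_type == onnx.TensorProto.INT64]
print(f'{len(int64_inits)} INT64 initializers: {int64_inits}')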

If the TRT engine is slower than the PyTorch model, could you please share a minimal repro script and model with us so we can try it on our end for better debugging.

Thank you.

What about this?

Tensor DataType is determined at build time for tensors not marked as input or output.

It appears about 600 times.

Anyway, I moved from Polygraphy to creating the engine and running inference directly with the TensorRT API, and now it runs at 30 fps, so I guess my problem is solved. Thanks for the assistance.

If you want to recreate the slower version of the engine you can create it with

from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath

build_engine = EngineFromNetwork(NetworkFromOnnxPath(onnx_file))
engine = build_engine()

with open('assets/production/pcr.engine', 'wb') as f:
    f.write(engine.serialize())

and execute it with

import numpy as np
import tqdm
from polygraphy.backend.trt import EngineFromBytes, TrtRunner
from polygraphy.common import TensorMetadata
from polygraphy.comparator import DataLoader

data_loader = DataLoader(iterations=1,
                         val_range=(-0.5, 0.5),
                         input_metadata=TensorMetadata.from_feed_dict(
                             {'input': np.zeros([1, 2024, 3], dtype=np.float32)}))

with open('assets/production/pcr.engine', 'rb') as f:
    serialized_engine = f.read()
    engine = EngineFromBytes(serialized_engine)

for x in tqdm.tqdm(data_loader):
    with TrtRunner(engine) as runner:
        outputs = runner.infer(feed_dict=x)

The faster one is created directly with the TensorRT Python API, as described in the guide.
That way I can reach around 30 fps, but the results are wrong;
I’m using polygraphy run to debug that right now.
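In case anyone wants to reproduce the faster path, this is roughly its shape (a simplified sketch rather than my exact script; it assumes pycuda for the device buffers and a single input/output pair, with the input name and shape used above):

# Simplified sketch of the direct TensorRT Python API path (TRT 8.2), not my
# exact script. Assumes pycuda and a single input/output pair.
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_serialized_engine(onnx_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError('ONNX parsing failed')
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB
    return builder.build_serialized_network(network, config)

serialized = build_serialized_engine('pcr.onnx')
engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(serialized)
context = engine.create_execution_context()

# Allocate one device buffer per binding, copy the input in, run, copy the output back.
inp = np.zeros([1, 2024, 3], dtype=np.float32)
bindings = []
for name in engine:
    shape = engine.get_binding_shape(name)
    dtype = trt.nptype(engine.get_binding_dtype(name))
    dev_buf = cuda.mem_alloc(trt.volume(shape) * np.dtype(dtype).itemsize)
    bindings.append(int(dev_buf))
    if engine.binding_is_input(name):
        cuda.memcpy_htod(dev_buf, np.ascontiguousarray(inp.astype(dtype)))
    else:
        out_host = np.empty(tuple(shape), dtype=dtype)
        out_dev = dev_buf

context.execute_v2(bindings)
cuda.memcpy_dtoh(out_host, out_dev)
print(out_host.shape)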

Hi,

Normally this warning can be ignored. It’s just saying that setting a tensor’s dtype has no effect unless the tensor is a network input or output.

Regarding the wrong output results, were you able to correct them using Polygraphy? If you’re still facing the issue, please share the repro model and script with us so we can try it on our end and help you.
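For reference, Polygraphy can also compare the TensorRT engine against ONNX Runtime programmatically. A minimal sketch (assuming the onnxruntime package is installed, and reusing the model file from this thread) would look roughly like this:

# Rough sketch: run the same random inputs through ONNX Runtime and TensorRT
# and check whether the outputs match within Polygraphy's default tolerances.
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator

onnx_file = 'pcr.onnx'
runners = [
    OnnxrtRunner(SessionFromOnnx(onnx_file)),
    TrtRunner(EngineFromNetwork(NetworkFromOnnxPath(onnx_file))),
]
results = Comparator.run(runners)
print('Outputs match' if bool(Comparator.compare_accuracy(results)) else 'Outputs differ')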

Thank you.