When parsing a network containing int8 input, the parser fails to parse any subsequent int8 operations. I’ve added an overview of the network, and the full ONNX file is also attached. The input is int8, and a Cast converts it to float32. I’d like to know why the parser considers this invalid. Note that passing int8 input and immediately casting works fine.
I’ve been digging into the TensorRT support matrix: the IElementWiseLayer does not support int8 precision, which is probably why my ONNX model fails to parse. Can someone shed some light on why operators like 2D convolution support int8, while the most basic elementwise operators don’t?
Details
The second layer sub fails to parse, complaining about int8 being an invalid weight type. Snippet of relevant logs using ./trtexec --onnx="./intTest.onnx" --verbose:
[03/10/2021-13:22:06] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:107: Parsing node: PartitionedCall/sub [Sub]
[03/10/2021-13:22:06] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:123: Searching for input: input:0
[03/10/2021-13:22:06] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:123: Searching for input: PartitionedCall/sub/y:0
[03/10/2021-13:22:06] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:129: PartitionedCall/sub [Sub] inputs: [input:0 -> (1024, 3)], [PartitionedCall/sub/y:0 -> ()],
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
[03/10/2021-13:22:06] [W] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
[03/10/2021-13:22:06] [E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8
While parsing node number 0 [Sub -> "PartitionedCall/sub:0"]:
--- Begin node ---
input: "input:0"
input: "PartitionedCall/sub/y:0"
output: "PartitionedCall/sub:0"
name: "PartitionedCall/sub"
op_type: "Sub"
--- End node ---
ERROR: /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:673 In function elementwiseHelper:
[8] Assertion failed: tensor_ptr->getDimensions().nbDims == maxNbDims && "Failed to broadcast tensors elementwise!"
[03/10/2021-13:22:06] [E] Failed to parse onnx file
Environment
TensorRT Version: 7.1.3
GPU Type:
Nvidia Driver Version: 450.102.04
CUDA Version: 11.0.3
CUDNN Version: 8.0.4
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.09-py3
Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:
1) Validate your model with the below snippet:
check_model.py
import onnx

filename = "your_model.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with the trtexec command: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, we request you to share the trtexec --verbose log for further debugging.
Thanks!
Hi, the onnx model is already added under the relevant files section.
I’ve tried out the check_model.py steps; check_model doesn’t report anything (I assume that is a good sign).
I probably didn’t make it very clear in the post, but I already found that int8 subtraction is not supported by TensorRT after looking at the support matrix. If I remove the subtraction, parsing works. I’m still left with the question of why more advanced operations such as 2D convolutions support int8, but elementwise operations like subtraction don’t.
Based on the error you’ve shared, “[E] [TRT] (Unnamed Layer* 0) [Constant]: invalid weights type of Int8”:
We do not support INT8 weights, because TRT performs the weights quantization itself.
Hi, thank you for the reply. At this moment I don’t want to apply quantization at all; I merely want to pass my data as int8 to reduce transfer times and avoid having to copy data on the CPU.
On the CPU I have uint8 data. Since TensorRT does not support unsigned types, I have to make sure that the int8 data is interpreted correctly. Subtracting 128 in int8 arithmetic (assuming two’s complement), then casting to float32 and adding 128 again gives me the same result (after division by 255) as if I were to convert my uint8 data to float32 on the CPU and pass that, with the benefit of avoiding the CPU conversion and copying four times less data from CPU to GPU.
If I move the subtraction out of the model and do it on the CPU, everything works fine. However, for my application I can’t modify the original data, so I have to make a copy before doing the subtraction, which is why I wanted to integrate it into the model.
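For example, the uint8 value 200 is reinterpreted as the int8 value -56; subtracting 128 in int8 wraps around to 72, and casting to float32 and adding 128 gives 200.0, exactly what a direct uint8-to-float32 conversion would produce.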
Sorry for the delayed response.
Currently we see that you are not preprocessing the data correctly: you have uint8 data and want to subtract 128, but the input is already int8_t, so the data has already been truncated. Your model will not work as expected.
The ideal solution would be to first subtract 128 in your own CPU code and then create the following network:
Input ------------ Add ----…
Constant 128 -/
This 128 should be a float. Then enable int8 and set only the input’s dynamic range to [-127, 127]; TensorRT will automatically convert the data, and you will not need to do any additional cast.
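As a minimal sketch with the TensorRT Python API (assuming TensorRT 7.x, a hypothetical file name "model.onnx", and a model that already contains the float Add; this illustrates the suggestion above, not your exact setup):

import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:      # hypothetical model path
    assert parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)    # enable int8 mode
inp = network.get_input(0)
inp.dtype = trt.int8                     # feed int8 data directly, no Cast in the model
inp.set_dynamic_range(-127.0, 127.0)     # required whenever an int8 input is used
engine = builder.build_engine(network, config)

Here the model itself only contains the float Add, and the int8-to-float conversion is handled by TensorRT using the dynamic range.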
Hi @spolisetty , thank you for revisiting my question.
I did not immediately see the problem with my workflow, so I wrote a step-by-step test in Python to verify that my procedure gives the correct result under two’s complement.
If you compare the results at the end, they are the same as directly converting uint8 to float32. As TensorRT does not know unsigned types and interprets my copied bytes as int8, I have to take this workaround.
import numpy as np

uint8_data = np.arange(0, 256, dtype=np.uint8)
print("uint8 range {}".format(uint8_data))
print("Directly converting uint8 to float {}".format(uint8_data.astype(np.float32)))

# Reinterpret the same bytes as int8 (this is what TensorRT sees).
int8_data = uint8_data.view(np.int8)
print("uint8 interpreted as int8 {}".format(int8_data))

# Subtract 128 in int8 arithmetic; adding -128 wraps around in two's
# complement and avoids overflowing an int8 scalar.
shifted_int8 = int8_data + np.int8(-128)
print("subtract 128 in int8 {}".format(shifted_int8))
print("Now convert to float and then add 128 {}".format(shifted_int8.astype(np.float32) + 128.0))
Now, I can do the minus-128 in int8 on the CPU (which I do, and everything works exactly as if I passed float32 data and started from there), but it requires me to copy data on the CPU side, as I can’t alter the original data. I would also find it a bit cleaner if this were hidden from client code.
Is the dynamic range really necessary? If the input is already int8 and in the range [-128, 127], there really isn’t any quantization step that needs to be performed. Is something being executed even though the input is already int8?
If we specify int8 for the input, then we must use a dynamic range, or else TensorRT won’t be able to generate the engine. Our int8 is not designed for such usage.