Description
I converted a very complex ONNX model to a TensorRT FP32 engine, and the outputs of the ONNX model and the TRT FP32 engine match, so everything is fine at FP32.
However, when I convert the same ONNX model to a TensorRT FP16 engine with the following command, the outputs are very different and some weights are reported as affected:
trtexec --onnx=encoder2.onnx --fp16 --saveEngine=encoderfp16.trt --useCudaGraph --verbose --tacticSources=-cublasLt,+cublas --workspace=10240M --minShapes=src_tokens:1x1000 --optShapes=src_tokens:1x100000 --maxShapes=src_tokens:1x700000 --preview=+fasterDynamicShapes0805 >log.en
[01/13/2023-10:07:54] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/13/2023-10:09:43] [W] [TRT] Using kFASTER_DYNAMIC_SHAPES_0805 preview feature.
[01/13/2023-10:20:45] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[01/13/2023-10:20:45] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[01/13/2023-10:20:45] [W] [TRT] Check verbose logs for the list of affected weights.
[01/13/2023-10:20:45] [W] [TRT] - 254 weights are affected by this issue: Detected subnormal FP16 values.
[01/13/2023-10:20:45] [W] [TRT] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
The detailed log is attached:
log.en (33.7 MB)
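To see which weights actually fall outside the FP16 range before building, I scan the ONNX initializers with a small script (a rough sketch; 2^-24 ≈ 5.96e-8 is the smallest positive FP16 subnormal and 2^-14 ≈ 6.1e-5 the smallest positive normal FP16 value):

```python
import numpy as np
import onnx
from onnx import numpy_helper

FP16_MIN_SUBNORMAL = 2.0 ** -24  # smallest positive FP16 subnormal
FP16_MIN_NORMAL = 2.0 ** -14     # smallest positive normal FP16 value

model = onnx.load("encoder2.onnx")
for init in model.graph.initializer:
    w = numpy_helper.to_array(init).astype(np.float64).ravel()
    nz = np.abs(w[w != 0])
    if nz.size == 0:
        continue
    n_subnormal = int(((nz < FP16_MIN_NORMAL) & (nz >= FP16_MIN_SUBNORMAL)).sum())
    n_underflow = int((nz < FP16_MIN_SUBNORMAL).sum())
    if n_subnormal or n_underflow:
        print(f"{init.name}: {n_subnormal} subnormal in FP16, "
              f"{n_underflow} below the FP16 subnormal minimum")
```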
I want to use --precisionConstraints and --layerPrecisions to force some layers to FP32. The command is as follows:
trtexec --onnx=encoder2.onnx --fp16 --saveEngine=encoderfp16.trt --useCudaGraph --verbose --tacticSources=-cublasLt,+cublas --workspace=10240M --minShapes=src_tokens:1x1000 --optShapes=src_tokens:1x100000 --maxShapes=src_tokens:1x700000 --preview=+fasterDynamicShapes0805 --precisionConstraints=obey --layerPrecisions=Conv_240.weight:fp32,Conv_263.weight:fp32,Conv_286.weight:fp32 >log.en1
But the log output is the same:
[01/13/2023-10:42:38] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/13/2023-10:44:28] [W] [TRT] Using kFASTER_DYNAMIC_SHAPES_0805 preview feature.
[01/13/2023-10:55:24] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[01/13/2023-10:55:24] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[01/13/2023-10:55:24] [W] [TRT] Check verbose logs for the list of affected weights.
[01/13/2023-10:55:24] [W] [TRT] - 254 weights are affected by this issue: Detected subnormal FP16 values.
[01/13/2023-10:55:24] [W] [TRT] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[01/13/2023-10:55:26] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in CUDA C++ Programming Guide
The detailed log is attached:
log.en1 (33.9 MB)
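If the trtexec flags cannot address this, I plan to try pinning the layers through the TensorRT Python builder API instead. A minimal sketch of what I have in mind is below; the names in FP32_LAYERS are guesses taken from the ONNX node names, and the real TensorRT layer names (which may differ after fusion) would have to be taken from the verbose build log:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

# Layers to keep in FP32 -- assumed names based on the ONNX nodes.
FP32_LAYERS = {"Conv_240", "Conv_263", "Conv_286"}

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("encoder2.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit(1)

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 10240 << 20)
config.set_flag(trt.BuilderFlag.FP16)
# Without this flag, per-layer precisions are treated as hints only.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Dynamic-shape profile matching the trtexec min/opt/max shapes.
profile = builder.create_optimization_profile()
profile.set_shape("src_tokens", (1, 1000), (1, 100000), (1, 700000))
config.add_optimization_profile(profile)

# Force the selected layers (and their outputs) to FP32.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name in FP32_LAYERS:
        layer.precision = trt.float32
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
with open("encoderfp16_mixed.trt", "wb") as f:
    f.write(engine_bytes)
```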
Environment
TensorRT Version: TensorRT-8.5.2.2.Linux.x86_64-gnu.cuda-11.8.cudnn8.6.tar.gz
GPU Type: Tesla V100-PCIE
CUDA Version: 11.6
CUDNN Version: cudnn-linux-x86_64-8.6.0.163_cuda11-archive
PyTorch Version (if applicable): 1.12.0
ONNX Version: 1.12.0
Relevant Files
encoder2.onnx and the full build logs log.en (33.7 MB) and log.en1 (33.9 MB) are attached above.
Steps To Reproduce
1. Run the first trtexec command above to build the FP16 engine; the weight-conversion warnings appear in log.en.
2. Run the second trtexec command with --precisionConstraints=obey and --layerPrecisions; the warnings in log.en1 are unchanged.
3. Compare the FP16 engine outputs against the ONNX model outputs: they differ significantly, while the FP32 engine matches (see the comparison sketch below).
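For reference, this is roughly how I compare ONNX Runtime against the TensorRT engine (a sketch assuming a single input named src_tokens and a single output; the random tokens stand in for real tokenizer output, and the input cast reflects the INT64-to-INT32 warning above):

```python
import numpy as np
import onnxruntime as ort
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

# Stand-in input; real src_tokens come from the tokenizer.
tokens = np.random.randint(0, 1000, size=(1, 100000), dtype=np.int64)

# Reference outputs from ONNX Runtime.
sess = ort.InferenceSession("encoder2.onnx",
                            providers=["CPUExecutionProvider"])
ref = sess.run(None, {"src_tokens": tokens})[0]

# TensorRT engine built by trtexec above.
logger = trt.Logger(trt.Logger.WARNING)
with open("encoderfp16.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

inp = engine.get_binding_index("src_tokens")
out = 1 - inp  # assumes exactly one input and one output
trt_tokens = tokens.astype(trt.nptype(engine.get_binding_dtype(inp)))
context.set_binding_shape(inp, trt_tokens.shape)

result = np.empty(tuple(context.get_binding_shape(out)),
                  dtype=trt.nptype(engine.get_binding_dtype(out)))
d_in = cuda.mem_alloc(trt_tokens.nbytes)
d_out = cuda.mem_alloc(result.nbytes)
cuda.memcpy_htod(d_in, trt_tokens)
bindings = [0, 0]
bindings[inp] = int(d_in)
bindings[out] = int(d_out)
context.execute_v2(bindings)
cuda.memcpy_dtoh(result, d_out)

print("max abs diff:", np.abs(result.astype(np.float64) - ref).max())
```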