Internal Error: GPU error during getBestTactic: PWN(LeakyRelu_4) : misaligned address

Description

A model cannot be built: TensorRT reports “misaligned address” errors while optimizing a LeakyRelu layer. A minimized model is attached to reproduce the issue.

Environment

TensorRT Version : 8.4.1.5
NVIDIA GPU : RTX 3080Ti
NVIDIA Driver Version : 510
CUDA Version : 11.6
CUDNN Version : 8.4.1
Operating System : Ubuntu 20.04
Python Version (if applicable) : 3.8

Relevant Files

model.onnx (57.2 KB)

Steps To Reproduce

import onnx
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda

onnx_model = onnx.load("model.onnx")

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
# The ONNX parser requires an explicit-batch network
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
config.max_workspace_size = 1 * 1 << 30  # 1 GiB

parser = trt.OnnxParser(network, trt.Logger(trt.Logger.WARNING))
assert parser.parse(onnx_model.SerializeToString())
engine = builder.build_engine(network, config)

# The build produces the following logs:
"""
[06/25/2022-23:56:07] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.3.2
[06/25/2022-23:56:10] [TRT] [W] GPU error during getBestTactic: Conv_2 : misaligned address
[06/25/2022-23:56:10] [TRT] [W] GPU error during getBestTactic: Conv_2 : misaligned address
[06/25/2022-23:56:10] [TRT] [W] GPU error during getBestTactic: PWN(LeakyRelu_4) : misaligned address
[06/25/2022-23:56:10] [TRT] [W] GPU error during getBestTactic: PWN(LeakyRelu_4) : misaligned address
[06/25/2022-23:56:10] [TRT] [W] GPU error during getBestTactic: PWN(LeakyRelu_4) : misaligned address
[06/25/2022-23:56:10] [TRT] [W] GPU error during getBestTactic: PWN(LeakyRelu_4) : misaligned address
[06/25/2022-23:56:10] [TRT] [W] GPU error during getBestTactic: PWN(LeakyRelu_4) : misaligned address
[06/25/2022-23:56:10] [TRT] [E] 1: [virtualMemoryBuffer.cpp::~StdVirtualMemoryBufferImpl::104] Error Code 1: Cuda Runtime (misaligned address)
[06/25/2022-23:56:10] [TRT] [E] 10: [optimizer.cpp::computeCosts::3628] Error Code 10: Internal Error (Could not find any implementation for node PWN(LeakyRelu_4).)
"""

stream = cuda.Stream()
"""
LogicError: cuStreamCreate failed: misaligned address
"""

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

filename = "model.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec “--verbose” log for further debugging.
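
For example (a typical invocation against the attached model):

trtexec --onnx=model.onnx --verbose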
Thanks!

The model validates successfully with onnx.checker, including the full check.

Hi,

We are unable to reproduce the error; the above script runs successfully on our side.

root@13d6c4d442e7:/my_data/files_share/218832# vim test.py
root@13d6c4d442e7:/my_data/files_share/218832# python test.py
test.py:12: DeprecationWarning: Use set_memory_pool_limit instead.
  config.max_workspace_size = 1 * 1 << 30
[06/30/2022-10:01:40] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/30/2022-10:01:40] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
test.py:16: DeprecationWarning: Use build_serialized_network instead.
  engine = builder.build_engine(network, config)
root@13d6c4d442e7:/my_data/files_share/218832#
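
As a side note, the two deprecation warnings above point at the newer builder APIs; the equivalent calls look roughly like this (keeping the 1 GiB workspace limit from the original script):

# Non-deprecated equivalents of the two warned-about calls (TensorRT 8.4+)
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
serialized_engine = builder.build_serialized_network(network, config)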

Please make sure you're using the latest TensorRT version, 8.4 GA.

Thank you.

Thanks! I believe I am already using the latest TensorRT, so it is possibly related to my GPU model or environment. I will try another environment to see whether the issue persists.

Sorry for the late update. It seems this only happens in my testbed environment (3080 Ti). We tried another GPU (A6000) and it turned out to be fine. I am not sure whether it is a hardware-specific bug or a problem with my environment setup.

We used CUDA 12.2 and TensorRT 8.6 and also got a bunch of random CUDA errors on a 4090 while the engine was being built (e.g. “Skipping tactic 0x7bff86d5f2eadc76 due to exception misaligned address”, “Skipping tactic 0x0000000000000009 due to exception an illegal memory access was encountered”). It turned out that the 4090 was overclocked, which made it unstable under load. It would be nice if NVIDIA released a tool to check the integrity of a video card under stress.