Description
I am having problems deploying ONNX models with TensorRT. The models were trained in PyTorch and then exported to ONNX format.
Deploying a small number of models at the same time works fine, but once the number of models deployed simultaneously reaches 10 or more, a memory access violation occurs during inference: "The instruction at 0x000001FA19A53000 referenced memory. The memory could not be read." What I can confirm is that this problem has never occurred on RTX 40-series graphics cards. A minimal sketch of the deployment pattern is included under Steps To Reproduce below.
Environment
TensorRT Version: 10.9.0.34
GPU Type: RTX 5070
Nvidia Driver Version: GeForce Game Ready 576.52
CUDA Version: 12.8
CUDNN Version: cuDNN for CUDA 12.x
Operating System + Version: Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
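The export itself follows the standard `torch.onnx.export` path. Below is a minimal, hypothetical sketch of that step; the model, input shape, and opset version are placeholders, not my actual network:

```python
import torch

# Placeholder model standing in for the real trained network.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
)
model.eval()

# Dummy input matching the (assumed) static input shape.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)
```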
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered
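In case it helps, here is a minimal sketch of the multi-engine deployment pattern that triggers the crash for me, written against the TensorRT 10 Python API with pycuda. The engine path, engine count, and buffer handling are simplified placeholders for illustration (my real application loads different models), and static input shapes are assumed:

```python
import numpy as np
import pycuda.autoinit  # creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

NUM_ENGINES = 10              # failure appears once ~10 engines are loaded
ENGINE_PATH = "model.engine"  # placeholder path to a prebuilt engine file

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the same engine N times to simulate deploying N models at once.
engines, contexts = [], []
for _ in range(NUM_ENGINES):
    with open(ENGINE_PATH, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    engines.append(engine)
    contexts.append(engine.create_execution_context())

stream = cuda.Stream()
device_buffers = []  # keep references so allocations are not freed early
for engine, context in zip(engines, contexts):
    # Allocate device memory for every I/O tensor and bind it (TensorRT 10 API,
    # assuming static shapes; dynamic shapes would need set_input_shape first).
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        shape = context.get_tensor_shape(name)
        dtype = np.dtype(trt.nptype(engine.get_tensor_dtype(name)))
        mem = cuda.mem_alloc(trt.volume(shape) * dtype.itemsize)
        device_buffers.append(mem)
        context.set_tensor_address(name, int(mem))
    # Enqueue inference; the access violation occurs during this inference loop.
    context.execute_async_v3(stream.handle)
stream.synchronize()
```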