Description
I am having problems deploying ONNX models with TensorRT. The models were trained in PyTorch and then exported to ONNX format.
Deploying a small number of models at the same time works fine, but once the number of models deployed simultaneously reaches 10 or more, a memory access violation occurs during inference: "The instruction at 0x000001FA19A53000 referenced memory. The memory could not be read." What I can confirm is that this problem has never occurred on RTX 40-series graphics cards. A minimal sketch of the deployment pattern is included under Steps To Reproduce below.
Environment
TensorRT Version: 10.9.0.34
GPU Type: RTX 5070
Nvidia Driver Version: GeForce Game Ready 576.52
CUDA Version: 12.8
CUDNN Version: cuDNN for CUDA 12.x
Operating System + Version: Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
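The export itself follows the standard `torch.onnx.export` path. Below is a minimal, hypothetical sketch of that step; the model, input shape, and opset version are placeholders, not my actual network:

```python
import torch

# Placeholder model standing in for the real trained network.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
)
model.eval()

# Dummy input matching the (assumed) static input shape.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)
```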
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered
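In case it helps, here is a minimal sketch of the multi-engine deployment pattern that triggers the crash for me, written against the TensorRT 10 Python API with pycuda. The engine path, engine count, and buffer handling are simplified placeholders for illustration (my real application loads different models), and static input shapes are assumed:

```python
import numpy as np
import pycuda.autoinit  # creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

NUM_ENGINES = 10              # failure appears once ~10 engines are loaded
ENGINE_PATH = "model.engine"  # placeholder path to a prebuilt engine file

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the same engine N times to simulate deploying N models at once.
engines, contexts = [], []
for _ in range(NUM_ENGINES):
    with open(ENGINE_PATH, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    engines.append(engine)
    contexts.append(engine.create_execution_context())

stream = cuda.Stream()
device_buffers = []  # keep references so allocations are not freed early
for engine, context in zip(engines, contexts):
    # Allocate device memory for every I/O tensor and bind it (TensorRT 10 API,
    # assuming static shapes; dynamic shapes would need set_input_shape first).
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        shape = context.get_tensor_shape(name)
        dtype = np.dtype(trt.nptype(engine.get_tensor_dtype(name)))
        mem = cuda.mem_alloc(trt.volume(shape) * dtype.itemsize)
        device_buffers.append(mem)
        context.set_tensor_address(name, int(mem))
    # Enqueue inference; the access violation occurs during this inference loop.
    context.execute_async_v3(stream.handle)
stream.synchronize()
```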