According to the TensorRT documentation, you can expect high host memory (RAM) usage during the build phase and lower host memory usage at runtime. This is what I'd expect, since inference should mostly use device (GPU) memory. This is also corroborated here, which implies a fixed amount of host memory usage at runtime; the variable amount is in the build stage.
However, this is not what I've experienced with the TensorRT library. Our system uses a relatively large amount (~4 GB) of host memory during runtime, and I've been able to replicate this with trtexec. In fact, with trtexec we can observe a slight increase in host memory usage after the build stage has finished.
Here's the trtexec example; any ONNX model will do:
/usr/src/tensorrt/bin/trtexec --useSpinWait --fp16 --timingCacheFile=/home/user/.cache --onnx=/home/user/model.onnx --duration=20
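For reference, this is roughly how I sampled host memory while trtexec was running. It's a minimal Linux-only sketch that reads a process's resident set size (RSS) from /proc; the function name and approach are my own for illustration, not part of trtexec or TensorRT:

```python
import os

def rss_kib(pid: int) -> int:
    """Return the current resident set size of `pid` in KiB (Linux only)."""
    # /proc/<pid>/status exposes VmRSS, the physical host memory in use.
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # reported in kB
    raise RuntimeError("VmRSS not found")

if __name__ == "__main__":
    # Sample this process as a demo; in practice I polled trtexec's pid
    # (e.g. from `pidof trtexec`) once a second before and after the build.
    print(rss_kib(os.getpid()))
```

Polling this once a second shows the RSS staying at ~4 GB (and creeping up slightly) after the build stage completes, rather than dropping to a small fixed amount.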
I've also tried deserializing the engine from a serialized .engine plan file instead, to rule out the ONNX build stage as the cause, but it still uses the same amount of host memory during runtime.
This is running on a dGPU, not Jetson, so host and device memory are separate. We are running a large/complex model with a large workspace, in case that is a factor in runtime host memory usage.
- Is it expected that large amounts of host memory can be consumed while running a model (after it has been built)?
- What factors increase host memory consumption at runtime? Are they the same as in the build stage (i.e. model complexity, workspace size, etc.)?
- Why is it necessary to use a large amount of host memory after the model has been built?
TensorRT Version: 8.0.0
GPU Type: NVIDIA GeForce RTX 3070 Laptop GPU
Operating System + Version: Ubuntu 20.04