How does TensorRT use host memory (RAM) at runtime?

Description

According to the TensorRT documentation, you can expect high host memory (RAM) usage during the build phase and lower host memory usage at runtime. This is what I'd expect, since inference should mostly use device (GPU) memory. This is also corroborated here, which implies a fixed amount of host memory usage at runtime; the variable amount is in the build stage.

However, this is not what I've experienced with the TensorRT library. Our system uses a relatively large amount (~4 GB) of host memory during runtime. I've also been able to replicate this using trtexec: we can actually observe a slight increase in host memory usage after the build stage has finished.

Here's the trtexec example (you can use any ONNX model):

/usr/src/tensorrt/bin/trtexec --useSpinWait --fp16 --timingCacheFile=/home/user/.cache --onnx=/home/user/model.onnx --duration=20
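
A minimal sketch of how the host usage can be watched over time while trtexec runs (this assumes psutil is installed; the path and flags simply mirror the command above):

# Launch trtexec and sample its resident set size (host RAM) once per second,
# so host memory in the build phase and the runtime phase can be compared.
import subprocess
import time

import psutil

cmd = [
    "/usr/src/tensorrt/bin/trtexec",
    "--useSpinWait",
    "--fp16",
    "--timingCacheFile=/home/user/.cache",
    "--onnx=/home/user/model.onnx",
    "--duration=20",
]

proc = subprocess.Popen(cmd)
ps = psutil.Process(proc.pid)

while proc.poll() is None:
    try:
        rss_mib = ps.memory_info().rss / (1024 * 1024)
        print(f"host RSS: {rss_mib:.0f} MiB")
    except psutil.NoSuchProcess:
        break  # trtexec exited between poll() and memory_info()
    time.sleep(1.0)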

I've also tried deserializing the engine from a serialized .engine plan file instead, to see whether the usage was a result of the ONNX build stage, but it still uses the same amount of host memory at runtime.
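
For reference, the plan-file test looks roughly like this minimal sketch using the TensorRT Python API (the .engine path is a placeholder, not the actual script):

import tensorrt as trt

# Load a prebuilt .engine plan file and create an execution context,
# skipping the ONNX build stage entirely.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("/home/user/model.engine", "rb") as f:
    engine_bytes = f.read()

runtime = trt.Runtime(TRT_LOGGER)
engine = runtime.deserialize_cuda_engine(engine_bytes)
context = engine.create_execution_context()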

This is running on a dGPU, not a Jetson, so host and device memory are separate. We are running a large/complex model with a large workspace, in case that is a factor in runtime host memory usage.

Questions

  • Is it expected that large amounts of host memory are consumed while running a model (after it has been built)?
  • What factors increase host memory consumption at runtime? Are they the same as in the build stage (i.e. model complexity, workspace size, etc.)?
  • Why is it necessary to use a large amount of host memory after the model has been built?

Environment

TensorRT Version: 8.0.0
GPU Type: NVIDIA GeForce RTX 3070 Laptop GPU
Operating System + Version: Ubuntu 20.04

Hi,
Please share the ONNX model and the script, if not already shared, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import sys
import onnx

# Load the ONNX model given on the command line and run the checker on it.
filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

This has been improved in TRT 8.3, which now gives more control over how memory is allocated using setMemoryPoolLimit().
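
For example, a minimal sketch using the Python binding of that API (set_memory_pool_limit); this assumes a TensorRT release that exposes trt.MemoryPoolType, and the 1 GiB limit is only illustrative:

import tensorrt as trt

# Cap the workspace memory pool instead of letting the builder use its default.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB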
