The same model consumes different amounts of GPU memory on different GPUs

pengfeidip · July 5, 2021, 6:53am

Description

Same model and same TensorRT version, on Windows 10:
350 MB of GPU memory is required with a GTX 1060,
700 MB of GPU memory is required with an RTX 2070,
1 GB of GPU memory is required with an RTX 3060.

My guess is that the GPU architecture has a large impact on TensorRT's GPU memory consumption. Is that right?

Environment

TensorRT Version: 7.2.3.4
GPU Type: GTX 1060, RTX 2070, RTX 3060
Nvidia Driver Version: 456.71
CUDA Version: 11.1
CUDNN Version: 8.1
Operating System + Version: Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

NVES · July 5, 2021, 7:07am

Hi,
Can you try running your model with the trtexec command and share the `--verbose` log if the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
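A minimal sketch of collecting that log from Python, assuming `trtexec` is on `PATH` and the model has been exported to ONNX. The file names `model.onnx` and `trtexec_verbose.txt` and the helper names are hypothetical; `--onnx`, `--workspace` (in MiB) and `--verbose` are the TensorRT 7.x flag spellings.

```python
import shutil
import subprocess
from typing import List, Optional


def trtexec_command(onnx_path: str, workspace_mib: Optional[int] = None,
                    verbose: bool = True) -> List[str]:
    """Build the trtexec argument list (TensorRT 7.x flag spellings)."""
    cmd = ["trtexec", f"--onnx={onnx_path}"]
    if workspace_mib is not None:
        # Builder workspace limit in MiB; newer TensorRT releases deprecate
        # --workspace in favor of --memPoolSize.
        cmd.append(f"--workspace={workspace_mib}")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_trtexec(cmd: List[str], log_path: str) -> int:
    """Run trtexec, write stdout and stderr to log_path, return the exit code."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical file names; adjust to your model.
    command = trtexec_command("model.onnx")
    print(" ".join(command))
    if shutil.which("trtexec"):
        run_trtexec(command, "trtexec_verbose.txt")
```

Running the same command on each GPU and attaching the resulting logs makes the per-GPU runs directly comparable.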

You can refer to the link below for the list of all supported operators. If any operator is not supported, you need to create a custom plugin to support that operation.

github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md

# Supported ONNX Operators

TensorRT 8.4 supports operators up to Opset 17. The latest information on ONNX operators can be found [here](https://github.com/onnx/onnx/blob/master/docs/Operators.md).

TensorRT supports the following ONNX data types: DOUBLE, FLOAT32, FLOAT16, INT8, and BOOL

> Note: There is limited support for INT32, INT64, and DOUBLE types. TensorRT will attempt to cast down INT64 to INT32 and DOUBLE down to FLOAT, clamping values to `+-INT_MAX` or `+-FLT_MAX` if necessary.
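The clamping described in the note can be illustrated with a small pure-Python sketch of the INT64-to-INT32 case. It only mimics the documented behavior (out-of-range values saturate at `+-INT_MAX`); it is not TensorRT's implementation, and the function name is hypothetical.

```python
from typing import Iterable, List

# INT_MAX for a 32-bit signed integer. The note clamps to +-INT_MAX,
# so the lower bound here is -INT_MAX rather than INT32_MIN (-2**31).
INT_MAX = 2**31 - 1


def cast_int64_to_int32(values: Iterable[int]) -> List[int]:
    """Saturate each value into [-INT_MAX, INT_MAX], as the note describes."""
    return [max(-INT_MAX, min(INT_MAX, v)) for v in values]


print(cast_int64_to_int32([42, 2**40, -(2**40)]))  # prints [42, 2147483647, -2147483647]
```

Values that already fit pass through unchanged, so the cast is lossless only when every value is within range.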

See below for the support matrix of ONNX operators in ONNX-TensorRT.

## Operator Support Matrix

| Operator | Supported | Supported Types   | Restrictions |
|----------|-----------|-------------------|--------------|
| Abs      | Y         | FP32, FP16, INT32 |              |
| Acos     | Y         | FP32, FP16        |              |
| Acosh    | Y         | FP32, FP16        |              |
| Add      | Y         | FP32, FP16, INT32 |              |

This file has been truncated.
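Checking a model against that matrix can be scripted. The sketch below assumes the `onnx` Python package is installed for loading a real model, and uses a hypothetical `SUPPORTED_OPS` set holding only the four rows visible in the truncated excerpt; fill it in from the full document before relying on the result.

```python
from typing import Iterable, Set

# Hypothetical: only the rows shown in the truncated excerpt above.
SUPPORTED_OPS = {"Abs", "Acos", "Acosh", "Add"}


def find_unsupported_ops(op_types: Iterable[str], supported: Set[str]) -> Set[str]:
    """Return the op types that are not in the supported set."""
    return {op for op in op_types if op not in supported}


def onnx_op_types(model_path: str) -> Set[str]:
    """Collect the op types used in the main graph of an ONNX model.

    Subgraphs (for example If/Loop bodies) are not traversed.
    """
    import onnx  # imported lazily so find_unsupported_ops works without onnx

    model = onnx.load(model_path)
    return {node.op_type for node in model.graph.node}


if __name__ == "__main__":
    # Example with hard-coded op types instead of a real model file.
    print(sorted(find_unsupported_ops(["Add", "Abs", "Conv"], SUPPORTED_OPS)))  # prints ['Conv']
```

Any op reported here would need a custom plugin, as described above.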

Also, please share your model and script, if you haven't already, so that we can help you better.

Thanks!

pengfeidip · July 5, 2021, 8:17am

1060_trtexec.txt (13.1 KB)
2070_trtexec.txt (16.3 KB)

Same model, same software, run with trtexec:
414 MB of GPU memory is required with the GTX 1060,
788 MB of GPU memory is required with the RTX 2070.

pengfeidip · July 5, 2021, 8:18am

@NVES Hi, I've provided the info above.

spolisetty · July 6, 2021, 11:54am

@pengfeidip,

It is expected that TensorRT GPU memory utilization varies across GPU architectures.
Different GPU architectures have different CUDA compute capabilities. Newer architectures also support new units such as Tensor Cores, which allows us to develop kernels that use more memory to speed up your network.
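For the three GPUs in this thread the difference is concrete: the GTX 1060 is Pascal (compute capability 6.1, no Tensor Cores), while the RTX 2070 (Turing, 7.5) and RTX 3060 (Ampere, 8.6) have Tensor Cores. The sketch below lines these up against the memory figures from the first post. The optional query helper is hypothetical and assumes PyCUDA is installed; it returns `None` otherwise.

```python
from typing import NamedTuple, Optional, Tuple


class GpuInfo(NamedTuple):
    name: str
    architecture: str
    compute_capability: Tuple[int, int]
    tensor_cores: bool
    reported_memory: str  # figure from the first post in this thread


GPUS = [
    GpuInfo("GTX 1060", "Pascal", (6, 1), False, "350 MB"),
    GpuInfo("RTX 2070", "Turing", (7, 5), True, "700 MB"),
    GpuInfo("RTX 3060", "Ampere", (8, 6), True, "1 GB"),
]


def query_compute_capability(device_index: int = 0) -> Optional[Tuple[int, int]]:
    """Return (major, minor) for a local GPU via PyCUDA, or None if unavailable."""
    try:
        import pycuda.driver as cuda
    except ImportError:
        return None
    try:
        cuda.init()
        return cuda.Device(device_index).compute_capability()
    except cuda.Error:  # no driver, or no device with that index
        return None


for gpu in GPUS:
    major, minor = gpu.compute_capability
    cores = "yes" if gpu.tensor_cores else "no"
    print(f"{gpu.name:9} {gpu.architecture:7} sm_{major}{minor}  "
          f"tensor cores: {cores:3}  memory: {gpu.reported_memory}")
```

Only the two cards with Tensor Cores can use Tensor Core kernels, which is consistent with the explanation above for why their footprints differ from the GTX 1060's.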

Thank you.

pengfeidip · July 7, 2021, 1:12am

@spolisetty
If I want lower memory consumption and slower speed is acceptable, can I control this?

spolisetty · July 8, 2021, 4:52am

@pengfeidip,

We can restrict or extend memory consumption using the trtexec `--workspace` flag.
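The same limit can be set when building through the Python API. A minimal sketch, assuming `config` is a `tensorrt.IBuilderConfig`: TensorRT 7.x exposes the `max_workspace_size` attribute, while newer releases provide `set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, ...)` and deprecate `max_workspace_size` (as of TensorRT 8.4). The helper names `mib` and `limit_workspace` are hypothetical. The limit takes effect when the engine is built, so the engine has to be rebuilt after changing it, and a smaller workspace can rule out faster tactics.

```python
from types import SimpleNamespace  # only used for the stand-in demo below


def mib(n: int) -> int:
    """Convert mebibytes to bytes."""
    return n << 20


def limit_workspace(config, workspace_bytes: int) -> None:
    """Cap the builder workspace on a tensorrt.IBuilderConfig-like object."""
    if hasattr(config, "set_memory_pool_limit"):
        import tensorrt as trt  # newer API (max_workspace_size deprecated as of 8.4)

        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_bytes)
    else:
        # TensorRT 7.x API.
        config.max_workspace_size = workspace_bytes


# Demo with a stand-in object that only has the 7.x attribute.
demo = SimpleNamespace(max_workspace_size=0)
limit_workspace(demo, mib(256))
print(demo.max_workspace_size)  # prints 268435456
```

With a real builder this would be called on the object returned by `builder.create_builder_config()`, before `builder.build_serialized_network(...)` (or `builder.build_engine(...)` in 7.x).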

Thank you.

prince.patel.14 · August 5, 2022, 2:50pm

Hi @spolisetty, during inference, can we limit the workspace size? On different GPU architectures, the memory used when loading an engine built from the same model is different.

spolisetty · August 8, 2022, 4:32am

Hi,

Hopefully the following samples help.

github.com/NVIDIA/TensorRT/blob/b55c4710ce01f076c26710a48879fcb2661be4a9/samples/python/end_to_end_tensorflow_mnist/sample.py#L53

```python
# Excerpt: relies on trt, common, TRT_LOGGER and ModelData (the class that
# OUTPUT_NAME below belongs to) being defined.
    OUTPUT_NAME = "dense_1/Softmax"


def build_engine(model_file):
    # For more information on TRT basics, refer to the introductory samples.
    with trt.Builder(
        TRT_LOGGER
    ) as builder, builder.create_network() as network, builder.create_builder_config() as config, trt.UffParser() as parser, trt.Runtime(
        TRT_LOGGER
    ) as runtime:
        # Maximum GPU scratch memory the engine may use at execution time.
        config.max_workspace_size = common.GiB(1)
        # Parse the Uff Network
        parser.register_input(ModelData.INPUT_NAME, ModelData.INPUT_SHAPE)
        parser.register_output(ModelData.OUTPUT_NAME)
        parser.parse(model_file, network)
        # Build and return an engine.
        plan = builder.build_serialized_network(network, config)
        return runtime.deserialize_cuda_engine(plan)


# Loads a test case into the provided pagelocked_buffer.
```

github.com/NVIDIA/TensorRT/blob/b55c4710ce01f076c26710a48879fcb2661be4a9/samples/python/common.py#L35

```python
# Excerpt: relies on argparse and tensorrt (as trt) being imported.
try:
    # Sometimes python does not understand FileNotFoundError
    FileNotFoundError
except NameError:
    FileNotFoundError = IOError

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def GiB(val):
    return val * 1 << 30


def add_help(description):
    parser = argparse.ArgumentParser(description=description, formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    args, _ = parser.parse_known_args()


def find_sample_data(description="Runs a TensorRT Python sample", subfolder="", find_files=[], err_msg=""):
    """
```
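A note on `GiB` in the excerpt above: `val * 1 << 30` parses as `(val * 1) << 30`, so it is correct for integers but raises `TypeError` for fractional sizes such as `GiB(0.5)`. If you want a sub-GiB workspace, a hypothetical float-tolerant variant (`gib`, not part of the samples) could look like this:

```python
def gib_original(val):
    # Same expression as common.GiB: evaluated as (val * 1) << 30.
    return val * 1 << 30


def gib(val: float) -> int:
    """Hypothetical variant: accepts fractional sizes and returns whole bytes."""
    return int(val * (1 << 30))


print(gib(0.5))  # prints 536870912
try:
    gib_original(0.5)
except TypeError as exc:
    # Shifting a float is not allowed in Python.
    print("GiB(0.5) fails:", exc)
```

For example, `config.max_workspace_size = gib(0.5)` would request a 512 MiB workspace.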

Thank you.
