How to convert a large PyTorch model to a TRT model?

Description

Scenario: I currently have a PyTorch model that is quite large (over 2GB). The traditional method is to export the PyTorch model to ONNX and then convert the ONNX model to a TensorRT model.

However, there is a known issue with the 2GB size limitation of ONNX models. Check here

So the only way to save an ONNX model over 2GB is to store its weights as external data, but I have no idea how to convert an ONNX model with external data to a TRT model.

I really want to find out whether there is any solution for converting a large PyTorch model to a TRT model.

In short, what is the recommended way to convert a PyTorch model that is over 2GB to a TensorRT model?

Thank you!!

Best regards,
Chieh

Environment

TensorRT Version: v7.1
GPU Type: Titan V
Nvidia Driver Version: 450.66
Operating System + Version: Linux
Python Version (if applicable): 3.6
Baremetal or Container (if container which image + tag): TensorRT images

Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

filename = "yourONNXmodel"  # path to your .onnx file
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

In case you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

Dear @NVES,

Thanks for your suggestions! I have run more experiments. Let me describe them in detail!

First Experiment: Model size is almost 1.1GB.

The goal here was to test loading an ONNX model without its external data.

1. ONNX Python code:

The structure of my external-data folder is shown below:

I can successfully load the model without external data.

import onnx
from onnx.external_data_helper import load_external_data_for_model

filename = 'model.onnx'
model = onnx.load(filename, load_external_data=False)
load_external_data_for_model(model, 'external_data_folder/')
onnx.checker.check_model(model)

Then I used this model to do inference. Note that the whole model folder was only 1.1GB, i.e., the model did not exceed the 2GB limit.

2. Use the C++ API to generate the TRT engine [WORKS]:

I can also use the C++ API to generate the TRT model from the ONNX model, without even providing the external-data folder explicitly: during the ONNX-to-TensorRT conversion, it directly loads the external-data files from the same directory as model.onnx.

3. Use TRTEXEC to generate the TRT engine [FAILS]:

On the other hand, I failed to generate the TRT engine with trtexec.

> trtexec --onnx=model.onnx --saveEngine=model_trtexec.trt --explicitBatch --fp16 --workspace=1024 --verbose
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=model.onnx --saveEngine=model_trtexec.trt --explicitBatch --fp16 --workspace=1024 --verbose
[07/06/2021-15:54:07] [I] === Model Options ===
[07/06/2021-15:54:07] [I] Format: ONNX
[07/06/2021-15:54:07] [I] Model: model_final.onnx
[07/06/2021-15:54:07] [I] Output:
[07/06/2021-15:54:07] [I] === Build Options ===
[07/06/2021-15:54:07] [I] Max batch: explicit
[07/06/2021-15:54:07] [I] Workspace: 1024 MB
[07/06/2021-15:54:07] [I] minTiming: 1
[07/06/2021-15:54:07] [I] avgTiming: 8
[07/06/2021-15:54:07] [I] Precision: FP32+FP16
[07/06/2021-15:54:07] [I] Calibration: 
[07/06/2021-15:54:07] [I] Safe mode: Disabled
[07/06/2021-15:54:07] [I] Save engine: model_trtexec.trt
[07/06/2021-15:54:07] [I] Load engine: 
[07/06/2021-15:54:07] [I] Builder Cache: Enabled
[07/06/2021-15:54:07] [I] NVTX verbosity: 0
[07/06/2021-15:54:07] [I] Inputs format: fp32:CHW
[07/06/2021-15:54:07] [I] Outputs format: fp32:CHW
[07/06/2021-15:54:07] [I] Input build shapes: model
[07/06/2021-15:54:07] [I] Input calibration shapes: model
[07/06/2021-15:54:07] [I] === System Options ===
[07/06/2021-15:54:07] [I] Device: 0
[07/06/2021-15:54:07] [I] DLACore: 
[07/06/2021-15:54:07] [I] Plugins:
[07/06/2021-15:54:07] [I] === Inference Options ===
[07/06/2021-15:54:07] [I] Batch: Explicit
[07/06/2021-15:54:07] [I] Input inference shapes: model
[07/06/2021-15:54:07] [I] Iterations: 10
[07/06/2021-15:54:07] [I] Duration: 3s (+ 200ms warm up)
[07/06/2021-15:54:07] [I] Sleep time: 0ms
[07/06/2021-15:54:07] [I] Streams: 1
[07/06/2021-15:54:07] [I] ExposeDMA: Disabled
[07/06/2021-15:54:07] [I] Spin-wait: Disabled
[07/06/2021-15:54:07] [I] Multithreading: Disabled
[07/06/2021-15:54:07] [I] CUDA Graph: Disabled
[07/06/2021-15:54:07] [I] Skip inference: Disabled
[07/06/2021-15:54:07] [I] Inputs:
[07/06/2021-15:54:07] [I] === Reporting Options ===
[07/06/2021-15:54:07] [I] Verbose: Enabled
[07/06/2021-15:54:07] [I] Averages: 10 inferences
[07/06/2021-15:54:07] [I] Percentile: 99
[07/06/2021-15:54:07] [I] Dump output: Disabled
[07/06/2021-15:54:07] [I] Profile: Disabled
[07/06/2021-15:54:07] [I] Export timing to JSON file: 
[07/06/2021-15:54:07] [I] Export output to JSON file: 
[07/06/2021-15:54:07] [I] Export profile to JSON file: 
[07/06/2021-15:54:07] [I] 
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::BatchTilePlugin_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::CoordConvAC version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::GenerateDetection_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::Proposal version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[07/06/2021-15:54:07] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
----------------------------------------------------------------
Input filename:   model.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.7
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::BatchTilePlugin_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::BatchedNMS_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::BatchedNMSDynamic_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::CoordConvAC version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::CropAndResize version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::DetectionLayer_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::FlattenConcat_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::GenerateDetection_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::GridAnchor_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::GridAnchorRect_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::LReLU_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::MultilevelCropAndResize_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::MultilevelProposeROI_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::NMS_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::Normalize_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::PriorBox_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::ProposalLayer_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::Proposal version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::PyramidROIAlign_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::Region_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::Reorg_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::ResizeNearest_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::RPROI_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] Plugin creator already registered - ::SpecialSlice_TRT version 1
[07/06/2021-15:54:19] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:206: Adding network input: x with dtype: float32, dimensions: (1, 3, 224, 224)
[07/06/2021-15:54:19] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ImporterContext.hpp:120: Registering tensor: x for ONNX tensor: x
[07/06/2021-15:54:19] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:90: Importing initializer: 545
[07/06/2021-15:54:19] [V] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:1319: Reading weights from external file: 545
corrupted size vs. prev_size
[1]    14712 abort (core dumped)  trtexec --onnx=model.onnx --saveEngine=model_trtexec.trt --explicitBatc

So that was my first experiment.

However, it was not a complete test of the limitation, because the model size was only 1.1GB. (Converted to a TRT engine, it takes only 500MB.)

My purpose is to convert an ONNX model that is over 2GB to a TRT engine.

Second Experiment: The model size is over 10GB.

Hence, I tried a second experiment, this time with a model over 10GB.

However, generating the TRT engine failed.
The error message was again caused by the 2GB limitation.

&&&& RUNNING TensorRT.sample_onnx # ./sample_onnx
DataDirs Area: add the data directory.
[07/06/2021-15:44:36] [I] Building and running a GPU inference engine
Model path: /workspace/model_external_10GB.trt
[07/06/2021-15:44:36] [I] 
[07/06/2021-15:44:36] [I] >> Start to generate the model.
----------------------------------------------------------------
Input filename:   /workspace/model/model.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.7
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[07/06/2021-15:45:09] [W] [TRT] [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/06/2021-15:45:10] [E] [TRT] Tensor: (Unnamed Layer* 125) [Constant]_output at max batch size of 1 exceeds the maximum element count of 2147483647
[07/06/2021-15:45:10] [E] [TRT] Network validation failed.
&&&& FAILED TensorRT.sample_onnx # ./sample_onnx

I am sorry that it is too hard to provide the model, because it is simply too large.
You can generate a dummy model over 2GB to reproduce the issue, since it seems to be a fundamental limitation of TensorRT.

I can provide more information if any part is unclear.
Thank you in advance.

Best regards,
Chieh

@Chieh,

This is expected behavior. TRT doesn’t support any tensor with more than 2147483647 elements.
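For what it's worth, that limit is INT32_MAX, so you can check ahead of time whether any tensor shape in your network would trip it. A plain-Python sketch (the shapes below are just examples):

```python
# TensorRT (as of v7) limits any single tensor to INT32_MAX elements.
TRT_MAX_ELEMENT_COUNT = 2**31 - 1  # 2147483647, matching the error message


def element_count(shape):
    """Multiply out a shape tuple to get the total number of elements."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def exceeds_trt_limit(shape):
    """True if a tensor of this shape exceeds TensorRT's element limit."""
    return element_count(shape) > TRT_MAX_ELEMENT_COUNT


print(exceeds_trt_limit((1, 3, 224, 224)))   # the network input above -> False
print(exceeds_trt_limit((1, 50000, 50000)))  # 2.5e9 elements -> True
```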

Thank you.

Hi @spolisetty,

Thank you for your reply!

So do you mean that TensorRT cannot handle models that exceed the 2GB limitation?

Is there any chance to resolve this problem?

Thank you!!

@Chieh,

Yes, I am afraid we do not have an alternative solution.

Thank you.

Hi @spolisetty,

I see!
It is so sad to get this news.
Thanks for the important information.

BR,
Chieh