LLVM ERROR: out of memory (ONNX to TensorRT engine)

dkdlatjsh · May 16, 2024, 6:14am

Description

I simplified the solider-reid model, exported it to ONNX, and tried to convert it to a TensorRT engine.
But I got an error message:
LLVM ERROR: out of memory
I don’t know what’s causing it.

Environment

TensorRT Version: 10.0.0.6
GPU Type: NVIDIA RTX A6000
Nvidia Driver Version: 551.78
CUDA Version: 12.4
CUDNN Version: 8.9.7
Operating System + Version: Win 11

Relevant Files

ONNX file: https://drive.google.com/file/d/1zGlyYzrtj85fdBNhndFoFambuRcqhR67/view?usp=drive_link

Steps To Reproduce

trtexec --onnx=solider-reid-swin_t_sim.onnx --saveEngine=solider-reid-swin_t_sim.engine --noTF32

The full output of the run is shown below:

J:\APPL\CUDA-12.4\TensorRT-10.0.0.6\bin>trtexec --onnx=solider-reid-swin_t_sim.onnx --saveEngine=solider-reid-swin_t_sim.engine --noTF32
&&&& RUNNING TensorRT.trtexec [TensorRT v100000] # trtexec --onnx=solider-reid-swin_t_sim.onnx --saveEngine=solider-reid-swin_t_sim.engine --noTF32
[05/16/2024-09:16:04] [I] === Model Options ===
[05/16/2024-09:16:04] [I] Format: ONNX
[05/16/2024-09:16:04] [I] Model: solider-reid-swin_t_sim.onnx
[05/16/2024-09:16:04] [I] Output:
[05/16/2024-09:16:04] [I] === Build Options ===
[05/16/2024-09:16:04] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[05/16/2024-09:16:04] [I] avgTiming: 8
[05/16/2024-09:16:04] [I] Precision: FP32
[05/16/2024-09:16:04] [I] LayerPrecisions:
[05/16/2024-09:16:04] [I] Layer Device Types:
[05/16/2024-09:16:04] [I] Calibration:
[05/16/2024-09:16:04] [I] Refit: Disabled
[05/16/2024-09:16:04] [I] Strip weights: Disabled
[05/16/2024-09:16:04] [I] Version Compatible: Disabled
[05/16/2024-09:16:04] [I] ONNX Plugin InstanceNorm: Disabled
[05/16/2024-09:16:04] [I] TensorRT runtime: full
[05/16/2024-09:16:04] [I] Lean DLL Path:
[05/16/2024-09:16:04] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[05/16/2024-09:16:04] [I] Exclude Lean Runtime: Disabled
[05/16/2024-09:16:04] [I] Sparsity: Disabled
[05/16/2024-09:16:04] [I] Safe mode: Disabled
[05/16/2024-09:16:04] [I] Build DLA standalone loadable: Disabled
[05/16/2024-09:16:04] [I] Allow GPU fallback for DLA: Disabled
[05/16/2024-09:16:04] [I] DirectIO mode: Disabled
[05/16/2024-09:16:04] [I] Restricted mode: Disabled
[05/16/2024-09:16:04] [I] Skip inference: Disabled
[05/16/2024-09:16:04] [I] Save engine: solider-reid-swin_t_sim.engine
[05/16/2024-09:16:04] [I] Load engine:
[05/16/2024-09:16:04] [I] Profiling verbosity: 0
[05/16/2024-09:16:04] [I] Tactic sources: Using default tactic sources
[05/16/2024-09:16:04] [I] timingCacheMode: local
[05/16/2024-09:16:04] [I] timingCacheFile:
[05/16/2024-09:16:04] [I] Enable Compilation Cache: Enabled
[05/16/2024-09:16:04] [I] errorOnTimingCacheMiss: Disabled
[05/16/2024-09:16:04] [I] Preview Features: Use default preview flags.
[05/16/2024-09:16:04] [I] MaxAuxStreams: -1
[05/16/2024-09:16:04] [I] BuilderOptimizationLevel: -1
[05/16/2024-09:16:04] [I] Calibration Profile Index: 0
[05/16/2024-09:16:04] [I] Weight Streaming: Disabled
[05/16/2024-09:16:04] [I] Debug Tensors:
[05/16/2024-09:16:04] [I] Input(s)s format: fp32:CHW
[05/16/2024-09:16:04] [I] Output(s)s format: fp32:CHW
[05/16/2024-09:16:04] [I] Input build shapes: model
[05/16/2024-09:16:04] [I] Input calibration shapes: model
[05/16/2024-09:16:04] [I] === System Options ===
[05/16/2024-09:16:04] [I] Device: 0
[05/16/2024-09:16:04] [I] DLACore:
[05/16/2024-09:16:04] [I] Plugins:
[05/16/2024-09:16:04] [I] setPluginsToSerialize:
[05/16/2024-09:16:04] [I] dynamicPlugins:
[05/16/2024-09:16:04] [I] ignoreParsedPluginLibs: 0
[05/16/2024-09:16:04] [I]
[05/16/2024-09:16:04] [I] === Inference Options ===
[05/16/2024-09:16:04] [I] Batch: Explicit
[05/16/2024-09:16:04] [I] Input inference shapes: model
[05/16/2024-09:16:04] [I] Iterations: 10
[05/16/2024-09:16:04] [I] Duration: 3s (+ 200ms warm up)
[05/16/2024-09:16:04] [I] Sleep time: 0ms
[05/16/2024-09:16:04] [I] Idle time: 0ms
[05/16/2024-09:16:04] [I] Inference Streams: 1
[05/16/2024-09:16:04] [I] ExposeDMA: Disabled
[05/16/2024-09:16:04] [I] Data transfers: Enabled
[05/16/2024-09:16:04] [I] Spin-wait: Disabled
[05/16/2024-09:16:04] [I] Multithreading: Disabled
[05/16/2024-09:16:04] [I] CUDA Graph: Disabled
[05/16/2024-09:16:04] [I] Separate profiling: Disabled
[05/16/2024-09:16:04] [I] Time Deserialize: Disabled
[05/16/2024-09:16:04] [I] Time Refit: Disabled
[05/16/2024-09:16:04] [I] NVTX verbosity: 0
[05/16/2024-09:16:04] [I] Persistent Cache Ratio: 0
[05/16/2024-09:16:04] [I] Optimization Profile Index: 0
[05/16/2024-09:16:04] [I] Weight Streaming Budget: -1 bytes
[05/16/2024-09:16:04] [I] Inputs:
[05/16/2024-09:16:04] [I] Debug Tensor Save Destinations:
[05/16/2024-09:16:04] [I] === Reporting Options ===
[05/16/2024-09:16:04] [I] Verbose: Disabled
[05/16/2024-09:16:04] [I] Averages: 10 inferences
[05/16/2024-09:16:04] [I] Percentiles: 90,95,99
[05/16/2024-09:16:04] [I] Dump refittable layers:Disabled
[05/16/2024-09:16:04] [I] Dump output: Disabled
[05/16/2024-09:16:04] [I] Profile: Disabled
[05/16/2024-09:16:04] [I] Export timing to JSON file:
[05/16/2024-09:16:04] [I] Export output to JSON file:
[05/16/2024-09:16:04] [I] Export profile to JSON file:
[05/16/2024-09:16:04] [I]
[05/16/2024-09:16:04] [I] === Device Information ===
[05/16/2024-09:16:04] [I] Available Devices:
[05/16/2024-09:16:04] [I] Device 0: "NVIDIA RTX A6000" UUID: GPU-c959833a-69aa-83ed-1534-2814aebba435
[05/16/2024-09:16:04] [I] Selected Device: NVIDIA RTX A6000
[05/16/2024-09:16:04] [I] Selected Device ID: 0
[05/16/2024-09:16:04] [I] Selected Device UUID: GPU-c959833a-69aa-83ed-1534-2814aebba435
[05/16/2024-09:16:04] [I] Compute Capability: 8.6
[05/16/2024-09:16:04] [I] SMs: 84
[05/16/2024-09:16:04] [I] Device Global Memory: 49139 MiB
[05/16/2024-09:16:04] [I] Shared Memory per SM: 100 KiB
[05/16/2024-09:16:04] [I] Memory Bus Width: 384 bits (ECC disabled)
[05/16/2024-09:16:04] [I] Application Compute Clock Rate: 1.8 GHz
[05/16/2024-09:16:04] [I] Application Memory Clock Rate: 8.001 GHz
[05/16/2024-09:16:04] [I]
[05/16/2024-09:16:04] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[05/16/2024-09:16:04] [I]
[05/16/2024-09:16:04] [I] TensorRT version: 10.0.0
[05/16/2024-09:16:04] [I] Loading standard plugins
[05/16/2024-09:16:05] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 11835, GPU 1594 (MiB)
[05/16/2024-09:16:19] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2643, GPU +310, now: CPU 14760, GPU 1904 (MiB)
[05/16/2024-09:16:19] [I] Start parsing network model.
[05/16/2024-09:16:20] [I] [TRT] ----------------------------------------------------------------
[05/16/2024-09:16:20] [I] [TRT] Input filename: solider-reid-swin_t_sim.onnx
[05/16/2024-09:16:20] [I] [TRT] ONNX IR version: 0.0.8
[05/16/2024-09:16:20] [I] [TRT] Opset version: 17
[05/16/2024-09:16:20] [I] [TRT] Producer name: pytorch
[05/16/2024-09:16:20] [I] [TRT] Producer version: 2.3.0
[05/16/2024-09:16:20] [I] [TRT] Domain:
[05/16/2024-09:16:20] [I] [TRT] Model version: 0
[05/16/2024-09:16:20] [I] [TRT] Doc string:
[05/16/2024-09:16:20] [I] [TRT] ----------------------------------------------------------------
[05/16/2024-09:16:20] [I] Finished parsing network model. Parse time: 1.07807
[05/16/2024-09:16:20] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
LLVM ERROR: out of memory

AakankshaS · May 20, 2024, 7:33am

Hi @dkdlatjsh ,
Can you check whether htop or nvidia-smi shows high memory usage?

nikita17 · May 20, 2024, 7:42am

Hi @AakankshaS ,
I have a question. Can I access riva-asr on a GeForce RTX 3080? I have been stuck on this for the last two days.

dkdlatjsh · May 20, 2024, 8:30am

No. nvidia-smi shows 573 MiB / 49140 MiB (Memory-Usage).

dkdlatjsh · May 21, 2024, 2:35am

However, host RAM usage peaked at 53 GB out of 64 GB, so I tried other PCs with 128 GB of RAM, but the same error occurred.
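
For anyone comparing host and GPU memory across runs, here is a minimal Python sketch that pulls the `[MemUsageChange]` figures out of a trtexec log. It only reads the two lines that appear in the log above. The function name `parse_mem_usage` and the sample string are made up for illustration, and it assumes the "CPU" number is host memory as TensorRT reports it. It does not diagnose the LLVM error itself.

```python
import re

# Matches a TensorRT "[MemUsageChange]" log line and captures the stage name
# plus the "now:" totals. The pattern is based only on the two lines in this
# thread's log; other TensorRT versions may format these lines differently.
MEM_RE = re.compile(
    r"\[MemUsageChange\]\s*(?P<stage>.*?):\s*"
    r"CPU [+-]\d+, GPU [+-]\d+, "
    r"now: CPU (?P<cpu>\d+), GPU (?P<gpu>\d+) \(MiB\)"
)


def parse_mem_usage(log_text):
    """Return one dict per [MemUsageChange] line found in log_text."""
    rows = []
    for line in log_text.splitlines():
        match = MEM_RE.search(line)
        if match:
            rows.append({
                "stage": match.group("stage"),
                "cpu_mib": int(match.group("cpu")),
                "gpu_mib": int(match.group("gpu")),
            })
    return rows


# Two lines copied from the log posted above.
SAMPLE_LOG = """\
[05/16/2024-09:16:05] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 11835, GPU 1594 (MiB)
[05/16/2024-09:16:19] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2643, GPU +310, now: CPU 14760, GPU 1904 (MiB)
"""

if __name__ == "__main__":
    for row in parse_mem_usage(SAMPLE_LOG):
        print(f"{row['stage']}: host {row['cpu_mib']} MiB, device {row['gpu_mib']} MiB")
```

To use it on a real run, redirect the trtexec output to a file and pass the file contents to `parse_mem_usage`. In the log above the last reported host figure is far larger than the device figure, which fits with nvidia-smi showing almost no GPU memory in use.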

AakankshaS · May 30, 2024, 8:09pm

Hi @dkdlatjsh ,
Checking on this with the Engineering team.

AakankshaS · May 30, 2024, 8:10pm

Hi @nikita17 , you can raise this on the Riva platform to get better assistance.
However, please check the support matrix:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/support-matrix.html
