Error converting ONNX to TRT

Description

Building a TensorRT engine from an ONNX model fails on a Jetson Xavier NX 16GB (JetPack 4.6.1, TensorRT 8.2.1.8) with an internal error: "Unknown embedded device detected".

Environment

TensorRT Version: 8.2.1.8
GPU Type: Jetson Xavier NX 16GB
Nvidia Driver Version: JetPack 4.6.1
CUDA Version: 10.2
CUDNN Version: not sure; whichever version JetPack 4.6.1 installs
Operating System + Version: Ubuntu 18.04 LTS
Python Version (if applicable): 3.6.9

I use a Python script to convert the ONNX model to a TRT engine, and I get this error:

trt version 8.2.1.8
[03/18/2022-16:54:16] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/18/2022-16:54:16] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[03/18/2022-16:54:18] [TRT] [E] 2: [utils.cpp::checkMemLimit::380] Error Code 2: Internal Error (Assertion upperBound != 0 failed. Unknown embedded device detected. Please update the table with the entry: {{1794, 6, 16}, 12653},)
Traceback (most recent call last):
  File "tools/export_trt.py", line 77, in <module>
    f.write(engine.serialize())
AttributeError: 'NoneType' object has no attribute 'serialize'
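
Note: the AttributeError itself is only a follow-on failure. When the build fails with the checkMemLimit error above, build_cuda_engine() / build_engine() returns None, and the script then calls serialize() on None. A minimal guard, sketched here with my own wording for the error message, surfaces the real problem instead of the traceback:

engine = builder.build_engine(network, config)
if engine is None:
    # the real cause is in the TRT log above (the Error Code 2 assertion)
    raise RuntimeError('TensorRT engine build failed; see the TRT log')
with open(output, 'wb') as f:
    f.write(engine.serialize())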

The script:

import tensorrt as trt
import sys
import argparse

"""
takes in onnx model
converts to tensorrt
"""

if __name__ == '__main__':

    desc = 'compile Onnx model to TensorRT'
    parser = argparse.ArgumentParser(description=desc)
    parser.add_argument('--model', help='onnx file')
    parser.add_argument('--out', type=str, default='', help='name of trt output file')
    parser.add_argument('--fp', type=int, default=16, help='floating point precision. 16 or 32')
    parser.add_argument('--batch', type=int, default=1)
    parser.add_argument('--verbose', action='store_true')
    parser.add_argument('--jetson_nano', action='store_true')
    opt = parser.parse_args()
    
    batch_size = opt.batch
    model = opt.model
    fp = opt.fp
    output = opt.out if opt.out else opt.model.replace('.onnx', '.trt')
    assert fp in (16, 32)

    if opt.jetson_nano:
        workspace = 1 << 28  # 256 MiB: the Nano has little memory to spare
    else:
        workspace = 4 * (1 << 30)  # 4 GiB
    
    logger = trt.Logger(trt.Logger.WARNING)
    if opt.verbose:
        logger.min_severity = trt.Logger.VERBOSE
    
    print('trt version', trt.__version__)
    assert int(trt.__version__.split('.')[0]) >= 7
    EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    with trt.Builder(logger) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, logger) as parser:
        if trt.__version__.split('.')[0] == '7':
            builder.max_workspace_size = workspace
            builder.max_batch_size = batch_size
            if fp == 16:
                builder.fp16_mode = True

            with open(model, 'rb') as f:
                if not parser.parse(f.read()):
                    for error in range(parser.num_errors):
                        print('ERROR', parser.get_error(error))
                    sys.exit(1)  # do not build from a partially parsed network
            
            # if your onnx has a dynamic input...
            # network.get_input(0).shape = [1, 3, 352, 608]
            
            engine = builder.build_cuda_engine(network)
            with open(output, 'wb') as f:
                f.write(engine.serialize())
            print('Done')
        else:
            # https://github.com/NVIDIA-AI-IOT/torch2trt/issues/557
            # https://github.com/NVIDIA-AI-IOT/torch2trt/commit/8f742904d603fcde4fe521baa31bdc18002c23cb#diff-f682ce583d8646e112002fb3631f7d205b63aae7b0ca673b020fed7244d4ed38
            
            config = builder.create_builder_config()
            config.max_workspace_size = workspace
            if fp == 16:
                config.set_flag(trt.BuilderFlag.FP16)
            builder.max_batch_size = batch_size
            
            with open(model, 'rb') as f:
                if not parser.parse(f.read()):
                    for error in range(parser.num_errors):
                        print('ERROR', parser.get_error(error))
                    sys.exit(1)  # do not build from a partially parsed network
                        
            engine = builder.build_engine(network, config)
            with open(output, 'wb') as f:
                f.write(engine.serialize())
            print('Done')
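
Side note: build_engine() and max_workspace_size are deprecated on TensorRT 8. Below is a sketch of the TRT 8-native flow using build_serialized_network(), which avoids the separate engine.serialize() step entirely. The file paths are placeholders, and this does not fix the checkMemLimit assertion itself, which is TensorRT 8.2.1 not recognizing the 16GB Xavier NX in its embedded-device table:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# parse the ONNX file; abort on failure instead of building a broken network
with open('model.onnx', 'rb') as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print('ERROR', parser.get_error(i))
        raise SystemExit(1)

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28  # 256 MiB; still valid on 8.2, replaced by memory pools in 8.4+
config.set_flag(trt.BuilderFlag.FP16)

# returns an IHostMemory holding the serialized engine, or None on failure
serialized = builder.build_serialized_network(network, config)
if serialized is None:
    raise RuntimeError('engine build failed; see the TRT log')
with open('model.trt', 'wb') as f:  # placeholder path
    f.write(serialized)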

The same script works on a Jetson Nano with JetPack 4.5. Should I install JetPack 4.5 on the NX instead?

Any update on this?

Update: it also fails with trtexec, while a Jetson Nano Developer Kit running JetPack 4.6.1 can convert the same model. Here is the trtexec output:

fwav@ubuntu:/usr/src/tensorrt/bin$ ./trtexec --onnx=/media/fwav/1706-63D2/yolo-fastestv2.onnx  --saveEngine=/media/fwav/1706-63D2/test.trt
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --onnx=/media/fwav/1706-63D2/yolo-fastestv2.onnx --saveEngine=/media/fwav/1706-63D2/test.trt
[03/21/2022-15:24:04] [I] === Model Options ===
[03/21/2022-15:24:04] [I] Format: ONNX
[03/21/2022-15:24:04] [I] Model: /media/fwav/1706-63D2/yolo-fastestv2.onnx
[03/21/2022-15:24:04] [I] Output:
[03/21/2022-15:24:04] [I] === Build Options ===
[03/21/2022-15:24:04] [I] Max batch: explicit batch
[03/21/2022-15:24:04] [I] Workspace: 16 MiB
[03/21/2022-15:24:04] [I] minTiming: 1
[03/21/2022-15:24:04] [I] avgTiming: 8
[03/21/2022-15:24:04] [I] Precision: FP32
[03/21/2022-15:24:04] [I] Calibration: 
[03/21/2022-15:24:04] [I] Refit: Disabled
[03/21/2022-15:24:04] [I] Sparsity: Disabled
[03/21/2022-15:24:04] [I] Safe mode: Disabled
[03/21/2022-15:24:04] [I] DirectIO mode: Disabled
[03/21/2022-15:24:04] [I] Restricted mode: Disabled
[03/21/2022-15:24:04] [I] Save engine: /media/fwav/1706-63D2/test.trt
[03/21/2022-15:24:04] [I] Load engine: 
[03/21/2022-15:24:04] [I] Profiling verbosity: 0
[03/21/2022-15:24:04] [I] Tactic sources: Using default tactic sources
[03/21/2022-15:24:04] [I] timingCacheMode: local
[03/21/2022-15:24:04] [I] timingCacheFile: 
[03/21/2022-15:24:04] [I] Input(s)s format: fp32:CHW
[03/21/2022-15:24:04] [I] Output(s)s format: fp32:CHW
[03/21/2022-15:24:04] [I] Input build shapes: model
[03/21/2022-15:24:04] [I] Input calibration shapes: model
[03/21/2022-15:24:04] [I] === System Options ===
[03/21/2022-15:24:04] [I] Device: 0
[03/21/2022-15:24:04] [I] DLACore: 
[03/21/2022-15:24:04] [I] Plugins:
[03/21/2022-15:24:04] [I] === Inference Options ===
[03/21/2022-15:24:04] [I] Batch: Explicit
[03/21/2022-15:24:04] [I] Input inference shapes: model
[03/21/2022-15:24:04] [I] Iterations: 10
[03/21/2022-15:24:04] [I] Duration: 3s (+ 200ms warm up)
[03/21/2022-15:24:04] [I] Sleep time: 0ms
[03/21/2022-15:24:04] [I] Idle time: 0ms
[03/21/2022-15:24:04] [I] Streams: 1
[03/21/2022-15:24:04] [I] ExposeDMA: Disabled
[03/21/2022-15:24:04] [I] Data transfers: Enabled
[03/21/2022-15:24:04] [I] Spin-wait: Disabled
[03/21/2022-15:24:04] [I] Multithreading: Disabled
[03/21/2022-15:24:04] [I] CUDA Graph: Disabled
[03/21/2022-15:24:04] [I] Separate profiling: Disabled
[03/21/2022-15:24:04] [I] Time Deserialize: Disabled
[03/21/2022-15:24:04] [I] Time Refit: Disabled
[03/21/2022-15:24:04] [I] Skip inference: Disabled
[03/21/2022-15:24:04] [I] Inputs:
[03/21/2022-15:24:04] [I] === Reporting Options ===
[03/21/2022-15:24:04] [I] Verbose: Disabled
[03/21/2022-15:24:04] [I] Averages: 10 inferences
[03/21/2022-15:24:04] [I] Percentile: 99
[03/21/2022-15:24:04] [I] Dump refittable layers:Disabled
[03/21/2022-15:24:04] [I] Dump output: Disabled
[03/21/2022-15:24:04] [I] Profile: Disabled
[03/21/2022-15:24:04] [I] Export timing to JSON file: 
[03/21/2022-15:24:04] [I] Export output to JSON file: 
[03/21/2022-15:24:04] [I] Export profile to JSON file: 
[03/21/2022-15:24:04] [I] 
[03/21/2022-15:24:04] [I] === Device Information ===
[03/21/2022-15:24:04] [I] Selected Device: Xavier
[03/21/2022-15:24:04] [I] Compute Capability: 7.2
[03/21/2022-15:24:04] [I] SMs: 6
[03/21/2022-15:24:04] [I] Compute Clock Rate: 1.109 GHz
[03/21/2022-15:24:04] [I] Device Global Memory: 15825 MiB
[03/21/2022-15:24:04] [I] Shared Memory per SM: 96 KiB
[03/21/2022-15:24:04] [I] Memory Bus Width: 256 bits (ECC disabled)
[03/21/2022-15:24:04] [I] Memory Clock Rate: 1.109 GHz
[03/21/2022-15:24:04] [I] 
[03/21/2022-15:24:04] [I] TensorRT version: 8.2.1
[03/21/2022-15:24:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +362, GPU +0, now: CPU 381, GPU 4022 (MiB)
[03/21/2022-15:24:06] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 381 MiB, GPU 4022 MiB
[03/21/2022-15:24:06] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 486 MiB, GPU 4127 MiB
[03/21/2022-15:24:06] [I] Start parsing network model
[03/21/2022-15:24:06] [I] [TRT] ----------------------------------------------------------------
[03/21/2022-15:24:06] [I] [TRT] Input filename:   /media/fwav/1706-63D2/yolo-fastestv2.onnx
[03/21/2022-15:24:06] [I] [TRT] ONNX IR version:  0.0.6
[03/21/2022-15:24:06] [I] [TRT] Opset version:    11
[03/21/2022-15:24:06] [I] [TRT] Producer name:    pytorch
[03/21/2022-15:24:06] [I] [TRT] Producer version: 1.9
[03/21/2022-15:24:06] [I] [TRT] Domain:           
[03/21/2022-15:24:06] [I] [TRT] Model version:    0
[03/21/2022-15:24:06] [I] [TRT] Doc string:       
[03/21/2022-15:24:06] [I] [TRT] ----------------------------------------------------------------
[03/21/2022-15:24:06] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/21/2022-15:24:07] [I] Finish parsing network model
[03/21/2022-15:24:07] [I] [TRT] ---------- Layers Running on DLA ----------
[03/21/2022-15:24:07] [I] [TRT] ---------- Layers Running on GPU ----------
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_0 + Relu_1
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] MaxPool_2
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_6 + Relu_7
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_3
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_4
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_8
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_9
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[Relu_5...Gather_20]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_21 + Relu_22
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_23
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_24
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[468...Gather_35]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_36 + Relu_37
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_38
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_39
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[486...Gather_48]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_51 + Relu_52
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_53
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_54 + Relu_55
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] 505 copy
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_60 + Relu_61
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_57
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_58
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_62
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_63
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[Relu_59...Gather_74]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_75 + Relu_76
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_77
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_78
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[536...Gather_89]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_90 + Relu_91
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_92
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_93
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[554...Gather_104]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_105 + Relu_106
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_107
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_108
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[572...Gather_119]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_120 + Relu_121
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_122
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_123
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[590...Gather_134]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_135 + Relu_136
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_137
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_138
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[608...Gather_149]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_150 + Relu_151
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_152
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_153
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[626...Gather_162]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_165 + Relu_166
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_167
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_168 + Relu_169
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] 645 copy
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_174 + Relu_175
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_171
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_172
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_176
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_177
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[Relu_173...Gather_188]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_189 + Relu_190
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_191
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_192
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[676...Gather_203]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_204 + Relu_205
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_206
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_207
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] {ForeignNode[694...Gather_216]}
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_219 + Relu_220
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_221
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_222 + Relu_223
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] 713 copy
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Resize_240
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_225 + Relu_226
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] 752 copy
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_233 + Relu_234
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_227 + Relu_228
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_242 + Relu_243
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_250 + Relu_251
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_244 + Relu_245
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_229
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_230 + Relu_231
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_235
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_236 + Relu_237
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_238
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_259
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] PWN(Sigmoid_267)
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_232
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_261 || Conv_260
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Transpose_269 + (Unnamed Layer* 245) [Shuffle]
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Softmax_270
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] (Unnamed Layer* 247) [Shuffle] + Transpose_271
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] PWN(Sigmoid_268)
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] 789 copy
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Transpose_275
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_246
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_247 + Relu_248
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_252
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_253 + Relu_254
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_255
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_256
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] PWN(Sigmoid_262)
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_249
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Conv_258 || Conv_257
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Transpose_264 + (Unnamed Layer* 238) [Shuffle]
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Softmax_265
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] (Unnamed Layer* 240) [Shuffle] + Transpose_266
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] PWN(Sigmoid_263)
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] 784 copy
[03/21/2022-15:24:07] [I] [TRT] [GpuLayer] Transpose_273
[03/21/2022-15:24:08] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +227, GPU +228, now: CPU 715, GPU 4359 (MiB)
[03/21/2022-15:24:09] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +322, now: CPU 1022, GPU 4681 (MiB)
[03/21/2022-15:24:09] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/21/2022-15:24:09] [E] Error[2]: [utils.cpp::checkMemLimit::380] Error Code 2: Internal Error (Assertion upperBound != 0 failed. Unknown embedded device detected. Please update the table with the entry: {{1794, 6, 16}, 12660},)
[03/21/2022-15:24:09] [E] Error[2]: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[03/21/2022-15:24:09] [E] Engine could not be created from network
[03/21/2022-15:24:09] [E] Building engine failed
[03/21/2022-15:24:09] [E] Failed to create engine from model.
[03/21/2022-15:24:09] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # ./trtexec --onnx=/media/fwav/1706-63D2/yolo-fastestv2.onnx --saveEngine=/media/fwav/1706-63D2/test.trt


We are moving this to the Jetson Xavier forum to get better help.

Thank you.

Duplicate of the following topic:

I started this topic in the TensorRT section, and it was moved here.
I suspect the problem is that I am using the 16GB Xavier NX rather than the 8GB model. I hope this can be fixed in the next JetPack release.

Hi,

Is this the same issue as topic 208933?
If so, let's continue the discussion on that topic for clarity.

Thanks.

Yes.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.