trtexec: Saving engine to file failed

Hi

  • 1 BSP environment:
    16 GB Orin NX, JetPack 5.1.1, L4T R35.3.1, kernel 5.10, aarch64
    Orin NX Developer Kit (P3767)
  • 2 Operation:
    Based on the TensorRT demo, run the following commands to do a GPU loading test:
cd /usr/src/tensorrt/bin
./trtexec --deploy=../data/resnet50/ResNet50_N2.prototxt --model=../data/resnet50/ResNet50_fp32.caffemodel --output=prob --batch=16 --saveEngine=mnist16.trt
  • 3 Problem:
    trtexec fails with "Saving engine to file failed."
nvidia@tegra:/usr/src/tensorrt/samples$ /usr/src/tensorrt/bin/trtexec --deploy=../data/resnet50/ResNet50_N2.prototxt --model=../data/resnet50/ResNet50_fp32.caffemodel --output=prob --batch=16 --saveEngine=resnet50.trt
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --deploy=../data/resnet50/ResNet50_N2.prototxt --model=../data/resnet50/ResNet50_fp32.caffemodel --output=prob --batch=16 --saveEngine=resnet50.trt
[05/06/2023-18:05:54] [I] === Model Options ===
[05/06/2023-18:05:54] [I] Format: Caffe
[05/06/2023-18:05:54] [I] Model: ../data/resnet50/ResNet50_fp32.caffemodel
[05/06/2023-18:05:54] [I] Prototxt: ../data/resnet50/ResNet50_N2.prototxt
[05/06/2023-18:05:54] [I] Output: prob
[05/06/2023-18:05:54] [I] === Build Options ===
[05/06/2023-18:05:54] [I] Max batch: 16
[05/06/2023-18:05:54] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[05/06/2023-18:05:54] [I] minTiming: 1
[05/06/2023-18:05:54] [I] avgTiming: 8
[05/06/2023-18:05:54] [I] Precision: FP32
[05/06/2023-18:05:54] [I] LayerPrecisions: 
[05/06/2023-18:05:54] [I] Calibration: 
[05/06/2023-18:05:54] [I] Refit: Disabled
[05/06/2023-18:05:54] [I] Sparsity: Disabled
[05/06/2023-18:05:54] [I] Safe mode: Disabled
[05/06/2023-18:05:54] [I] DirectIO mode: Disabled
[05/06/2023-18:05:54] [I] Restricted mode: Disabled
[05/06/2023-18:05:54] [I] Build only: Disabled
[05/06/2023-18:05:54] [I] Save engine: resnet50.trt
[05/06/2023-18:05:54] [I] Load engine: 
[05/06/2023-18:05:54] [I] Profiling verbosity: 0
[05/06/2023-18:05:54] [I] Tactic sources: Using default tactic sources
[05/06/2023-18:05:54] [I] timingCacheMode: local
[05/06/2023-18:05:54] [I] timingCacheFile: 
[05/06/2023-18:05:54] [I] Heuristic: Disabled
[05/06/2023-18:05:54] [I] Preview Features: Use default preview flags.
[05/06/2023-18:05:54] [I] Input(s)s format: fp32:CHW
[05/06/2023-18:05:54] [I] Output(s)s format: fp32:CHW
[05/06/2023-18:05:54] [I] Input build shapes: model
[05/06/2023-18:05:54] [I] Input calibration shapes: model
[05/06/2023-18:05:54] [I] === System Options ===
[05/06/2023-18:05:54] [I] Device: 0
[05/06/2023-18:05:54] [I] DLACore: 
[05/06/2023-18:05:54] [I] Plugins:
[05/06/2023-18:05:54] [I] === Inference Options ===
[05/06/2023-18:05:54] [I] Batch: 16
[05/06/2023-18:05:54] [I] Input inference shapes: model
[05/06/2023-18:05:54] [I] Iterations: 10
[05/06/2023-18:05:54] [I] Duration: 3s (+ 200ms warm up)
[05/06/2023-18:05:54] [I] Sleep time: 0ms
[05/06/2023-18:05:54] [I] Idle time: 0ms
[05/06/2023-18:05:54] [I] Streams: 1
[05/06/2023-18:05:54] [I] ExposeDMA: Disabled
[05/06/2023-18:05:54] [I] Data transfers: Enabled
[05/06/2023-18:05:54] [I] Spin-wait: Disabled
[05/06/2023-18:05:54] [I] Multithreading: Disabled
[05/06/2023-18:05:54] [I] CUDA Graph: Disabled
[05/06/2023-18:05:54] [I] Separate profiling: Disabled
[05/06/2023-18:05:54] [I] Time Deserialize: Disabled
[05/06/2023-18:05:54] [I] Time Refit: Disabled
[05/06/2023-18:05:54] [I] NVTX verbosity: 0
[05/06/2023-18:05:54] [I] Persistent Cache Ratio: 0
[05/06/2023-18:05:54] [I] Inputs:
[05/06/2023-18:05:54] [I] === Reporting Options ===
[05/06/2023-18:05:54] [I] Verbose: Disabled
[05/06/2023-18:05:54] [I] Averages: 10 inferences
[05/06/2023-18:05:54] [I] Percentiles: 90,95,99
[05/06/2023-18:05:54] [I] Dump refittable layers:Disabled
[05/06/2023-18:05:54] [I] Dump output: Disabled
[05/06/2023-18:05:54] [I] Profile: Disabled
[05/06/2023-18:05:54] [I] Export timing to JSON file: 
[05/06/2023-18:05:54] [I] Export output to JSON file: 
[05/06/2023-18:05:54] [I] Export profile to JSON file: 
[05/06/2023-18:05:54] [I] 
[05/06/2023-18:05:54] [I] === Device Information ===
[05/06/2023-18:05:54] [I] Selected Device: Orin
[05/06/2023-18:05:54] [I] Compute Capability: 8.7
[05/06/2023-18:05:54] [I] SMs: 8
[05/06/2023-18:05:54] [I] Compute Clock Rate: 0.918 GHz
[05/06/2023-18:05:54] [I] Device Global Memory: 15388 MiB
[05/06/2023-18:05:54] [I] Shared Memory per SM: 164 KiB
[05/06/2023-18:05:54] [I] Memory Bus Width: 64 bits (ECC disabled)
[05/06/2023-18:05:54] [I] Memory Clock Rate: 0.918 GHz
[05/06/2023-18:05:54] [I] 
[05/06/2023-18:05:54] [I] TensorRT version: 8.5.2
[05/06/2023-18:05:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +220, GPU +0, now: CPU 249, GPU 5167 (MiB)
[05/06/2023-18:05:56] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +431, now: CPU 574, GPU 5620 (MiB)
[05/06/2023-18:05:56] [W] [TRT] The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
[05/06/2023-18:05:56] [I] Start parsing network model
[05/06/2023-18:05:56] [I] Finish parsing network model
[05/06/2023-18:05:57] [I] [TRT] ---------- Layers Running on DLA ----------
[05/06/2023-18:05:57] [I] [TRT] ---------- Layers Running on GPU ----------
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: conv1 + bn_conv1 + scale_conv1 + conv1_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] POOLING: pool1
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2a_branch2a + bn2a_branch2a + scale2a_branch2a + res2a_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2a_branch2b + bn2a_branch2b + scale2a_branch2b + res2a_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2a_branch2c + bn2a_branch2c + scale2a_branch2c
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2a_branch1 + bn2a_branch1 + scale2a_branch1 + res2a + res2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2b_branch2a + bn2b_branch2a + scale2b_branch2a + res2b_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2b_branch2b + bn2b_branch2b + scale2b_branch2b + res2b_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2b_branch2c + bn2b_branch2c + scale2b_branch2c + res2b + res2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2c_branch2a + bn2c_branch2a + scale2c_branch2a + res2c_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2c_branch2b + bn2c_branch2b + scale2c_branch2b + res2c_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res2c_branch2c + bn2c_branch2c + scale2c_branch2c + res2c + res2c_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3a_branch2a + bn3a_branch2a + scale3a_branch2a + res3a_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3a_branch2b + bn3a_branch2b + scale3a_branch2b + res3a_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3a_branch2c + bn3a_branch2c + scale3a_branch2c
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3a_branch1 + bn3a_branch1 + scale3a_branch1 + res3a + res3a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3b_branch2a + bn3b_branch2a + scale3b_branch2a + res3b_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3b_branch2b + bn3b_branch2b + scale3b_branch2b + res3b_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3b_branch2c + bn3b_branch2c + scale3b_branch2c + res3b + res3b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3c_branch2a + bn3c_branch2a + scale3c_branch2a + res3c_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3c_branch2b + bn3c_branch2b + scale3c_branch2b + res3c_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3c_branch2c + bn3c_branch2c + scale3c_branch2c + res3c + res3c_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3d_branch2a + bn3d_branch2a + scale3d_branch2a + res3d_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3d_branch2b + bn3d_branch2b + scale3d_branch2b + res3d_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res3d_branch2c + bn3d_branch2c + scale3d_branch2c + res3d + res3d_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4a_branch2a + bn4a_branch2a + scale4a_branch2a + res4a_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4a_branch2b + bn4a_branch2b + scale4a_branch2b + res4a_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4a_branch2c + bn4a_branch2c + scale4a_branch2c
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4a_branch1 + bn4a_branch1 + scale4a_branch1 + res4a + res4a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4b_branch2a + bn4b_branch2a + scale4b_branch2a + res4b_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4b_branch2b + bn4b_branch2b + scale4b_branch2b + res4b_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4b_branch2c + bn4b_branch2c + scale4b_branch2c + res4b + res4b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4c_branch2a + bn4c_branch2a + scale4c_branch2a + res4c_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4c_branch2b + bn4c_branch2b + scale4c_branch2b + res4c_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4c_branch2c + bn4c_branch2c + scale4c_branch2c + res4c + res4c_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4d_branch2a + bn4d_branch2a + scale4d_branch2a + res4d_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4d_branch2b + bn4d_branch2b + scale4d_branch2b + res4d_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4d_branch2c + bn4d_branch2c + scale4d_branch2c + res4d + res4d_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4e_branch2a + bn4e_branch2a + scale4e_branch2a + res4e_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4e_branch2b + bn4e_branch2b + scale4e_branch2b + res4e_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4e_branch2c + bn4e_branch2c + scale4e_branch2c + res4e + res4e_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4f_branch2a + bn4f_branch2a + scale4f_branch2a + res4f_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4f_branch2b + bn4f_branch2b + scale4f_branch2b + res4f_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res4f_branch2c + bn4f_branch2c + scale4f_branch2c + res4f + res4f_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5a_branch2a + bn5a_branch2a + scale5a_branch2a + res5a_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5a_branch2b + bn5a_branch2b + scale5a_branch2b + res5a_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5a_branch2c + bn5a_branch2c + scale5a_branch2c
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5a_branch1 + bn5a_branch1 + scale5a_branch1 + res5a + res5a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5b_branch2a + bn5b_branch2a + scale5b_branch2a + res5b_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5b_branch2b + bn5b_branch2b + scale5b_branch2b + res5b_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5b_branch2c + bn5b_branch2c + scale5b_branch2c + res5b + res5b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5c_branch2a + bn5c_branch2a + scale5c_branch2a + res5c_branch2a_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5c_branch2b + bn5c_branch2b + scale5c_branch2b + res5c_branch2b_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: res5c_branch2c + bn5c_branch2c + scale5c_branch2c + res5c + res5c_relu
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] POOLING: pool5
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] CONVOLUTION: fc1000
[05/06/2023-18:05:57] [I] [TRT] [GpuLayer] SOFTMAX: prob
[05/06/2023-18:05:57] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +778, now: CPU 1297, GPU 6635 (MiB)
[05/06/2023-18:05:58] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +82, GPU +119, now: CPU 1379, GPU 6754 (MiB)
[05/06/2023-18:05:58] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.

[05/06/2023-18:08:11] [I] [TRT] Total Activation Memory: 16839711744
[05/06/2023-18:08:11] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[05/06/2023-18:08:12] [I] [TRT] Total Host Persistent Memory: 177920
[05/06/2023-18:08:12] [I] [TRT] Total Device Persistent Memory: 75776
[05/06/2023-18:08:12] [I] [TRT] Total Scratch Memory: 0
[05/06/2023-18:08:12] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 179 MiB, GPU 4587 MiB
[05/06/2023-18:08:12] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 58 steps to complete.
[05/06/2023-18:08:12] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.792179ms to assign 3 blocks to 58 nodes requiring 128450560 bytes.
[05/06/2023-18:08:12] [I] [TRT] Total Activation Memory: 128450560
[05/06/2023-18:08:12] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +89, GPU +128, now: CPU 89, GPU 128 (MiB)
[05/06/2023-18:08:12] [E] Saving engine to file failed.
[05/06/2023-18:08:12] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --deploy=../data/resnet50/ResNet50_N2.prototxt --model=../data/resnet50/ResNet50_fp32.caffemodel --output=prob --batch=16 --saveEngine=resnet50.trt

Dear @Henry.Lou,
Just want to confirm: does /usr/src/tensorrt/bin have write permission for your user?
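If that directory is owned by root (as paths under /usr/src usually are), the engine build succeeds but the final write fails, which matches the log above. Below is a minimal sketch of how to check and work around this, assuming the command is run as the nvidia user from /usr/src/tensorrt/samples as shown in your log, and that /home/nvidia is a writable location on your board:

# Check ownership and permissions of the directory the relative --saveEngine path resolves to
ls -ld /usr/src/tensorrt/samples
# Quick write test (creates and removes a scratch file)
touch /usr/src/tensorrt/samples/.write_test && rm /usr/src/tensorrt/samples/.write_test
# Workaround: run trtexec with sudo, or point --saveEngine at a user-writable path
/usr/src/tensorrt/bin/trtexec --deploy=../data/resnet50/ResNet50_N2.prototxt \
    --model=../data/resnet50/ResNet50_fp32.caffemodel --output=prob --batch=16 \
    --saveEngine=/home/nvidia/resnet50.trt

If the write test fails, either running with sudo or saving to your home directory should let the engine file be written.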

There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks

Is this still an issue that needs support? Are there any results you can share? Thanks