Description
I have a 3-layer conventional neural network trained in Keras that takes a [1, 46] input and outputs 4 different classes.
The model was converted to ONNX using tf2onnx.
I then converted the ONNX model to a TensorRT engine with trtexec on the Jetson Xavier (JetPack 4.6), using this exact command:
trtexec --onnx=nn_embedded.onnx --saveEngine=nn_embedded_trt
When performing inference, I am using the infer() and load_engine() functions from here.
This is when I run into the following error:
[03/29/2022-15:23:54] [TRT] [E] 3: Cannot find binding of given name: input
[03/29/2022-15:23:54] [TRT] [E] 3: [executionContext.cpp::setBindingDimensions::925] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::925, condition: mEngine.bindingIndexBelongsToProfile( bindingIndex, mOptimizationProfile, "IExecutionContext::setBindingDimensions")
Could you explain what this error means and how I can resolve it?
From similar topics on the forum, it seems the error may have been introduced on my end when I converted the ONNX model to the TensorRT engine.
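For reference, a minimal sketch of listing the engine's binding names (to compare against the "input" name used during inference) looks roughly like this. It assumes the standard TensorRT Python API shipped with JetPack 4.6 and the engine file produced by the trtexec command above; it is not the exact load_engine()/infer() code linked earlier.
```python
import tensorrt as trt

# Sketch only: deserialize the engine built above and print every binding,
# so the actual input/output tensor names can be compared with "input".
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("nn_embedded_trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(i, kind, engine.get_binding_name(i), engine.get_binding_shape(i))
```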
Environment
JetPack 4.6 on Jetson Xavier NX
Relevant Files
Steps To Reproduce
Hi,
Please refer to the following similar issue.
GitHub issue (opened 02 Jul 2021, closed 01 Mar 2022; labels: Samples, triaged, Release: 8.x):
## Description
I'm trying to run benchmarking with TensorRT 8.0.1 using `trtexec`, and I receive the following error when setting more than one stream.
Command:
`trtexec --loadEngine=model-fp32.engine --shapes=input_tensor:0:1x300x300x3 --streams=2`
Error:
`Error[3]: [executionContext.cpp::setBindingDimensions::949] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::949, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()`
I can email the model files if needed.
## Environment
**TensorRT Version**: 8.0.1-1+cuda11.3
**NVIDIA GPU**: NVIDIA T4
**NVIDIA Driver Version**: 450.80.02
**CUDA Version**: 11.3
**CUDNN Version**:
**Operating System**: Ubuntu 20.04
**Python Version (if applicable)**:
**Tensorflow Version (if applicable)**:
**PyTorch Version (if applicable)**:
**Baremetal or Container (if so, version)**: nvcr.io/nvidia/tensorrt:21.06-py3
## Steps To Reproduce
The pipeline involves converting from ONNX->TRT and then benchmarking the engine file.
**Step 1: Run Docker**
`docker run --rm -it nvcr.io/nvidia/tensorrt:21.06-py3`
**Step 2: Upgrade to TensorRT 8.0.1**
A. Download from https://developer.nvidia.com/nvidia-tensorrt-8x-download
B. Run installation
```
dpkg -i nv-tensorrt-repo-ubuntu2004-cuda11.3-trt8.0.1.6-ga-20210626_1-1_amd64.deb
apt-get update
apt-get install tensorrt libcudnn8
```
**Step 3: Convert ONNX model to TensorRT in 8.0.1**
From ONNX->TRT:
```
trtexec --onnx=model.onnx --saveEngine=model-fp32.engine \
--workspace=4096 \
--minShapes=input_tensor:0:1x300x300x3 \
--maxShapes=input_tensor:0:32x300x300x3 \
--optShapes=input_tensor:0:8x300x300x3 \
--buildOnly
```
**Step 4: Run Benchmarking**
`trtexec --loadEngine=model-fp32.engine --shapes=input_tensor:0:1x300x300x3 --streams=2 --verbose`
Output:
```
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # trtexec --loadEngine=model-fp32.engine --shapes=input_tensor:0:1x300x300x3 --streams=2 --verbose
[07/02/2021-15:05:16] [I] === Model Options ===
[07/02/2021-15:05:16] [I] Format: *
[07/02/2021-15:05:16] [I] Model:
[07/02/2021-15:05:16] [I] Output:
[07/02/2021-15:05:16] [I] === Build Options ===
[07/02/2021-15:05:16] [I] Max batch: explicit
[07/02/2021-15:05:16] [I] Workspace: 16 MiB
[07/02/2021-15:05:16] [I] minTiming: 1
[07/02/2021-15:05:16] [I] avgTiming: 8
[07/02/2021-15:05:16] [I] Precision: FP32
[07/02/2021-15:05:16] [I] Calibration:
[07/02/2021-15:05:16] [I] Refit: Disabled
[07/02/2021-15:05:16] [I] Sparsity: Disabled
[07/02/2021-15:05:16] [I] Safe mode: Disabled
[07/02/2021-15:05:16] [I] Restricted mode: Disabled
[07/02/2021-15:05:16] [I] Save engine:
[07/02/2021-15:05:16] [I] Load engine: model-fp32.engine
[07/02/2021-15:05:16] [I] NVTX verbosity: 0
[07/02/2021-15:05:16] [I] Tactic sources: Using default tactic sources
[07/02/2021-15:05:16] [I] timingCacheMode: local
[07/02/2021-15:05:16] [I] timingCacheFile:
[07/02/2021-15:05:16] [I] Input(s)s format: fp32:CHW
[07/02/2021-15:05:16] [I] Output(s)s format: fp32:CHW
[07/02/2021-15:05:16] [I] Input build shape: input_tensor:0=1x300x300x3+1x300x300x3+1x300x300x3
[07/02/2021-15:05:16] [I] Input calibration shapes: model
[07/02/2021-15:05:16] [I] === System Options ===
[07/02/2021-15:05:16] [I] Device: 0
[07/02/2021-15:05:16] [I] DLACore:
[07/02/2021-15:05:16] [I] Plugins:
[07/02/2021-15:05:16] [I] === Inference Options ===
[07/02/2021-15:05:16] [I] Batch: Explicit
[07/02/2021-15:05:16] [I] Input inference shape: input_tensor:0=1x300x300x3
[07/02/2021-15:05:16] [I] Iterations: 10
[07/02/2021-15:05:16] [I] Duration: 3s (+ 200ms warm up)
[07/02/2021-15:05:16] [I] Sleep time: 0ms
[07/02/2021-15:05:16] [I] Streams: 2
[07/02/2021-15:05:16] [I] ExposeDMA: Disabled
[07/02/2021-15:05:16] [I] Data transfers: Enabled
[07/02/2021-15:05:16] [I] Spin-wait: Disabled
[07/02/2021-15:05:16] [I] Multithreading: Disabled
[07/02/2021-15:05:16] [I] CUDA Graph: Disabled
[07/02/2021-15:05:16] [I] Separate profiling: Disabled
[07/02/2021-15:05:16] [I] Time Deserialize: Disabled
[07/02/2021-15:05:16] [I] Time Refit: Disabled
[07/02/2021-15:05:16] [I] Skip inference: Disabled
[07/02/2021-15:05:16] [I] Inputs:
[07/02/2021-15:05:16] [I] === Reporting Options ===
[07/02/2021-15:05:16] [I] Verbose: Enabled
[07/02/2021-15:05:16] [I] Averages: 10 inferences
[07/02/2021-15:05:16] [I] Percentile: 99
[07/02/2021-15:05:16] [I] Dump refittable layers:Disabled
[07/02/2021-15:05:16] [I] Dump output: Disabled
[07/02/2021-15:05:16] [I] Profile: Disabled
[07/02/2021-15:05:16] [I] Export timing to JSON file:
[07/02/2021-15:05:16] [I] Export output to JSON file:
[07/02/2021-15:05:16] [I] Export profile to JSON file:
[07/02/2021-15:05:16] [I]
[07/02/2021-15:05:16] [I] === Device Information ===
[07/02/2021-15:05:16] [I] Selected Device: Tesla T4
[07/02/2021-15:05:16] [I] Compute Capability: 7.5
[07/02/2021-15:05:16] [I] SMs: 40
[07/02/2021-15:05:16] [I] Compute Clock Rate: 1.59 GHz
[07/02/2021-15:05:16] [I] Device Global Memory: 15109 MiB
[07/02/2021-15:05:16] [I] Shared Memory per SM: 64 KiB
[07/02/2021-15:05:16] [I] Memory Bus Width: 256 bits (ECC enabled)
[07/02/2021-15:05:16] [I] Memory Clock Rate: 5.001 GHz
[07/02/2021-15:05:16] [I]
[07/02/2021-15:05:16] [I] TensorRT version: 8001
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::ScatterND version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::Proposal version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::Split version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[07/02/2021-15:05:16] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[07/02/2021-15:05:17] [I] [TRT] [MemUsageChange] Init CUDA: CPU +328, GPU +0, now: CPU 355, GPU 250 (MiB)
[07/02/2021-15:05:17] [I] [TRT] Loaded engine size: 19 MB
[07/02/2021-15:05:17] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 355 MiB, GPU 250 MiB
[07/02/2021-15:05:18] [V] [TRT] Using cublasLt a tactic source
[07/02/2021-15:05:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +482, GPU +206, now: CPU 838, GPU 476 (MiB)
[07/02/2021-15:05:18] [V] [TRT] Using cuDNN as a tactic source
[07/02/2021-15:05:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +394, GPU +172, now: CPU 1232, GPU 648 (MiB)
[07/02/2021-15:05:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1232, GPU 630 (MiB)
[07/02/2021-15:05:18] [V] [TRT] Deserialization required 1204936 microseconds.
[07/02/2021-15:05:18] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1232 MiB, GPU 630 MiB
[07/02/2021-15:05:18] [I] Engine loaded in 1.74508 sec.
[07/02/2021-15:05:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1212 MiB, GPU 630 MiB
[07/02/2021-15:05:18] [V] [TRT] Using cublasLt a tactic source
[07/02/2021-15:05:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +10, now: CPU 1213, GPU 640 (MiB)
[07/02/2021-15:05:18] [V] [TRT] Using cuDNN as a tactic source
[07/02/2021-15:05:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1213, GPU 648 (MiB)
[07/02/2021-15:05:18] [V] [TRT] Total per-runner device memory is 16729600
[07/02/2021-15:05:18] [V] [TRT] Total per-runner host memory is 101424
[07/02/2021-15:05:18] [V] [TRT] Allocated activation device memory of size 445687808
[07/02/2021-15:05:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1219 MiB, GPU 1090 MiB
[07/02/2021-15:05:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1219 MiB, GPU 1090 MiB
[07/02/2021-15:05:18] [V] [TRT] Using cublasLt a tactic source
[07/02/2021-15:05:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1219, GPU 1098 (MiB)
[07/02/2021-15:05:18] [V] [TRT] Using cuDNN as a tactic source
[07/02/2021-15:05:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1219, GPU 1108 (MiB)
[07/02/2021-15:05:18] [V] [TRT] Total per-runner device memory is 16729600
[07/02/2021-15:05:18] [V] [TRT] Total per-runner host memory is 101424
[07/02/2021-15:05:18] [V] [TRT] Allocated activation device memory of size 445687808
[07/02/2021-15:05:18] [I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[07/02/2021-15:05:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1219 MiB, GPU 1550 MiB
[07/02/2021-15:05:18] [E] Error[3]: [executionContext.cpp::setBindingDimensions::949] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::949, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
[07/02/2021-15:05:18] [E] Inference set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8001] # trtexec --loadEngine=model-fp32.engine --shapes=input_tensor:0:1x300x300x3 --streams=2 --verbose
[07/02/2021-15:05:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1219, GPU 1518 (MiB)
[07/02/2021-15:05:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1219, GPU 1058 (MiB)
```
Thank you.
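Both errors are parameter checks inside setBindingDimensions: the execution context must have a valid optimization profile selected, and the binding index passed in must belong to it. In the original question, the lookup for a binding named "input" fails first (returning index -1), which then trips that check. Below is a rough sketch of the corresponding guard, assuming the TensorRT 8.x Python API; the engine path and the binding name "input" are taken from the question above and may need adjusting.
```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Sketch only: deserialize the engine, resolve the binding by name, and select
# an optimization profile before setting the input shape.
with open("nn_embedded_trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

binding_idx = engine.get_binding_index("input")      # returns -1 if no binding has this name
if binding_idx < 0:
    names = [engine.get_binding_name(i) for i in range(engine.num_bindings)]
    raise RuntimeError(f"binding 'input' not found; engine exposes: {names}")

context = engine.create_execution_context()
context.active_optimization_profile = 0              # select a profile before setting shapes
context.set_binding_shape(binding_idx, (1, 46))      # the [1, 46] input from the question
```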
NVES
March 30, 2022, 5:38pm
Hi,
This looks like a Jetson issue. Please refer to the samples below in case they are useful.
For any further assistance, we will move this post to the Jetson-related forum.
Thanks!