ERROR: [TRT]: 10: Could not find any implementation for node /0/model.24/Expand

Description

I can’t generate an engine using TensorRT 8.6.2 in the Docker image nvcr.io/nvidia/deepstream:6.4-triton-multiarch.

The issue may be similar to: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[668...Mul_497]) or Error Code 10: Internal Error (Could not find any implementation for node PWN(/model.0/act/Sigmoid).)
This bug exists only on the Jetson platform; exactly the same engine can be built without any problem on an x86 machine.

The same engine build fails on an older image, nvcr.io/nvidia/deepstream-l4t:6.2-samples.

Environment

Device: NVIDIA Jetson AGX Orin Developer kit
Host system: Jetpack 6.0 DP [L4T 36.2.0]
Baremetal or Container (if container, which image + tag): Container nvcr.io/nvidia/deepstream:6.4-triton-multiarch
TensorRT Version: 8.6.2

Steps To Reproduce

  1. Start the container: docker run --gpus=all -it --rm -v ./:/workspace nvcr.io/nvidia/deepstream:6.4-triton-multiarch bash
  2. Install dependencies such as onnx
  3. Export the model to an ONNX file (a consolidated sketch follows this list)
  4. Run trtexec --onnx=best.onnx --verbose
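For reference, here is the whole reproduction as one shell session. The export step is a placeholder: best.onnx in my case comes from a YOLOv5 export (see the commands later in this thread), but presumably any ONNX containing the affected Expand node behaves the same.

# on the Jetson host: start the DeepStream 6.4 container
docker run --gpus=all -it --rm -v ./:/workspace nvcr.io/nvidia/deepstream:6.4-triton-multiarch bash
# inside the container: install the ONNX tooling
pip3 install onnx onnxsim onnxruntime
# export the model to best.onnx here (placeholder step; the YOLOv5 variant is shown below)
# then try to build the engine
trtexec --onnx=best.onnx --verbose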

Results

Config Info from TensorRT

$ trtexec --onnx=best.onnx --verbose
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # trtexec --onnx=best.onnx --verbose
[02/09/2024-10:22:42] [I] === Model Options ===
[02/09/2024-10:22:42] [I] Format: ONNX
[02/09/2024-10:22:42] [I] Model: best.onnx
[02/09/2024-10:22:42] [I] Output:
[02/09/2024-10:22:42] [I] === Build Options ===
[02/09/2024-10:22:42] [I] Max batch: explicit batch
[02/09/2024-10:22:42] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[02/09/2024-10:22:42] [I] minTiming: 1
[02/09/2024-10:22:42] [I] avgTiming: 8
[02/09/2024-10:22:42] [I] Precision: FP32
[02/09/2024-10:22:42] [I] LayerPrecisions: 
[02/09/2024-10:22:42] [I] Layer Device Types: 
[02/09/2024-10:22:42] [I] Calibration: 
[02/09/2024-10:22:42] [I] Refit: Disabled
[02/09/2024-10:22:42] [I] Version Compatible: Disabled
[02/09/2024-10:22:42] [I] ONNX Native InstanceNorm: Disabled
[02/09/2024-10:22:42] [I] TensorRT runtime: full
[02/09/2024-10:22:42] [I] Lean DLL Path: 
[02/09/2024-10:22:42] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[02/09/2024-10:22:42] [I] Exclude Lean Runtime: Disabled
[02/09/2024-10:22:42] [I] Sparsity: Disabled
[02/09/2024-10:22:42] [I] Safe mode: Disabled
[02/09/2024-10:22:42] [I] Build DLA standalone loadable: Disabled
[02/09/2024-10:22:42] [I] Allow GPU fallback for DLA: Disabled
[02/09/2024-10:22:42] [I] DirectIO mode: Disabled
[02/09/2024-10:22:42] [I] Restricted mode: Disabled
[02/09/2024-10:22:42] [I] Skip inference: Disabled
[02/09/2024-10:22:42] [I] Save engine: 
[02/09/2024-10:22:42] [I] Load engine: 
[02/09/2024-10:22:42] [I] Profiling verbosity: 0
[02/09/2024-10:22:42] [I] Tactic sources: Using default tactic sources
[02/09/2024-10:22:42] [I] timingCacheMode: local
[02/09/2024-10:22:42] [I] timingCacheFile: 
[02/09/2024-10:22:42] [I] Heuristic: Disabled
[02/09/2024-10:22:42] [I] Preview Features: Use default preview flags.
[02/09/2024-10:22:42] [I] MaxAuxStreams: -1
[02/09/2024-10:22:42] [I] BuilderOptimizationLevel: -1
[02/09/2024-10:22:42] [I] Input(s)s format: fp32:CHW
[02/09/2024-10:22:42] [I] Output(s)s format: fp32:CHW
[02/09/2024-10:22:42] [I] Input build shapes: model
[02/09/2024-10:22:42] [I] Input calibration shapes: model
[02/09/2024-10:22:42] [I] === System Options ===
[02/09/2024-10:22:42] [I] Device: 0
[02/09/2024-10:22:42] [I] DLACore: 
[02/09/2024-10:22:42] [I] Plugins:
[02/09/2024-10:22:42] [I] setPluginsToSerialize:
[02/09/2024-10:22:42] [I] dynamicPlugins:
[02/09/2024-10:22:42] [I] ignoreParsedPluginLibs: 0
[02/09/2024-10:22:42] [I] 
[02/09/2024-10:22:42] [I] === Inference Options ===
[02/09/2024-10:22:42] [I] Batch: Explicit
[02/09/2024-10:22:42] [I] Input inference shapes: model
[02/09/2024-10:22:42] [I] Iterations: 10
[02/09/2024-10:22:42] [I] Duration: 3s (+ 200ms warm up)
[02/09/2024-10:22:42] [I] Sleep time: 0ms
[02/09/2024-10:22:42] [I] Idle time: 0ms
[02/09/2024-10:22:42] [I] Inference Streams: 1
[02/09/2024-10:22:42] [I] ExposeDMA: Disabled
[02/09/2024-10:22:42] [I] Data transfers: Enabled
[02/09/2024-10:22:42] [I] Spin-wait: Disabled
[02/09/2024-10:22:42] [I] Multithreading: Disabled
[02/09/2024-10:22:42] [I] CUDA Graph: Disabled
[02/09/2024-10:22:42] [I] Separate profiling: Disabled
[02/09/2024-10:22:42] [I] Time Deserialize: Disabled
[02/09/2024-10:22:42] [I] Time Refit: Disabled
[02/09/2024-10:22:42] [I] NVTX verbosity: 0
[02/09/2024-10:22:42] [I] Persistent Cache Ratio: 0
[02/09/2024-10:22:42] [I] Inputs:
[02/09/2024-10:22:42] [I] === Reporting Options ===
[02/09/2024-10:22:42] [I] Verbose: Enabled
[02/09/2024-10:22:42] [I] Averages: 10 inferences
[02/09/2024-10:22:42] [I] Percentiles: 90,95,99
[02/09/2024-10:22:42] [I] Dump refittable layers:Disabled
[02/09/2024-10:22:42] [I] Dump output: Disabled
[02/09/2024-10:22:42] [I] Profile: Disabled
[02/09/2024-10:22:42] [I] Export timing to JSON file: 
[02/09/2024-10:22:42] [I] Export output to JSON file: 
[02/09/2024-10:22:42] [I] Export profile to JSON file: 
[02/09/2024-10:22:42] [I] 
[02/09/2024-10:22:42] [I] === Device Information ===
[02/09/2024-10:22:42] [I] Selected Device: Orin
[02/09/2024-10:22:42] [I] Compute Capability: 8.7
[02/09/2024-10:22:42] [I] SMs: 16
[02/09/2024-10:22:42] [I] Device Global Memory: 30697 MiB
[02/09/2024-10:22:42] [I] Shared Memory per SM: 164 KiB
[02/09/2024-10:22:42] [I] Memory Bus Width: 256 bits (ECC disabled)
[02/09/2024-10:22:42] [I] Application Compute Clock Rate: 1.3 GHz
[02/09/2024-10:22:42] [I] Application Memory Clock Rate: 0.816 GHz
[02/09/2024-10:22:42] [I] 
[02/09/2024-10:22:42] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[02/09/2024-10:22:42] [I] 
[02/09/2024-10:22:42] [I] TensorRT version: 8.6.2

The error I get:

[02/09/2024-10:39:13] [V] [TRT] =============== Computing costs for /0/model.26/m/m.1/cv2/conv/Conv
[02/09/2024-10:39:13] [V] [TRT] *************** Autotuning format combination: Float(512000,1600,40,1) -> Float(512000,1600,40,1) ***************
[02/09/2024-10:39:13] [V] [TRT] *************** Autotuning format combination: Float(512000,1,12800,320) -> Float(512000,1,12800,320) ***************
[02/09/2024-10:39:13] [V] [TRT] *************** Autotuning format combination: Float(128000,1:4,3200,80) -> Float(512000,1600,40,1) ***************
[02/09/2024-10:39:13] [V] [TRT] *************** Autotuning format combination: Float(128000,1:4,3200,80) -> Float(128000,1:4,3200,80) ***************
[02/09/2024-10:39:13] [V] [TRT] =============== Computing costs for /0/model.33/Expand
[02/09/2024-10:39:13] [V] [TRT] *************** Autotuning format combination: Float(1,1) -> Float(80,1) ***************
[02/09/2024-10:39:13] [V] [TRT] --------------- Timing Runner: /0/model.33/Expand (Padding[0x8000000c])
[02/09/2024-10:39:13] [V] [TRT] Padding has no valid tactics for this config, skipping
[02/09/2024-10:39:13] [V] [TRT] --------------- Timing Runner: /0/model.33/Expand (Slice[0x8000001b])
[02/09/2024-10:39:13] [V] [TRT] Skipping tactic 0x0000000000000000 due to exception cudaEventElapsedTime
[02/09/2024-10:39:13] [V] [TRT] /0/model.33/Expand (Slice[0x8000001b]) profiling completed in 0.0023289 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[02/09/2024-10:39:13] [V] [TRT] Deleting timing cache: 355 entries, served 1129 hits since creation.
[02/09/2024-10:39:13] [E] Error[10]: Could not find any implementation for node /0/model.33/Expand.
[02/09/2024-10:39:13] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node /0/model.33/Expand.)
[02/09/2024-10:39:13] [E] Engine could not be created from network
[02/09/2024-10:39:13] [E] Building engine failed
[02/09/2024-10:39:13] [E] Failed to create engine from model or file.
[02/09/2024-10:39:13] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8602] # trtexec --onnx=best.onnx --verbose
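One detail from the verbose log: the Slice tactic for the Expand node is skipped "due to exception cudaEventElapsedTime", i.e. the kernel timing itself throws rather than the tactic being unsupported. As a sanity check (an assumption on my part, not a known fix), locking the clocks on the host rules out power-state effects on that timing path:

# on the Jetson host, before building the engine
sudo nvpmodel -m 0    # MAXN power mode on the AGX Orin devkit
sudo jetson_clocks    # pin clocks to their maximum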

Hi @pkot,
This might be a DeepStream issue; would you mind checking there?

Thanks

@AakankshaS

What do you mean by checking there? Should I open another topic in the DeepStream subforum (DeepStream SDK - NVIDIA Developer Forums)?

The parameters for starting Docker seem to have issues; try the following command line:

docker run -it --rm --net=host --runtime nvidia  -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-6.4 -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/deepstream:6.4-triton-multiarch
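On Jetson the container should be started with --runtime nvidia rather than --gpus=all, which is presumably what this suggestion fixes. A quick way to confirm the container sees the right stack (a generic check, assuming the image ships TensorRT as Debian packages):

# inside the container
dpkg -l | grep -i tensorrt            # should list the TensorRT 8.6.x packages
trtexec --onnx=best.onnx --verbose    # retry the build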

It did not work.

I launched Docker:

export DISPLAY=:1
xhost +local:
docker run -it --rm --net=host --runtime nvidia  -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-6.4 -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/deepstream:6.4-triton-multiarch

And went through these commands:

# install kmod (provides modprobe)
apt install -y kmod
# fetch the DeepStream-Yolo integration and YOLOv5 itself
git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo/
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip3 install cmake
pip3 install -r requirements.txt
pip3 install onnx onnxsim onnxruntime
# export yolov5s.pt to ONNX with dynamic batch
cp ./../utils/export_yoloV5.py ./
wget https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt
python3 export_yoloV5.py -w yolov5s.pt --dynamic
# build the custom parser, wire up the config, and run
cd ..
CUDA_VER=12.2 make -C nvdsinfer_custom_impl_Yolo
cp ./yolov5/yolov5s.onnx ./
cp ./yolov5/labels.txt ./
sed -i.bak 's/config_infer_primary.txt/config_infer_primary_yoloV5.txt/g' ./deepstream_app_config.txt
deepstream-app -c deepstream_app_config.txt

And I still got the error:

ERROR: [TRT]: 10: Could not find any implementation for node /0/model.24/Range.
ERROR: [TRT]: 10: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node /0/model.24/Range.)
Building engine failed
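As a further data point, a static-batch export might take the dynamic-shape path out of the picture. This is a hedged diagnostic, not a known fix, assuming export_yoloV5.py produces a static batch-1 model when --dynamic is omitted:

# diagnostic only: re-export without dynamic batch, then build with trtexec
python3 export_yoloV5.py -w yolov5s.pt
trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --verbose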

This is a CUDA driver bug on Jetson. We have fixed it in our latest internal code; the fix should come with the next release (probably the GA).

The issue will be solved with the next release.
