Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) "sp__mye3" is equal to 0.; )

• Hardware Platform: NVIDIA GeForce RTX 3060
• DeepStream Version: 6.2
• OS: Ubuntu 20.04

I tried running a custom YOLOv8 segmentation model in deepstream-python-apps, and this is the error I get
(the app stops whenever a frame contains no detections):
0:00:08.580778573 84048 0x425ac70 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2012> [UID = 1]: Use deserialized engine model: /home/divya/Documents/DeepStream-Yolo-Seg/utils/pothole_new.onnx_b1_gpu0_fp32.engine
0:00:08.588918887 84048 0x425ac70 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus: [UID 1]: Load new model:/home/divya/Documents/DeepStream-Yolo-Seg/pothole_config.txt sucessfully
Decodebin child added: source

Decodebin child added: decodebin0

**PERF: {'stream0': 0.0}

Decodebin child added: qtdemux0

Decodebin child added: multiqueue0

Decodebin child added: h264parse0

Decodebin child added: capsfilter0

Decodebin child added: nvv4l2decoder0

In cb_newpad

gstname= video/x-raw
features= <Gst.CapsFeatures object at 0x7f69f7e28c40 (GstCapsFeatures at 0x7f688c002c20)>
Detected mask 0.05464618280529976 of obj Pothole
Detected mask 0.16089653968811035 of obj Pothole
Detected mask 0.03518711030483246 of obj Pothole
Detected mask 0.11133788526058197 of obj Pothole
Detected mask 0.025264285504817963 of obj Pothole
Detected mask 0.22651955485343933 of obj Pothole
ERROR: [TRT]: 1: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) "sp__mye3" is equal to 0.; )
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1650 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:09.715338644 84048 0x3322980 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
Error: gst-stream-error-quark: Failed to queue input batch for inferencing (1): gstnvinfer.cpp(1388): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline0/GstNvInfer:primary-inference
Exiting app

Detected mask None of obj Pothole
[NvMultiObjectTracker] De-initialized
ERROR: [TRT]: 1: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) "sp__mye3" is equal to 0.; )
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1650 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:09.736765478 84048 0x3322980 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing

How do I resolve this?
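
For context: the "Detected mask …" lines above come from a pad probe that walks the pyds batch metadata. Below is a minimal sketch of that pattern, assuming the standard deepstream-python-apps probe structure; the function name, the printed value, and the exact fields used are illustrative assumptions, not the actual app code:

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    import pyds

    def osd_sink_pad_buffer_probe(pad, info, u_data):
        # Walk the DeepStream batch metadata attached to the Gst buffer.
        gst_buffer = info.get_buffer()
        batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
        l_frame = batch_meta.frame_meta_list
        while l_frame is not None:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            l_obj = frame_meta.obj_meta_list
            while l_obj is not None:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
                # The app presumably prints a mask-derived scalar here;
                # in DeepStream 6.x, obj_meta.mask_params holds the
                # instance-segmentation mask for the object.
                print("Detected mask of obj", obj_meta.obj_label)
                l_obj = l_obj.next
            l_frame = l_frame.next
        return Gst.PadProbeReturn.OK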

  1. What is the TensorRT version? You can check with "dpkg -l | grep TensorRT". Are you testing in the DeepStream 6.2 Docker container?
  2. Could you share the nvinfer configuration file?
  3. To isolate the issue, could you share the result of this command line? Thanks!
    /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000

Thanks for your reply.

  1. TensorRT version: 8.6.1 (also visible in the trtexec log below).

  2. Sharing the nvinfer config file:
    pothole_config.txt (811 Bytes)
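
For readers without the attachment, here is a minimal sketch of a typical DeepStream-Yolo-Seg nvinfer config for a single-class YOLOv8-Seg model. The attached pothole_config.txt may differ; the parser function name and custom-lib path are assumptions based on the DeepStream-Yolo-Seg repo, and the file paths are illustrative:

    [property]
    gpu-id=0
    # Normalize pixel values to [0,1]
    net-scale-factor=0.0039215697906911373
    model-color-format=0
    onnx-file=pothole_new.onnx
    model-engine-file=pothole_new.onnx_b1_gpu0_fp32.engine
    labelfile-path=labels.txt
    batch-size=1
    # 0 = FP32
    network-mode=0
    num-detected-classes=1
    gie-unique-id=1
    # 3 = instance segmentation
    network-type=3
    # 4 = no clustering (NMS already done in the model)
    cluster-mode=4
    maintain-aspect-ratio=1
    symmetric-padding=1
    output-instance-mask=1
    # Assumed names from the DeepStream-Yolo-Seg repo:
    parse-bbox-instance-mask-func-name=NvDsInferParseYoloSeg
    custom-lib-path=nvdsinfer_custom_impl_Yolo_seg/libnvdsinfer_custom_impl_Yolo_seg.so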

divya@divya-GF65-Thin-10UE:~$ /usr/src/tensorrt/bin/trtexec --fp16 --onnx=/home/divya/Documents/DeepStream-Yolo-Seg/utils/pothole_new.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=/home/divya/Documents/DeepStream-Yolo-Seg/utils/pothole_new.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000
[11/25/2023-15:50:04] [W] --workspace flag has been deprecated by --memPoolSize flag.
[11/25/2023-15:50:04] [I] === Model Options ===
[11/25/2023-15:50:04] [I] Format: ONNX
[11/25/2023-15:50:04] [I] Model: /home/divya/Documents/DeepStream-Yolo-Seg/utils/pothole_new.onnx
[11/25/2023-15:50:04] [I] Output:
[11/25/2023-15:50:04] [I] === Build Options ===
[11/25/2023-15:50:04] [I] Max batch: explicit batch
[11/25/2023-15:50:04] [I] Memory Pools: workspace: 10000 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/25/2023-15:50:04] [I] minTiming: 1
[11/25/2023-15:50:04] [I] avgTiming: 8
[11/25/2023-15:50:04] [I] Precision: FP32+FP16
[11/25/2023-15:50:04] [I] LayerPrecisions:
[11/25/2023-15:50:04] [I] Layer Device Types:
[11/25/2023-15:50:04] [I] Calibration:
[11/25/2023-15:50:04] [I] Refit: Disabled
[11/25/2023-15:50:04] [I] Version Compatible: Disabled
[11/25/2023-15:50:04] [I] TensorRT runtime: full
[11/25/2023-15:50:04] [I] Lean DLL Path:
[11/25/2023-15:50:04] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[11/25/2023-15:50:04] [I] Exclude Lean Runtime: Disabled
[11/25/2023-15:50:04] [I] Sparsity: Disabled
[11/25/2023-15:50:04] [I] Safe mode: Disabled
[11/25/2023-15:50:04] [I] Build DLA standalone loadable: Disabled
[11/25/2023-15:50:04] [I] Allow GPU fallback for DLA: Disabled
[11/25/2023-15:50:04] [I] DirectIO mode: Disabled
[11/25/2023-15:50:04] [I] Restricted mode: Disabled
[11/25/2023-15:50:04] [I] Skip inference: Disabled
[11/25/2023-15:50:04] [I] Save engine: ds62.engine
[11/25/2023-15:50:04] [I] Load engine:
[11/25/2023-15:50:04] [I] Profiling verbosity: 0
[11/25/2023-15:50:04] [I] Tactic sources: Using default tactic sources
[11/25/2023-15:50:04] [I] timingCacheMode: local
[11/25/2023-15:50:04] [I] timingCacheFile:
[11/25/2023-15:50:04] [I] Heuristic: Disabled
[11/25/2023-15:50:04] [I] Preview Features: Use default preview flags.
[11/25/2023-15:50:04] [I] MaxAuxStreams: -1
[11/25/2023-15:50:04] [I] BuilderOptimizationLevel: -1
[11/25/2023-15:50:04] [I] Input(s)s format: fp32:CHW
[11/25/2023-15:50:04] [I] Output(s)s format: fp32:CHW
[11/25/2023-15:50:04] [I] Input build shape: input=1x3x640x640+1x3x640x640+1x3x640x640
[11/25/2023-15:50:04] [I] Input calibration shapes: model
[11/25/2023-15:50:04] [I] === System Options ===
[11/25/2023-15:50:04] [I] Device: 0
[11/25/2023-15:50:04] [I] DLACore:
[11/25/2023-15:50:04] [I] Plugins:
[11/25/2023-15:50:04] [I] setPluginsToSerialize:
[11/25/2023-15:50:04] [I] dynamicPlugins:
[11/25/2023-15:50:04] [I] ignoreParsedPluginLibs: 0
[11/25/2023-15:50:04] [I]
[11/25/2023-15:50:04] [I] === Inference Options ===
[11/25/2023-15:50:04] [I] Batch: Explicit
[11/25/2023-15:50:04] [I] Input inference shape: input=1x3x640x640
[11/25/2023-15:50:04] [I] Iterations: 10
[11/25/2023-15:50:04] [I] Duration: 3s (+ 200ms warm up)
[11/25/2023-15:50:04] [I] Sleep time: 0ms
[11/25/2023-15:50:04] [I] Idle time: 0ms
[11/25/2023-15:50:04] [I] Inference Streams: 1
[11/25/2023-15:50:04] [I] ExposeDMA: Disabled
[11/25/2023-15:50:04] [I] Data transfers: Enabled
[11/25/2023-15:50:04] [I] Spin-wait: Disabled
[11/25/2023-15:50:04] [I] Multithreading: Disabled
[11/25/2023-15:50:04] [I] CUDA Graph: Disabled
[11/25/2023-15:50:04] [I] Separate profiling: Disabled
[11/25/2023-15:50:04] [I] Time Deserialize: Disabled
[11/25/2023-15:50:04] [I] Time Refit: Disabled
[11/25/2023-15:50:04] [I] NVTX verbosity: 0
[11/25/2023-15:50:04] [I] Persistent Cache Ratio: 0
[11/25/2023-15:50:04] [I] Inputs:
[11/25/2023-15:50:04] [I] === Reporting Options ===
[11/25/2023-15:50:04] [I] Verbose: Disabled
[11/25/2023-15:50:04] [I] Averages: 10 inferences
[11/25/2023-15:50:04] [I] Percentiles: 90,95,99
[11/25/2023-15:50:04] [I] Dump refittable layers:Disabled
[11/25/2023-15:50:04] [I] Dump output: Disabled
[11/25/2023-15:50:04] [I] Profile: Disabled
[11/25/2023-15:50:04] [I] Export timing to JSON file:
[11/25/2023-15:50:04] [I] Export output to JSON file:
[11/25/2023-15:50:04] [I] Export profile to JSON file:
[11/25/2023-15:50:04] [I]
[11/25/2023-15:50:04] [I] === Device Information ===
[11/25/2023-15:50:04] [I] Selected Device: NVIDIA GeForce RTX 3060 Laptop GPU
[11/25/2023-15:50:04] [I] Compute Capability: 8.6
[11/25/2023-15:50:04] [I] SMs: 30
[11/25/2023-15:50:04] [I] Device Global Memory: 5937 MiB
[11/25/2023-15:50:04] [I] Shared Memory per SM: 100 KiB
[11/25/2023-15:50:04] [I] Memory Bus Width: 192 bits (ECC disabled)
[11/25/2023-15:50:04] [I] Application Compute Clock Rate: 1.402 GHz
[11/25/2023-15:50:04] [I] Application Memory Clock Rate: 6.001 GHz
[11/25/2023-15:50:04] [I]
[11/25/2023-15:50:04] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[11/25/2023-15:50:04] [I]
[11/25/2023-15:50:04] [I] TensorRT version: 8.6.1
[11/25/2023-15:50:04] [I] Loading standard plugins
[11/25/2023-15:50:05] [I] [TRT] [MemUsageChange] Init CUDA: CPU +352, GPU +0, now: CPU 367, GPU 550 (MiB)
[11/25/2023-15:50:10] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1218, GPU +266, now: CPU 1661, GPU 816 (MiB)
[11/25/2023-15:50:10] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation CUDA C++ Programming Guide
[11/25/2023-15:50:10] [I] Start parsing network model.
[11/25/2023-15:50:10] [I] [TRT] ----------------------------------------------------------------
[11/25/2023-15:50:10] [I] [TRT] Input filename: /home/divya/Documents/DeepStream-Yolo-Seg/utils/pothole_new.onnx
[11/25/2023-15:50:10] [I] [TRT] ONNX IR version: 0.0.8
[11/25/2023-15:50:10] [I] [TRT] Opset version: 16
[11/25/2023-15:50:10] [I] [TRT] Producer name: pytorch
[11/25/2023-15:50:10] [I] [TRT] Producer version: 2.0.1
[11/25/2023-15:50:10] [I] [TRT] Domain:
[11/25/2023-15:50:10] [I] [TRT] Model version: 0
[11/25/2023-15:50:10] [I] [TRT] Doc string:
[11/25/2023-15:50:10] [I] [TRT] ----------------------------------------------------------------
[11/25/2023-15:50:10] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/25/2023-15:50:11] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[11/25/2023-15:50:11] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[11/25/2023-15:50:11] [I] Finished parsing network model. Parse time: 0.048996
[11/25/2023-15:50:11] [I] [TRT] Graph optimization time: 0.0418149 seconds.
[11/25/2023-15:50:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1450, GPU +364, now: CPU 3132, GPU 1174 (MiB)
[11/25/2023-15:50:12] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +206, GPU +58, now: CPU 3338, GPU 1232 (MiB)
[11/25/2023-15:50:12] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/25/2023-15:54:58] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[11/25/2023-15:54:59] [I] [TRT] Total Host Persistent Memory: 413616
[11/25/2023-15:54:59] [I] [TRT] Total Device Persistent Memory: 104960
[11/25/2023-15:54:59] [I] [TRT] Total Scratch Memory: 61519872
[11/25/2023-15:54:59] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 512 MiB
[11/25/2023-15:54:59] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 199 steps to complete.
[11/25/2023-15:54:59] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 37.5809ms to assign 24 blocks to 199 nodes requiring 70024704 bytes.
[11/25/2023-15:54:59] [I] [TRT] Total Activation Memory: 70022144
[11/25/2023-15:54:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 3431, GPU 1256 (MiB)
[11/25/2023-15:54:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 3431, GPU 1266 (MiB)
[11/25/2023-15:54:59] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[11/25/2023-15:54:59] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[11/25/2023-15:54:59] [W] [TRT] Check verbose logs for the list of affected weights.
[11/25/2023-15:54:59] [W] [TRT] - 64 weights are affected by this issue: Detected subnormal FP16 values.
[11/25/2023-15:54:59] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +6, GPU +7, now: CPU 6, GPU 7 (MiB)
[11/25/2023-15:54:59] [I] Engine built in 294.437 sec.
[11/25/2023-15:54:59] [I] [TRT] Loaded engine size: 9 MiB
[11/25/2023-15:54:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2178, GPU 960 (MiB)
[11/25/2023-15:54:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2178, GPU 968 (MiB)
[11/25/2023-15:54:59] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +6, now: CPU 0, GPU 6 (MiB)
[11/25/2023-15:54:59] [I] Engine deserialized in 0.0442143 sec.
[11/25/2023-15:54:59] [I] [TRT] [MS] Running engine with multi stream info
[11/25/2023-15:54:59] [I] [TRT] [MS] Number of aux streams is 3
[11/25/2023-15:54:59] [I] [TRT] [MS] Number of total worker streams is 4
[11/25/2023-15:54:59] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[11/25/2023-15:54:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2178, GPU 960 (MiB)
[11/25/2023-15:54:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2178, GPU 968 (MiB)
[11/25/2023-15:54:59] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 73 (MiB)
[11/25/2023-15:54:59] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation CUDA C++ Programming Guide
[11/25/2023-15:54:59] [I] Setting persistentCacheLimit to 0 bytes.
[11/25/2023-15:54:59] [I] Using random values for input input
[11/25/2023-15:54:59] [I] Input binding for input with dimensions 1x3x640x640 is created.
[11/25/2023-15:54:59] [I] Output binding for boxes with dimensions 1x100x4 is created.
[11/25/2023-15:54:59] [I] Output binding for scores with dimensions 1x100x1 is created.
[11/25/2023-15:54:59] [I] Output binding for classes with dimensions 1x100x1 is created.
[11/25/2023-15:54:59] [I] Output binding for masks with dimensions 1x100x160x160 is created.
[11/25/2023-15:54:59] [I] Starting inference
[11/25/2023-15:54:59] [E] Error[1]: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) "sp__mye3" is equal to 0.; )
[11/25/2023-15:54:59] [E] Error occurred during inference
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=/home/divya/Documents/DeepStream-Yolo-Seg/utils/pothole_new.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000

As the log shows, the trtexec conversion hits the same error. It occurs because a shape dimension in the model's shape graph evaluates to 0, so TensorRT generates a Myelin op whose divisor equals 0. This matches the DeepStream behavior above: on frames with no detections, the dynamic detection count is 0, and that zero-sized dimension propagates through the shape graph.
TensorRT will fix this issue in a later version.
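
One way to check the zero-dimension explanation offline is to run ONNX shape inference on the exported model and flag any tensor whose inferred shape contains a 0. A minimal sketch with the onnx Python package (the model path is taken from this thread; this is a diagnostic aid, not a fix):

    import onnx

    # Load the exported model and run ONNX shape inference.
    model = onnx.load("/home/divya/Documents/DeepStream-Yolo-Seg/utils/pothole_new.onnx")
    inferred = onnx.shape_inference.infer_shapes(model)

    # Walk every tensor with an inferred shape and flag any 0-sized
    # dimension, which is what the Myelin error message points at.
    graph = inferred.graph
    for vi in list(graph.value_info) + list(graph.input) + list(graph.output):
        shape = vi.type.tensor_type.shape
        dims = [d.dim_value if d.HasField("dim_value") else d.dim_param
                for d in shape.dim]
        if 0 in dims:
            print(f"zero-sized dimension in tensor {vi.name}: {dims}")

Until a TensorRT release with the fix is available, a commonly suggested workaround is to re-export the ONNX model so its detection outputs have a fixed, padded size, so that no shape dimension can evaluate to 0.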
