YOLOv8-seg giving divide-by-zero errors if no detections in frame

• Hardware Platform (Jetson / GPU): GPU (RTX 3090)
• DeepStream Version: 6.2
• NVIDIA GPU Driver Version (valid for GPU only): 535.54.03
• Issue Type (questions, new requirements, bugs): bugs

Hi, I’m testing YOLOv8 segmentation inference on a video stream.
The inference runs fine without errors if there are detections in every frame of the video,
but if there are no detections, it throws a “Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) “sp__mye3” is equal to 0.;” error.
I used the ONNX converter from GitHub - marcoslucianops/DeepStream-Yolo-Seg (NVIDIA DeepStream SDK 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO-Segmentation models) to convert from PyTorch to ONNX. I’ve posted my issue on that repo too, since I’m not sure what’s causing it yet.
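For reference, the export step looks roughly like this (a sketch based on that repo’s export_yoloV8_seg.py; the stock yolov8s-seg.pt weights stand in for my custom model, and the repo also documents a --dynamic flag for dynamic batch sizes):

# clone the exporter and convert the PyTorch weights to ONNX
git clone https://github.com/marcoslucianops/DeepStream-Yolo-Seg.git
cd DeepStream-Yolo-Seg
python3 export_yoloV8_seg.py -w yolov8s-seg.pt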

My CUDA and DeepStream versions are:

deepstream-app version 6.2.0
DeepStreamSDK 6.2.0
CUDA Driver Version: 12.2
CUDA Runtime Version: 12.2
TensorRT Version: 8.6
cuDNN Version: 8.9
libNVWarp360 Version: 2.0.1d3
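For anyone comparing setups, this version information comes from:

deepstream-app --version-all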

My GPU is a 3090.

The errors from the inference are below:

(deepstream6) rama@rama-Alienware-Aurora-R13:~/Documents/code-repo/github/DeepStream-Yolo-Seg$ deepstream-app -c deepstream_app_config_wodonga.txt
0:00:03.642510434 8488 0x5556f29b3d30 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1909> [UID = 1]: deserialized trt engine from :/home/rama/Documents/code-repo/codecommit/edgeai-deepstream-dev_weight_estimation/edgeai-deepstream/models/yolov8m_coggan_segmentation_5epochs_16102023.onnx_b1_gpu0_fp16.engine
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: …/nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT input 3x640x640
1 OUTPUT kFLOAT boxes 100x4
2 OUTPUT kFLOAT scores 100x1
3 OUTPUT kFLOAT classes 100x1
4 OUTPUT kFLOAT masks 100x160x160

0:00:03.742116568 8488 0x5556f29b3d30 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2012> [UID = 1]: Use deserialized engine model: /home/rama/Documents/code-repo/codecommit/edgeai-deepstream-dev_weight_estimation/edgeai-deepstream/models/yolov8m_coggan_segmentation_5epochs_16102023.onnx_b1_gpu0_fp16.engine
0:00:03.748509282 8488 0x5556f29b3d30 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/rama/Documents/code-repo/github/DeepStream-Yolo-Seg/config_infer_primary_yoloV8_seg_coggan.txt sucessfully

Runtime commands:
    h: Print this help
    q: Quit
    p: Pause
    r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.

** INFO: <bus_callback:239>: Pipeline ready

(deepstream-app:8488): GStreamer-WARNING **: 16:44:42.855: (…/gst/gstinfo.c:556):gst_debug_log_valist: runtime check failed: (object == NULL || G_IS_OBJECT (object))
** INFO: <bus_callback:225>: Pipeline running

ERROR: [TRT]: 1: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) “sp__mye3” is equal to 0.; )
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1650 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:04.063957384 8488 0x5556f31f8000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1388): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
Quitting
ERROR: [TRT]: 1: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) “sp__mye3” is equal to 0.; )
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1650 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:04.077131207 8488 0x5556f31f8000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
nvstreammux: Successfully handled EOS for source_id=0
ERROR: [TRT]: 1: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) “sp__mye3” is equal to 0.; )
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1650 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:04.113251295 8488 0x5556f31f8000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) “sp__mye3” is equal to 0.; )
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1650 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:04.129596877 8488 0x5556f31f8000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) “sp__mye3” is equal to 0.; )
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1650 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:04.142843437 8488 0x5556f31f8000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) “sp__mye3” is equal to 0.; )
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1650 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:04.163171861 8488 0x5556f31f8000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1388): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1388): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1388): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1388): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1388): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
App run failed

Could you share the model? Thanks! When I execute “python3 export_yoloV8_seg.py -w yolov8s-seg.pt --dynamic”, there is an error: “ModuleNotFoundError: No module named ‘ultralytics.yolo’”.
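(Side note on that export error: newer ultralytics releases removed the ultralytics.yolo module that the export script imports, so a common workaround is to pin an older release before exporting, for example:

# example pin; any 8.0.x release that still ships ultralytics.yolo should work
pip3 install "ultralytics==8.0.100"

The exact version here is an assumption; check the repo’s requirements for the supported one.)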

Hi,

I’ve uploaded the ONNX yolov8s-seg model here …

The video I was testing on is over here …

Can you run it again? I also see this “Error Code 1: Myelin (Division by 0 detected in the shape graph. Tensor (Divisor) “sp__mye3” is equal to 0.; )” error on the first run. When I run it again, it runs fine. Please refer to the test log.
runlog.txt (3.9 KB)

Hi, I still get the error on my side, even after running the video file multiple times.
I tried uninstalling DeepStream 6.2 and installing 6.3, but still got the same error.
I’ll try other TensorRT versions and see if that fixes the problem.
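(When switching TensorRT versions, remember to delete the serialized engine first so it is rebuilt against the new TensorRT rather than deserialized from the old cache, roughly:

# remove the cached engine (whatever model-engine-file points to in the nvinfer config), then rerun
rm /path/to/models/*.engine
deepstream-app -c deepstream_app_config_wodonga.txt

The engine path above is a placeholder.)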

Did you get any success?

I get the same error when converting ONNX to TensorRT in the Docker image nvcr.io/nvidia/tensorrt:23.09.

I got the inference to work using the same model and video in a container on an AGX Orin running nvcr.io/nvidia/deepstream-l4t:6.2-base.
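A typical launch of that image on Jetson, per the NGC instructions (the display mounts are only needed for on-screen output), looks like:

docker run -it --rm --net=host --runtime nvidia \
    -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix \
    nvcr.io/nvidia/deepstream-l4t:6.2-base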

  1. Do you mean there is no “Error Code 1: Myelin” error on DS 6.2? To rule out a model issue, could you share the DeepStream running log? Thanks! Please delete the engine first.
  2. Could you also share the result of this command line:
    /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:4x3x640x640 --maxShapes=input:4x3x640x640 --shapes=input:4x3x640x640 --workspace=10000

1. Yes, DS 6.2 works, but not with my default installation, which has TensorRT 8.6.1.6 and CUDA 12.2.
I tried it in another container, a dGPU container this time (nvcr.io/nvidia/deepstream:6.2-devel), which has TensorRT 8.5.2-1 and CUDA 11.8. The inference worked.
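For reference, the dGPU container was launched along these lines (flags per the NGC instructions for the image; the repo mount path is a placeholder):

docker run --gpus all -it --rm --net=host \
    -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix \
    -v /path/to/DeepStream-Yolo-Seg:/opt/DeepStream-Yolo-Seg \
    nvcr.io/nvidia/deepstream:6.2-devel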

The DeepStream log was:

root@rama-Alienware-Aurora-R13:/opt/DeepStream-Yolo-Seg# deepstream-app -c deepstream_app_config.txt
WARNING: …/nvdsinfer/nvdsinfer_model_builder.cpp:1487 Deserialize engine failed because file path: /opt/DeepStream-Yolo-Seg/yolov8s-seg.onnx_b1_gpu0_fp16.engine open error
0:00:01.384047398 563 0x5655115fcaf0 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/opt/DeepStream-Yolo-Seg/yolov8s-seg.onnx_b1_gpu0_fp16.engine failed
0:00:01.433187162 563 0x5655115fcaf0 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/opt/DeepStream-Yolo-Seg/yolov8s-seg.onnx_b1_gpu0_fp16.engine failed, try rebuild
0:00:01.433204091 563 0x5655115fcaf0 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: TensorRT encountered issues when converting weights between types and that could affect accuracy.
WARNING: [TRT]: If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
WARNING: [TRT]: Check verbose logs for the list of affected weights.
WARNING: [TRT]: - 69 weights are affected by this issue: Detected subnormal FP16 values.
0:05:03.152258347 563 0x5655115fcaf0 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /opt/DeepStream-Yolo-Seg/yolov8s-seg.onnx_b1_gpu0_fp16.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: …/nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT input 3x640x640
1 OUTPUT kFLOAT boxes 100x4
2 OUTPUT kFLOAT scores 100x1
3 OUTPUT kFLOAT classes 100x1
4 OUTPUT kFLOAT masks 100x160x160

0:05:03.279918826 563 0x5655115fcaf0 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/DeepStream-Yolo-Seg/config_infer_primary_yoloV8_seg.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

p: Pause
r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:239>: Pipeline ready

WARNING from src_elem: No decoder available for type ‘audio/mpeg, mpegversion=(int)4, framed=(boolean)true, stream-format=(string)raw, level=(string)2, base-profile=(string)lc, profile=(string)lc, codec_data=(buffer)119056e500, rate=(int)48000, channels=(int)2’.
Debug info: gsturidecodebin.c(920): unknown_type_cb (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin0/GstURIDecodeBin:src_elem
** INFO: <bus_callback:225>: Pipeline running

nvstreammux: Successfully handled EOS for source_id=0
** INFO: <bus_callback:262>: Received EOS. Exiting …

Quitting
App run successful

2. trtexec failed to run with the given batch-size-4 shapes (possibly because the ONNX was exported with a fixed batch of 1, without --dynamic); it works when the batch size is 1 for all shapes.
The output for batch size 4 was:

root@rama-Alienware-Aurora-R13:/opt/DeepStream-Yolo-Seg# /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:4x3x640x640 --maxShapes=input:4x3x640x640 --shapes=input:4x3x640x640 --workspace=10000
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:4x3x640x640 --maxShapes=input:4x3x640x640 --shapes=input:4x3x640x640 --workspace=10000
[11/02/2023-03:54:57] [W] --workspace flag has been deprecated by --memPoolSize flag.
[11/02/2023-03:54:57] [I] === Model Options ===
[11/02/2023-03:54:57] [I] Format: ONNX
[11/02/2023-03:54:57] [I] Model: yolov8s-seg.onnx
[11/02/2023-03:54:57] [I] Output:
[11/02/2023-03:54:57] [I] === Build Options ===
[11/02/2023-03:54:57] [I] Max batch: explicit batch
[11/02/2023-03:54:57] [I] Memory Pools: workspace: 10000 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/02/2023-03:54:57] [I] minTiming: 1
[11/02/2023-03:54:57] [I] avgTiming: 8
[11/02/2023-03:54:57] [I] Precision: FP32+FP16
[11/02/2023-03:54:57] [I] LayerPrecisions:
[11/02/2023-03:54:57] [I] Calibration:
[11/02/2023-03:54:57] [I] Refit: Disabled
[11/02/2023-03:54:57] [I] Sparsity: Disabled
[11/02/2023-03:54:57] [I] Safe mode: Disabled
[11/02/2023-03:54:57] [I] DirectIO mode: Disabled
[11/02/2023-03:54:57] [I] Restricted mode: Disabled
[11/02/2023-03:54:57] [I] Build only: Disabled
[11/02/2023-03:54:57] [I] Save engine: ds62.engine
[11/02/2023-03:54:57] [I] Load engine:
[11/02/2023-03:54:57] [I] Profiling verbosity: 0
[11/02/2023-03:54:57] [I] Tactic sources: Using default tactic sources
[11/02/2023-03:54:57] [I] timingCacheMode: local
[11/02/2023-03:54:57] [I] timingCacheFile:
[11/02/2023-03:54:57] [I] Heuristic: Disabled
[11/02/2023-03:54:57] [I] Preview Features: Use default preview flags.
[11/02/2023-03:54:57] [I] Input(s)s format: fp32:CHW
[11/02/2023-03:54:57] [I] Output(s)s format: fp32:CHW
[11/02/2023-03:54:57] [I] Input build shape: input=1x3x640x640+4x3x640x640+4x3x640x640
[11/02/2023-03:54:57] [I] Input calibration shapes: model
[11/02/2023-03:54:57] [I] === System Options ===
[11/02/2023-03:54:57] [I] Device: 0
[11/02/2023-03:54:57] [I] DLACore:
[11/02/2023-03:54:57] [I] Plugins:
[11/02/2023-03:54:57] [I] === Inference Options ===
[11/02/2023-03:54:57] [I] Batch: Explicit
[11/02/2023-03:54:57] [I] Input inference shape: input=4x3x640x640
[11/02/2023-03:54:57] [I] Iterations: 10
[11/02/2023-03:54:57] [I] Duration: 3s (+ 200ms warm up)
[11/02/2023-03:54:57] [I] Sleep time: 0ms
[11/02/2023-03:54:57] [I] Idle time: 0ms
[11/02/2023-03:54:57] [I] Streams: 1
[11/02/2023-03:54:57] [I] ExposeDMA: Disabled
[11/02/2023-03:54:57] [I] Data transfers: Enabled
[11/02/2023-03:54:57] [I] Spin-wait: Disabled
[11/02/2023-03:54:57] [I] Multithreading: Disabled
[11/02/2023-03:54:57] [I] CUDA Graph: Disabled
[11/02/2023-03:54:57] [I] Separate profiling: Disabled
[11/02/2023-03:54:57] [I] Time Deserialize: Disabled
[11/02/2023-03:54:57] [I] Time Refit: Disabled
[11/02/2023-03:54:57] [I] NVTX verbosity: 0
[11/02/2023-03:54:57] [I] Persistent Cache Ratio: 0
[11/02/2023-03:54:57] [I] Inputs:
[11/02/2023-03:54:57] [I] === Reporting Options ===
[11/02/2023-03:54:57] [I] Verbose: Disabled
[11/02/2023-03:54:57] [I] Averages: 10 inferences
[11/02/2023-03:54:57] [I] Percentiles: 90,95,99
[11/02/2023-03:54:57] [I] Dump refittable layers:Disabled
[11/02/2023-03:54:57] [I] Dump output: Disabled
[11/02/2023-03:54:57] [I] Profile: Disabled
[11/02/2023-03:54:57] [I] Export timing to JSON file:
[11/02/2023-03:54:57] [I] Export output to JSON file:
[11/02/2023-03:54:57] [I] Export profile to JSON file:
[11/02/2023-03:54:57] [I]
[11/02/2023-03:54:57] [I] === Device Information ===
[11/02/2023-03:54:57] [I] Selected Device: NVIDIA GeForce RTX 3090
[11/02/2023-03:54:57] [I] Compute Capability: 8.6
[11/02/2023-03:54:57] [I] SMs: 82
[11/02/2023-03:54:57] [I] Compute Clock Rate: 1.695 GHz
[11/02/2023-03:54:57] [I] Device Global Memory: 24251 MiB
[11/02/2023-03:54:57] [I] Shared Memory per SM: 100 KiB
[11/02/2023-03:54:57] [I] Memory Bus Width: 384 bits (ECC disabled)
[11/02/2023-03:54:57] [I] Memory Clock Rate: 9.751 GHz
[11/02/2023-03:54:57] [I]
[11/02/2023-03:54:57] [I] TensorRT version: 8.5.2
[11/02/2023-03:54:57] [I] [TRT] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 27, GPU 1031 (MiB)
[11/02/2023-03:54:58] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +547, GPU +118, now: CPU 628, GPU 1149 (MiB)
[11/02/2023-03:54:58] [I] Start parsing network model
[11/02/2023-03:54:58] [I] [TRT] ----------------------------------------------------------------
[11/02/2023-03:54:58] [I] [TRT] Input filename: yolov8s-seg.onnx
[11/02/2023-03:54:58] [I] [TRT] ONNX IR version: 0.0.8
[11/02/2023-03:54:58] [I] [TRT] Opset version: 16
[11/02/2023-03:54:58] [I] [TRT] Producer name: pytorch
[11/02/2023-03:54:58] [I] [TRT] Producer version: 2.0.0
[11/02/2023-03:54:58] [I] [TRT] Domain:
[11/02/2023-03:54:58] [I] [TRT] Model version: 0
[11/02/2023-03:54:58] [I] [TRT] Doc string:
[11/02/2023-03:54:58] [I] [TRT] ----------------------------------------------------------------
[11/02/2023-03:54:58] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/02/2023-03:54:58] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[11/02/2023-03:54:58] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:54:59] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:54:59] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:54:59] [I] Finish parsing network model
[11/02/2023-03:54:59] [W] [TRT] /1/Reshape_12: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 1 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 691, GPU 1159 (MiB)
[11/02/2023-03:54:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 693, GPU 1169 (MiB)
[11/02/2023-03:54:59] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/02/2023-03:54:59] [E] Error[4]: [shapeCompiler.cpp::evaluateShapeChecks::1180] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: IShuffleLayer /1/Reshape_12: reshaping failed for tensor: /1/Expand_1_output_0 Reshape would change volume.)
[11/02/2023-03:54:59] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[11/02/2023-03:54:59] [E] Engine could not be created from network
[11/02/2023-03:54:59] [E] Building engine failed
[11/02/2023-03:54:59] [E] Failed to create engine from model or file.
[11/02/2023-03:54:59] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:4x3x640x640 --maxShapes=input:4x3x640x640 --shapes=input:4x3x640x640 --workspace=10000

The output for batch size 1 was:

root@rama-Alienware-Aurora-R13:/opt/DeepStream-Yolo-Seg# /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000
[11/02/2023-03:56:54] [W] --workspace flag has been deprecated by --memPoolSize flag.
[11/02/2023-03:56:54] [I] === Model Options ===
[11/02/2023-03:56:54] [I] Format: ONNX
[11/02/2023-03:56:54] [I] Model: yolov8s-seg.onnx
[11/02/2023-03:56:54] [I] Output:
[11/02/2023-03:56:54] [I] === Build Options ===
[11/02/2023-03:56:54] [I] Max batch: explicit batch
[11/02/2023-03:56:54] [I] Memory Pools: workspace: 10000 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/02/2023-03:56:54] [I] minTiming: 1
[11/02/2023-03:56:54] [I] avgTiming: 8
[11/02/2023-03:56:54] [I] Precision: FP32+FP16
[11/02/2023-03:56:54] [I] LayerPrecisions:
[11/02/2023-03:56:54] [I] Calibration:
[11/02/2023-03:56:54] [I] Refit: Disabled
[11/02/2023-03:56:54] [I] Sparsity: Disabled
[11/02/2023-03:56:54] [I] Safe mode: Disabled
[11/02/2023-03:56:54] [I] DirectIO mode: Disabled
[11/02/2023-03:56:54] [I] Restricted mode: Disabled
[11/02/2023-03:56:54] [I] Build only: Disabled
[11/02/2023-03:56:54] [I] Save engine: ds62.engine
[11/02/2023-03:56:54] [I] Load engine:
[11/02/2023-03:56:54] [I] Profiling verbosity: 0
[11/02/2023-03:56:54] [I] Tactic sources: Using default tactic sources
[11/02/2023-03:56:54] [I] timingCacheMode: local
[11/02/2023-03:56:54] [I] timingCacheFile:
[11/02/2023-03:56:54] [I] Heuristic: Disabled
[11/02/2023-03:56:54] [I] Preview Features: Use default preview flags.
[11/02/2023-03:56:54] [I] Input(s)s format: fp32:CHW
[11/02/2023-03:56:54] [I] Output(s)s format: fp32:CHW
[11/02/2023-03:56:54] [I] Input build shape: input=1x3x640x640+1x3x640x640+1x3x640x640
[11/02/2023-03:56:54] [I] Input calibration shapes: model
[11/02/2023-03:56:54] [I] === System Options ===
[11/02/2023-03:56:54] [I] Device: 0
[11/02/2023-03:56:54] [I] DLACore:
[11/02/2023-03:56:54] [I] Plugins:
[11/02/2023-03:56:54] [I] === Inference Options ===
[11/02/2023-03:56:54] [I] Batch: Explicit
[11/02/2023-03:56:54] [I] Input inference shape: input=1x3x640x640
[11/02/2023-03:56:54] [I] Iterations: 10
[11/02/2023-03:56:54] [I] Duration: 3s (+ 200ms warm up)
[11/02/2023-03:56:54] [I] Sleep time: 0ms
[11/02/2023-03:56:54] [I] Idle time: 0ms
[11/02/2023-03:56:54] [I] Streams: 1
[11/02/2023-03:56:54] [I] ExposeDMA: Disabled
[11/02/2023-03:56:54] [I] Data transfers: Enabled
[11/02/2023-03:56:54] [I] Spin-wait: Disabled
[11/02/2023-03:56:54] [I] Multithreading: Disabled
[11/02/2023-03:56:54] [I] CUDA Graph: Disabled
[11/02/2023-03:56:54] [I] Separate profiling: Disabled
[11/02/2023-03:56:54] [I] Time Deserialize: Disabled
[11/02/2023-03:56:54] [I] Time Refit: Disabled
[11/02/2023-03:56:54] [I] NVTX verbosity: 0
[11/02/2023-03:56:54] [I] Persistent Cache Ratio: 0
[11/02/2023-03:56:54] [I] Inputs:
[11/02/2023-03:56:54] [I] === Reporting Options ===
[11/02/2023-03:56:54] [I] Verbose: Disabled
[11/02/2023-03:56:54] [I] Averages: 10 inferences
[11/02/2023-03:56:54] [I] Percentiles: 90,95,99
[11/02/2023-03:56:54] [I] Dump refittable layers:Disabled
[11/02/2023-03:56:54] [I] Dump output: Disabled
[11/02/2023-03:56:54] [I] Profile: Disabled
[11/02/2023-03:56:54] [I] Export timing to JSON file:
[11/02/2023-03:56:54] [I] Export output to JSON file:
[11/02/2023-03:56:54] [I] Export profile to JSON file:
[11/02/2023-03:56:54] [I]
[11/02/2023-03:56:54] [I] === Device Information ===
[11/02/2023-03:56:54] [I] Selected Device: NVIDIA GeForce RTX 3090
[11/02/2023-03:56:54] [I] Compute Capability: 8.6
[11/02/2023-03:56:54] [I] SMs: 82
[11/02/2023-03:56:54] [I] Compute Clock Rate: 1.695 GHz
[11/02/2023-03:56:54] [I] Device Global Memory: 24251 MiB
[11/02/2023-03:56:54] [I] Shared Memory per SM: 100 KiB
[11/02/2023-03:56:54] [I] Memory Bus Width: 384 bits (ECC disabled)
[11/02/2023-03:56:54] [I] Memory Clock Rate: 9.751 GHz
[11/02/2023-03:56:54] [I]
[11/02/2023-03:56:54] [I] TensorRT version: 8.5.2
[11/02/2023-03:56:54] [I] [TRT] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 27, GPU 1031 (MiB)
[11/02/2023-03:56:55] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +547, GPU +118, now: CPU 628, GPU 1149 (MiB)
[11/02/2023-03:56:55] [I] Start parsing network model
[11/02/2023-03:56:55] [I] [TRT] ----------------------------------------------------------------
[11/02/2023-03:56:55] [I] [TRT] Input filename: yolov8s-seg.onnx
[11/02/2023-03:56:55] [I] [TRT] ONNX IR version: 0.0.8
[11/02/2023-03:56:55] [I] [TRT] Opset version: 16
[11/02/2023-03:56:55] [I] [TRT] Producer name: pytorch
[11/02/2023-03:56:55] [I] [TRT] Producer version: 2.0.0
[11/02/2023-03:56:55] [I] [TRT] Domain:
[11/02/2023-03:56:55] [I] [TRT] Model version: 0
[11/02/2023-03:56:55] [I] [TRT] Doc string:
[11/02/2023-03:56:55] [I] [TRT] ----------------------------------------------------------------
[11/02/2023-03:56:55] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/02/2023-03:56:55] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[11/02/2023-03:56:55] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:56:55] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:56:56] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:56:56] [I] Finish parsing network model
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 691, GPU 1159 (MiB)
[11/02/2023-03:56:56] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 693, GPU 1169 (MiB)
[11/02/2023-03:56:56] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/02/2023-04:01:53] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[11/02/2023-04:01:53] [I] [TRT] Total Activation Memory: 10633833472
[11/02/2023-04:01:53] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[11/02/2023-04:01:53] [I] [TRT] Total Host Persistent Memory: 227680
[11/02/2023-04:01:53] [I] [TRT] Total Device Persistent Memory: 696832
[11/02/2023-04:01:53] [I] [TRT] Total Scratch Memory: 51212288
[11/02/2023-04:01:53] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 31 MiB, GPU 8350 MiB
[11/02/2023-04:01:53] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 215 steps to complete.
[11/02/2023-04:01:53] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 34.4314ms to assign 26 blocks to 215 nodes requiring 66220544 bytes.
[11/02/2023-04:01:53] [I] [TRT] Total Activation Memory: 66220544
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1495, GPU 1388 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1495, GPU 1396 (MiB)
[11/02/2023-04:01:53] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[11/02/2023-04:01:53] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[11/02/2023-04:01:53] [W] [TRT] Check verbose logs for the list of affected weights.
[11/02/2023-04:01:53] [W] [TRT] - 69 weights are affected by this issue: Detected subnormal FP16 values.
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +23, GPU +23, now: CPU 23, GPU 23 (MiB)
[11/02/2023-04:01:53] [I] Engine built in 299.491 sec.
[11/02/2023-04:01:53] [I] [TRT] Loaded engine size: 25 MiB
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 892, GPU 1258 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 892, GPU 1266 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +23, now: CPU 0, GPU 23 (MiB)
[11/02/2023-04:01:53] [I] Engine deserialized in 0.0231797 sec.
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 892, GPU 1258 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 892, GPU 1266 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +64, now: CPU 0, GPU 87 (MiB)
[11/02/2023-04:01:53] [I] Setting persistentCacheLimit to 0 bytes.
[11/02/2023-04:01:53] [I] Using random values for input input
[11/02/2023-04:01:53] [I] Created input binding for input with dimensions 1x3x640x640
[11/02/2023-04:01:53] [I] Using random values for output boxes
[11/02/2023-04:01:53] [I] Created output binding for boxes with dimensions 1x100x4
[11/02/2023-04:01:53] [I] Using random values for output scores
[11/02/2023-04:01:53] [I] Created output binding for scores with dimensions 1x100x1
[11/02/2023-04:01:53] [I] Using random values for output classes
[11/02/2023-04:01:53] [I] Created output binding for classes with dimensions 1x100x1
[11/02/2023-04:01:53] [I] Using random values for output masks
[11/02/2023-04:01:53] [I] Created output binding for masks with dimensions 1x100x160x160
[11/02/2023-04:01:53] [I] Starting inference
[11/02/2023-04:01:56] [I] Warmup completed 115 queries over 200 ms
[11/02/2023-04:01:56] [I] Timing trace has 1688 queries over 3.00264 s
[11/02/2023-04:01:56] [I]
[11/02/2023-04:01:56] [I] === Trace details ===
[11/02/2023-04:01:56] [I] Trace averages of 10 runs:
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56747 ms - Host latency: 2.21314 ms (enqueue 1.7873 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54307 ms - Host latency: 2.18427 ms (enqueue 1.73931 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.57757 ms - Host latency: 2.22854 ms (enqueue 1.81655 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56515 ms - Host latency: 2.21062 ms (enqueue 1.78457 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56338 ms - Host latency: 2.20528 ms (enqueue 1.7626 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.51584 ms - Host latency: 2.16234 ms (enqueue 1.75282 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54091 ms - Host latency: 2.18674 ms (enqueue 1.75943 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54344 ms - Host latency: 2.18922 ms (enqueue 1.76217 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54409 ms - Host latency: 2.1895 ms (enqueue 1.76221 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54753 ms - Host latency: 2.19339 ms (enqueue 1.76555 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50078 ms - Host latency: 2.14658 ms (enqueue 1.72012 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50508 ms - Host latency: 2.15064 ms (enqueue 1.72338 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50521 ms - Host latency: 2.15062 ms (enqueue 1.72423 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49931 ms - Host latency: 2.14468 ms (enqueue 1.71692 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5013 ms - Host latency: 2.14669 ms (enqueue 1.71977 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50251 ms - Host latency: 2.14856 ms (enqueue 1.72085 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50555 ms - Host latency: 2.15304 ms (enqueue 1.72458 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5004 ms - Host latency: 2.14631 ms (enqueue 1.7192 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50093 ms - Host latency: 2.14621 ms (enqueue 1.71902 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49962 ms - Host latency: 2.14537 ms (enqueue 1.71729 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50071 ms - Host latency: 2.14625 ms (enqueue 1.71885 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50026 ms - Host latency: 2.14611 ms (enqueue 1.71807 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50107 ms - Host latency: 2.14702 ms (enqueue 1.71903 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49901 ms - Host latency: 2.14471 ms (enqueue 1.71766 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53431 ms - Host latency: 2.18085 ms (enqueue 1.7536 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.57101 ms - Host latency: 2.2181 ms (enqueue 1.80222 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.55786 ms - Host latency: 2.19951 ms (enqueue 1.75612 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54619 ms - Host latency: 2.19189 ms (enqueue 1.7646 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54412 ms - Host latency: 2.18995 ms (enqueue 1.76409 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52368 ms - Host latency: 2.16932 ms (enqueue 1.75057 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.51898 ms - Host latency: 2.1686 ms (enqueue 1.75078 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.51598 ms - Host latency: 2.16191 ms (enqueue 1.73395 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49896 ms - Host latency: 2.14463 ms (enqueue 1.71724 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49713 ms - Host latency: 2.14265 ms (enqueue 1.71495 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50034 ms - Host latency: 2.14632 ms (enqueue 1.71839 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50103 ms - Host latency: 2.14669 ms (enqueue 1.72012 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.4991 ms - Host latency: 2.14467 ms (enqueue 1.7178 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49777 ms - Host latency: 2.14384 ms (enqueue 1.7158 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50784 ms - Host latency: 2.15565 ms (enqueue 1.72824 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50315 ms - Host latency: 2.14845 ms (enqueue 1.72173 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50368 ms - Host latency: 2.14942 ms (enqueue 1.72358 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50013 ms - Host latency: 2.14577 ms (enqueue 1.71794 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53855 ms - Host latency: 2.18411 ms (enqueue 1.75699 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52056 ms - Host latency: 2.17162 ms (enqueue 1.74729 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53444 ms - Host latency: 2.17687 ms (enqueue 1.73158 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54251 ms - Host latency: 2.18448 ms (enqueue 1.74099 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62142 ms - Host latency: 2.26287 ms (enqueue 1.81839 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53982 ms - Host latency: 2.18553 ms (enqueue 1.75852 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53818 ms - Host latency: 2.18375 ms (enqueue 1.75752 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54418 ms - Host latency: 2.1855 ms (enqueue 1.7389 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5459 ms - Host latency: 2.18713 ms (enqueue 1.74152 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54318 ms - Host latency: 2.18458 ms (enqueue 1.73838 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54521 ms - Host latency: 2.18658 ms (enqueue 1.74188 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54857 ms - Host latency: 2.19086 ms (enqueue 1.74408 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54332 ms - Host latency: 2.18469 ms (enqueue 1.73932 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53981 ms - Host latency: 2.18143 ms (enqueue 1.73535 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56869 ms - Host latency: 2.2075 ms (enqueue 1.74045 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49869 ms - Host latency: 2.14288 ms (enqueue 1.71851 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50074 ms - Host latency: 2.14641 ms (enqueue 1.71957 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50082 ms - Host latency: 2.14651 ms (enqueue 1.7193 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50818 ms - Host latency: 2.15349 ms (enqueue 1.72734 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50294 ms - Host latency: 2.14851 ms (enqueue 1.72297 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5004 ms - Host latency: 2.14589 ms (enqueue 1.71921 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49967 ms - Host latency: 2.14498 ms (enqueue 1.71813 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50178 ms - Host latency: 2.14749 ms (enqueue 1.72041 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50077 ms - Host latency: 2.14644 ms (enqueue 1.71936 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50031 ms - Host latency: 2.14594 ms (enqueue 1.71918 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49796 ms - Host latency: 2.14402 ms (enqueue 1.71858 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49933 ms - Host latency: 2.14478 ms (enqueue 1.71843 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49774 ms - Host latency: 2.14362 ms (enqueue 1.71624 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5022 ms - Host latency: 2.15302 ms (enqueue 1.72377 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50157 ms - Host latency: 2.14687 ms (enqueue 1.71948 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50197 ms - Host latency: 2.1476 ms (enqueue 1.72059 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50054 ms - Host latency: 2.14608 ms (enqueue 1.71959 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50226 ms - Host latency: 2.14783 ms (enqueue 1.71997 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50045 ms - Host latency: 2.14652 ms (enqueue 1.71779 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50327 ms - Host latency: 2.15275 ms (enqueue 1.72323 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50594 ms - Host latency: 2.15726 ms (enqueue 1.72666 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49948 ms - Host latency: 2.14498 ms (enqueue 1.71864 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49852 ms - Host latency: 2.1442 ms (enqueue 1.71769 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50143 ms - Host latency: 2.1468 ms (enqueue 1.72061 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.57386 ms - Host latency: 2.21927 ms (enqueue 1.79906 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.59067 ms - Host latency: 2.23188 ms (enqueue 1.78713 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.58927 ms - Host latency: 2.23066 ms (enqueue 1.79908 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.63063 ms - Host latency: 2.26729 ms (enqueue 1.81172 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61261 ms - Host latency: 2.25405 ms (enqueue 1.80938 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60544 ms - Host latency: 2.24674 ms (enqueue 1.80078 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61169 ms - Host latency: 2.25294 ms (enqueue 1.80745 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.63901 ms - Host latency: 2.28134 ms (enqueue 1.84021 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62869 ms - Host latency: 2.26555 ms (enqueue 1.80115 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60476 ms - Host latency: 2.24662 ms (enqueue 1.80186 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62632 ms - Host latency: 2.26927 ms (enqueue 1.8324 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61266 ms - Host latency: 2.27126 ms (enqueue 1.8548 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.58112 ms - Host latency: 2.22858 ms (enqueue 1.82648 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.58099 ms - Host latency: 2.2282 ms (enqueue 1.82659 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61832 ms - Host latency: 2.25531 ms (enqueue 1.79891 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60356 ms - Host latency: 2.24481 ms (enqueue 1.79991 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60837 ms - Host latency: 2.24979 ms (enqueue 1.80468 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.6455 ms - Host latency: 2.28683 ms (enqueue 1.84213 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60743 ms - Host latency: 2.24858 ms (enqueue 1.80402 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61591 ms - Host latency: 2.26296 ms (enqueue 1.83188 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.96003 ms - Host latency: 2.59756 ms (enqueue 2.13203 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.67695 ms - Host latency: 2.31396 ms (enqueue 1.85105 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62135 ms - Host latency: 2.26328 ms (enqueue 1.8183 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.63699 ms - Host latency: 2.27855 ms (enqueue 1.83374 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56436 ms - Host latency: 2.20554 ms (enqueue 1.76013 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.83174 ms - Host latency: 2.47195 ms (enqueue 1.96165 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.70525 ms - Host latency: 2.3459 ms (enqueue 1.92439 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.69146 ms - Host latency: 2.33259 ms (enqueue 1.88896 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.70952 ms - Host latency: 2.35081 ms (enqueue 1.91487 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61265 ms - Host latency: 2.25796 ms (enqueue 1.84934 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61057 ms - Host latency: 2.25244 ms (enqueue 1.83516 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61426 ms - Host latency: 2.25601 ms (enqueue 1.82349 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.64824 ms - Host latency: 2.28066 ms (enqueue 1.81912 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.59995 ms - Host latency: 2.24666 ms (enqueue 1.82932 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60073 ms - Host latency: 2.24727 ms (enqueue 1.83508 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62224 ms - Host latency: 2.26389 ms (enqueue 1.81895 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.72319 ms - Host latency: 2.36682 ms (enqueue 1.94055 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.65349 ms - Host latency: 2.29294 ms (enqueue 1.8688 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.64724 ms - Host latency: 2.28337 ms (enqueue 1.82039 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.6344 ms - Host latency: 2.27102 ms (enqueue 1.81765 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.65 ms - Host latency: 2.28628 ms (enqueue 1.82395 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.66318 ms - Host latency: 2.29998 ms (enqueue 1.83718 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.63496 ms - Host latency: 2.27722 ms (enqueue 1.84873 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.577 ms - Host latency: 2.21936 ms (enqueue 1.77939 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54934 ms - Host latency: 2.19055 ms (enqueue 1.74553 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52664 ms - Host latency: 2.17288 ms (enqueue 1.74536 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52461 ms - Host latency: 2.17046 ms (enqueue 1.74426 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52202 ms - Host latency: 2.16816 ms (enqueue 1.74009 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52278 ms - Host latency: 2.16921 ms (enqueue 1.74148 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52222 ms - Host latency: 2.16882 ms (enqueue 1.74148 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52485 ms - Host latency: 2.17097 ms (enqueue 1.7408 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52312 ms - Host latency: 2.16978 ms (enqueue 1.74146 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.55859 ms - Host latency: 2.2054 ms (enqueue 1.77649 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54426 ms - Host latency: 2.18984 ms (enqueue 1.76438 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.51702 ms - Host latency: 2.16289 ms (enqueue 1.73545 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52166 ms - Host latency: 2.16836 ms (enqueue 1.7407 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52517 ms - Host latency: 2.17185 ms (enqueue 1.7449 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52122 ms - Host latency: 2.16716 ms (enqueue 1.73938 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52109 ms - Host latency: 2.17024 ms (enqueue 1.74187 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52341 ms - Host latency: 2.16965 ms (enqueue 1.74268 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52168 ms - Host latency: 2.16826 ms (enqueue 1.74124 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5178 ms - Host latency: 2.16394 ms (enqueue 1.73645 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52031 ms - Host latency: 2.1676 ms (enqueue 1.73975 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52109 ms - Host latency: 2.16772 ms (enqueue 1.73977 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52126 ms - Host latency: 2.17004 ms (enqueue 1.73948 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52305 ms - Host latency: 2.16943 ms (enqueue 1.74248 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52263 ms - Host latency: 2.16824 ms (enqueue 1.74216 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5217 ms - Host latency: 2.16743 ms (enqueue 1.7415 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52222 ms - Host latency: 2.16785 ms (enqueue 1.74114 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52517 ms - Host latency: 2.1709 ms (enqueue 1.74446 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52285 ms - Host latency: 2.16843 ms (enqueue 1.74292 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52021 ms - Host latency: 2.16594 ms (enqueue 1.73857 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52188 ms - Host latency: 2.16755 ms (enqueue 1.74229 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52378 ms - Host latency: 2.16938 ms (enqueue 1.74348 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52236 ms - Host latency: 2.17317 ms (enqueue 1.7488 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54307 ms - Host latency: 2.19482 ms (enqueue 1.76543 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52966 ms - Host latency: 2.17607 ms (enqueue 1.74829 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52344 ms - Host latency: 2.17378 ms (enqueue 1.74241 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52468 ms - Host latency: 2.17317 ms (enqueue 1.74683 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52092 ms - Host latency: 2.16741 ms (enqueue 1.73943 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52339 ms - Host latency: 2.17039 ms (enqueue 1.74177 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52073 ms - Host latency: 2.16819 ms (enqueue 1.74124 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52385 ms - Host latency: 2.1698 ms (enqueue 1.74268 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53267 ms - Host latency: 2.18018 ms (enqueue 1.75249 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52524 ms - Host latency: 2.1709 ms (enqueue 1.74404 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52498 ms - Host latency: 2.17024 ms (enqueue 1.7446 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52441 ms - Host latency: 2.16997 ms (enqueue 1.74385 ms)
[11/02/2023-04:01:56] [I]
[11/02/2023-04:01:56] [I] === Performance summary ===
[11/02/2023-04:01:56] [I] Throughput: 562.172 qps
[11/02/2023-04:01:56] [I] Latency: min = 2.1283 ms, max = 5.60583 ms, mean = 2.19671 ms, median = 2.159 ms, percentile(90%) = 2.28162 ms, percentile(95%) = 2.52026 ms, percentile(99%) = 2.75 ms
[11/02/2023-04:01:56] [I] Enqueue Time: min = 1.69995 ms, max = 4.97644 ms, mean = 1.76463 ms, median = 1.73242 ms, percentile(90%) = 1.87451 ms, percentile(95%) = 2.00977 ms, percentile(99%) = 2.26904 ms
[11/02/2023-04:01:56] [I] H2D Latency: min = 0.199707 ms, max = 0.266235 ms, mean = 0.217886 ms, median = 0.21814 ms, percentile(90%) = 0.218384 ms, percentile(95%) = 0.219727 ms, percentile(99%) = 0.223633 ms
[11/02/2023-04:01:56] [I] GPU Compute Time: min = 1.48901 ms, max = 4.98071 ms, mean = 1.55201 ms, median = 1.5135 ms, percentile(90%) = 1.63342 ms, percentile(95%) = 1.87793 ms, percentile(99%) = 2.13379 ms
[11/02/2023-04:01:56] [I] D2H Latency: min = 0.396973 ms, max = 0.484314 ms, mean = 0.426806 ms, median = 0.42749 ms, percentile(90%) = 0.428467 ms, percentile(95%) = 0.428955 ms, percentile(99%) = 0.438232 ms
[11/02/2023-04:01:56] [I] Total Host Walltime: 3.00264 s
[11/02/2023-04:01:56] [I] Total GPU Compute Time: 2.6198 s
[11/02/2023-04:01:56] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[11/02/2023-04:01:56] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[11/02/2023-04:01:56] [W] * GPU compute time is unstable, with coefficient of variance = 10.0717%.
[11/02/2023-04:01:56] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[11/02/2023-04:01:56] [I] Explanations of the performance metrics are printed in the verbose logs.
[11/02/2023-04:01:56] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000

Thanks for sharing. It should be a TensorRT issue, because the trtexec command line also has the problem. We will investigate.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

After testing in Docker (nvcr.io/nvidia/deepstream:6.3-triton-multiarch) with the model you shared on Oct 27, I can’t reproduce the “[TRT]: 1: [runner.cpp::shapeChangeHelper::621] Error Code 1: Myelin (Division by…” error.
Please refer to the log:
63-1107.txt (5.5 KB)
