1. Yes, DeepStream 6.2 works, but not on my default installation, which has TensorRT 8.6.1.6 and CUDA 12.2.
I tried it in another container, a dGPU container this time (nvcr.io/nvidia/deepstream:6.2-devel), which has TensorRT 8.5.2-1 and CUDA 11.8. The inference worked.
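For reference, this is roughly how that container can be launched (a sketch; the volume mount and display-forwarding options are assumptions, adjust to your setup):

docker run --gpus all -it --rm --net=host \
    -v /opt/DeepStream-Yolo-Seg:/opt/DeepStream-Yolo-Seg \
    -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix \
    nvcr.io/nvidia/deepstream:6.2-devel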
The DeepStream log was:
root@rama-Alienware-Aurora-R13:/opt/DeepStream-Yolo-Seg# deepstream-app -c deepstream_app_config.txt
WARNING: …/nvdsinfer/nvdsinfer_model_builder.cpp:1487 Deserialize engine failed because file path: /opt/DeepStream-Yolo-Seg/yolov8s-seg.onnx_b1_gpu0_fp16.engine open error
0:00:01.384047398 563 0x5655115fcaf0 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/opt/DeepStream-Yolo-Seg/yolov8s-seg.onnx_b1_gpu0_fp16.engine failed
0:00:01.433187162 563 0x5655115fcaf0 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/opt/DeepStream-Yolo-Seg/yolov8s-seg.onnx_b1_gpu0_fp16.engine failed, try rebuild
0:00:01.433204091 563 0x5655115fcaf0 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: TensorRT encountered issues when converting weights between types and that could affect accuracy.
WARNING: [TRT]: If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
WARNING: [TRT]: Check verbose logs for the list of affected weights.
WARNING: [TRT]: - 69 weights are affected by this issue: Detected subnormal FP16 values.
0:05:03.152258347 563 0x5655115fcaf0 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /opt/DeepStream-Yolo-Seg/yolov8s-seg.onnx_b1_gpu0_fp16.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: …/nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT input 3x640x640
1 OUTPUT kFLOAT boxes 100x4
2 OUTPUT kFLOAT scores 100x1
3 OUTPUT kFLOAT classes 100x1
4 OUTPUT kFLOAT masks 100x160x160
0:05:03.279918826 563 0x5655115fcaf0 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/DeepStream-Yolo-Seg/config_infer_primary_yoloV8_seg.txt sucessfully
Runtime commands:
h: Print this help
q: Quit
p: Pause
r: Resume
NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.
**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:239>: Pipeline ready
WARNING from src_elem: No decoder available for type 'audio/mpeg, mpegversion=(int)4, framed=(boolean)true, stream-format=(string)raw, level=(string)2, base-profile=(string)lc, profile=(string)lc, codec_data=(buffer)119056e500, rate=(int)48000, channels=(int)2'.
Debug info: gsturidecodebin.c(920): unknown_type_cb (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin0/GstURIDecodeBin:src_elem
** INFO: <bus_callback:225>: Pipeline running
nvstreammux: Successfully handled EOS for source_id=0
** INFO: <bus_callback:262>: Received EOS. Exiting …
Quitting
App run successful
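Side note: the roughly five-minute engine build above happened only because the serialized engine file was not found at startup. On later runs the generated engine can be reused by pointing the nvinfer config at it; a minimal sketch of the relevant keys (the surrounding layout of config_infer_primary_yoloV8_seg.txt is assumed):

[property]
onnx-file=yolov8s-seg.onnx
# Reuse the engine serialized during the first run instead of rebuilding it:
model-engine-file=yolov8s-seg.onnx_b1_gpu0_fp16.engine
batch-size=1
# 0=FP32, 1=INT8, 2=FP16
network-mode=2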
2. trtexec failed to run with the batch-size-4 example given; it works if the batch size is 1 for all shapes (min/opt/max).
Output for batch size 4 was:
root@rama-Alienware-Aurora-R13:/opt/DeepStream-Yolo-Seg# /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:4x3x640x640 --maxShapes=input:4x3x640x640 --shapes=input:4x3x640x640 --workspace=10000
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:4x3x640x640 --maxShapes=input:4x3x640x640 --shapes=input:4x3x640x640 --workspace=10000
[11/02/2023-03:54:57] [W] --workspace flag has been deprecated by --memPoolSize flag.
[11/02/2023-03:54:57] [I] === Model Options ===
[11/02/2023-03:54:57] [I] Format: ONNX
[11/02/2023-03:54:57] [I] Model: yolov8s-seg.onnx
[11/02/2023-03:54:57] [I] Output:
[11/02/2023-03:54:57] [I] === Build Options ===
[11/02/2023-03:54:57] [I] Max batch: explicit batch
[11/02/2023-03:54:57] [I] Memory Pools: workspace: 10000 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/02/2023-03:54:57] [I] minTiming: 1
[11/02/2023-03:54:57] [I] avgTiming: 8
[11/02/2023-03:54:57] [I] Precision: FP32+FP16
[11/02/2023-03:54:57] [I] LayerPrecisions:
[11/02/2023-03:54:57] [I] Calibration:
[11/02/2023-03:54:57] [I] Refit: Disabled
[11/02/2023-03:54:57] [I] Sparsity: Disabled
[11/02/2023-03:54:57] [I] Safe mode: Disabled
[11/02/2023-03:54:57] [I] DirectIO mode: Disabled
[11/02/2023-03:54:57] [I] Restricted mode: Disabled
[11/02/2023-03:54:57] [I] Build only: Disabled
[11/02/2023-03:54:57] [I] Save engine: ds62.engine
[11/02/2023-03:54:57] [I] Load engine:
[11/02/2023-03:54:57] [I] Profiling verbosity: 0
[11/02/2023-03:54:57] [I] Tactic sources: Using default tactic sources
[11/02/2023-03:54:57] [I] timingCacheMode: local
[11/02/2023-03:54:57] [I] timingCacheFile:
[11/02/2023-03:54:57] [I] Heuristic: Disabled
[11/02/2023-03:54:57] [I] Preview Features: Use default preview flags.
[11/02/2023-03:54:57] [I] Input(s)s format: fp32:CHW
[11/02/2023-03:54:57] [I] Output(s)s format: fp32:CHW
[11/02/2023-03:54:57] [I] Input build shape: input=1x3x640x640+4x3x640x640+4x3x640x640
[11/02/2023-03:54:57] [I] Input calibration shapes: model
[11/02/2023-03:54:57] [I] === System Options ===
[11/02/2023-03:54:57] [I] Device: 0
[11/02/2023-03:54:57] [I] DLACore:
[11/02/2023-03:54:57] [I] Plugins:
[11/02/2023-03:54:57] [I] === Inference Options ===
[11/02/2023-03:54:57] [I] Batch: Explicit
[11/02/2023-03:54:57] [I] Input inference shape: input=4x3x640x640
[11/02/2023-03:54:57] [I] Iterations: 10
[11/02/2023-03:54:57] [I] Duration: 3s (+ 200ms warm up)
[11/02/2023-03:54:57] [I] Sleep time: 0ms
[11/02/2023-03:54:57] [I] Idle time: 0ms
[11/02/2023-03:54:57] [I] Streams: 1
[11/02/2023-03:54:57] [I] ExposeDMA: Disabled
[11/02/2023-03:54:57] [I] Data transfers: Enabled
[11/02/2023-03:54:57] [I] Spin-wait: Disabled
[11/02/2023-03:54:57] [I] Multithreading: Disabled
[11/02/2023-03:54:57] [I] CUDA Graph: Disabled
[11/02/2023-03:54:57] [I] Separate profiling: Disabled
[11/02/2023-03:54:57] [I] Time Deserialize: Disabled
[11/02/2023-03:54:57] [I] Time Refit: Disabled
[11/02/2023-03:54:57] [I] NVTX verbosity: 0
[11/02/2023-03:54:57] [I] Persistent Cache Ratio: 0
[11/02/2023-03:54:57] [I] Inputs:
[11/02/2023-03:54:57] [I] === Reporting Options ===
[11/02/2023-03:54:57] [I] Verbose: Disabled
[11/02/2023-03:54:57] [I] Averages: 10 inferences
[11/02/2023-03:54:57] [I] Percentiles: 90,95,99
[11/02/2023-03:54:57] [I] Dump refittable layers:Disabled
[11/02/2023-03:54:57] [I] Dump output: Disabled
[11/02/2023-03:54:57] [I] Profile: Disabled
[11/02/2023-03:54:57] [I] Export timing to JSON file:
[11/02/2023-03:54:57] [I] Export output to JSON file:
[11/02/2023-03:54:57] [I] Export profile to JSON file:
[11/02/2023-03:54:57] [I]
[11/02/2023-03:54:57] [I] === Device Information ===
[11/02/2023-03:54:57] [I] Selected Device: NVIDIA GeForce RTX 3090
[11/02/2023-03:54:57] [I] Compute Capability: 8.6
[11/02/2023-03:54:57] [I] SMs: 82
[11/02/2023-03:54:57] [I] Compute Clock Rate: 1.695 GHz
[11/02/2023-03:54:57] [I] Device Global Memory: 24251 MiB
[11/02/2023-03:54:57] [I] Shared Memory per SM: 100 KiB
[11/02/2023-03:54:57] [I] Memory Bus Width: 384 bits (ECC disabled)
[11/02/2023-03:54:57] [I] Memory Clock Rate: 9.751 GHz
[11/02/2023-03:54:57] [I]
[11/02/2023-03:54:57] [I] TensorRT version: 8.5.2
[11/02/2023-03:54:57] [I] [TRT] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 27, GPU 1031 (MiB)
[11/02/2023-03:54:58] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +547, GPU +118, now: CPU 628, GPU 1149 (MiB)
[11/02/2023-03:54:58] [I] Start parsing network model
[11/02/2023-03:54:58] [I] [TRT] ----------------------------------------------------------------
[11/02/2023-03:54:58] [I] [TRT] Input filename: yolov8s-seg.onnx
[11/02/2023-03:54:58] [I] [TRT] ONNX IR version: 0.0.8
[11/02/2023-03:54:58] [I] [TRT] Opset version: 16
[11/02/2023-03:54:58] [I] [TRT] Producer name: pytorch
[11/02/2023-03:54:58] [I] [TRT] Producer version: 2.0.0
[11/02/2023-03:54:58] [I] [TRT] Domain:
[11/02/2023-03:54:58] [I] [TRT] Model version: 0
[11/02/2023-03:54:58] [I] [TRT] Doc string:
[11/02/2023-03:54:58] [I] [TRT] ----------------------------------------------------------------
[11/02/2023-03:54:58] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/02/2023-03:54:58] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[11/02/2023-03:54:58] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:54:59] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:54:59] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:54:59] [I] Finish parsing network model
[11/02/2023-03:54:59] [W] [TRT] /1/Reshape_12: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 1 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:54:59] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 691, GPU 1159 (MiB)
[11/02/2023-03:54:59] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 693, GPU 1169 (MiB)
[11/02/2023-03:54:59] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/02/2023-03:54:59] [E] Error[4]: [shapeCompiler.cpp::evaluateShapeChecks::1180] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: IShuffleLayer /1/Reshape_12: reshaping failed for tensor: /1/Expand_1_output_0 Reshape would change volume.)
[11/02/2023-03:54:59] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[11/02/2023-03:54:59] [E] Engine could not be created from network
[11/02/2023-03:54:59] [E] Building engine failed
[11/02/2023-03:54:59] [E] Failed to create engine from model or file.
[11/02/2023-03:54:59] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:4x3x640x640 --maxShapes=input:4x3x640x640 --shapes=input:4x3x640x640 --workspace=10000
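The Reshape volume error above is what typically shows up when the exported ONNX graph bakes a fixed batch size into its Reshape/Expand nodes, so an optimization profile spanning batch 1 to 4 cannot be satisfied. As a quick sanity check (a sketch, assuming the onnx Python package is available in the container), the input's batch dimension can be inspected; if it is a fixed 1 rather than a symbolic/dynamic dimension, the model would need to be re-exported with a dynamic batch axis before a batch-4 engine can be built:

import onnx

# Load the exported model and look at the first graph input.
model = onnx.load("yolov8s-seg.onnx")
inp = model.graph.input[0]

dims = []
for d in inp.type.tensor_type.shape.dim:
    # A dimension is either a concrete value (dim_value) or a symbolic name (dim_param).
    dims.append(d.dim_param if d.dim_param else d.dim_value)

print(inp.name, dims)
# e.g. ['batch', 3, 640, 640] -> dynamic batch, profiles 1..4 should be possible
# e.g. [1, 3, 640, 640]       -> static batch 1, only batch-size-1 engines will build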
Output for batch size 1 was:
root@rama-Alienware-Aurora-R13:/opt/DeepStream-Yolo-Seg# /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000
[11/02/2023-03:56:54] [W] --workspace flag has been deprecated by --memPoolSize flag.
[11/02/2023-03:56:54] [I] === Model Options ===
[11/02/2023-03:56:54] [I] Format: ONNX
[11/02/2023-03:56:54] [I] Model: yolov8s-seg.onnx
[11/02/2023-03:56:54] [I] Output:
[11/02/2023-03:56:54] [I] === Build Options ===
[11/02/2023-03:56:54] [I] Max batch: explicit batch
[11/02/2023-03:56:54] [I] Memory Pools: workspace: 10000 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/02/2023-03:56:54] [I] minTiming: 1
[11/02/2023-03:56:54] [I] avgTiming: 8
[11/02/2023-03:56:54] [I] Precision: FP32+FP16
[11/02/2023-03:56:54] [I] LayerPrecisions:
[11/02/2023-03:56:54] [I] Calibration:
[11/02/2023-03:56:54] [I] Refit: Disabled
[11/02/2023-03:56:54] [I] Sparsity: Disabled
[11/02/2023-03:56:54] [I] Safe mode: Disabled
[11/02/2023-03:56:54] [I] DirectIO mode: Disabled
[11/02/2023-03:56:54] [I] Restricted mode: Disabled
[11/02/2023-03:56:54] [I] Build only: Disabled
[11/02/2023-03:56:54] [I] Save engine: ds62.engine
[11/02/2023-03:56:54] [I] Load engine:
[11/02/2023-03:56:54] [I] Profiling verbosity: 0
[11/02/2023-03:56:54] [I] Tactic sources: Using default tactic sources
[11/02/2023-03:56:54] [I] timingCacheMode: local
[11/02/2023-03:56:54] [I] timingCacheFile:
[11/02/2023-03:56:54] [I] Heuristic: Disabled
[11/02/2023-03:56:54] [I] Preview Features: Use default preview flags.
[11/02/2023-03:56:54] [I] Input(s)s format: fp32:CHW
[11/02/2023-03:56:54] [I] Output(s)s format: fp32:CHW
[11/02/2023-03:56:54] [I] Input build shape: input=1x3x640x640+1x3x640x640+1x3x640x640
[11/02/2023-03:56:54] [I] Input calibration shapes: model
[11/02/2023-03:56:54] [I] === System Options ===
[11/02/2023-03:56:54] [I] Device: 0
[11/02/2023-03:56:54] [I] DLACore:
[11/02/2023-03:56:54] [I] Plugins:
[11/02/2023-03:56:54] [I] === Inference Options ===
[11/02/2023-03:56:54] [I] Batch: Explicit
[11/02/2023-03:56:54] [I] Input inference shape: input=1x3x640x640
[11/02/2023-03:56:54] [I] Iterations: 10
[11/02/2023-03:56:54] [I] Duration: 3s (+ 200ms warm up)
[11/02/2023-03:56:54] [I] Sleep time: 0ms
[11/02/2023-03:56:54] [I] Idle time: 0ms
[11/02/2023-03:56:54] [I] Streams: 1
[11/02/2023-03:56:54] [I] ExposeDMA: Disabled
[11/02/2023-03:56:54] [I] Data transfers: Enabled
[11/02/2023-03:56:54] [I] Spin-wait: Disabled
[11/02/2023-03:56:54] [I] Multithreading: Disabled
[11/02/2023-03:56:54] [I] CUDA Graph: Disabled
[11/02/2023-03:56:54] [I] Separate profiling: Disabled
[11/02/2023-03:56:54] [I] Time Deserialize: Disabled
[11/02/2023-03:56:54] [I] Time Refit: Disabled
[11/02/2023-03:56:54] [I] NVTX verbosity: 0
[11/02/2023-03:56:54] [I] Persistent Cache Ratio: 0
[11/02/2023-03:56:54] [I] Inputs:
[11/02/2023-03:56:54] [I] === Reporting Options ===
[11/02/2023-03:56:54] [I] Verbose: Disabled
[11/02/2023-03:56:54] [I] Averages: 10 inferences
[11/02/2023-03:56:54] [I] Percentiles: 90,95,99
[11/02/2023-03:56:54] [I] Dump refittable layers:Disabled
[11/02/2023-03:56:54] [I] Dump output: Disabled
[11/02/2023-03:56:54] [I] Profile: Disabled
[11/02/2023-03:56:54] [I] Export timing to JSON file:
[11/02/2023-03:56:54] [I] Export output to JSON file:
[11/02/2023-03:56:54] [I] Export profile to JSON file:
[11/02/2023-03:56:54] [I]
[11/02/2023-03:56:54] [I] === Device Information ===
[11/02/2023-03:56:54] [I] Selected Device: NVIDIA GeForce RTX 3090
[11/02/2023-03:56:54] [I] Compute Capability: 8.6
[11/02/2023-03:56:54] [I] SMs: 82
[11/02/2023-03:56:54] [I] Compute Clock Rate: 1.695 GHz
[11/02/2023-03:56:54] [I] Device Global Memory: 24251 MiB
[11/02/2023-03:56:54] [I] Shared Memory per SM: 100 KiB
[11/02/2023-03:56:54] [I] Memory Bus Width: 384 bits (ECC disabled)
[11/02/2023-03:56:54] [I] Memory Clock Rate: 9.751 GHz
[11/02/2023-03:56:54] [I]
[11/02/2023-03:56:54] [I] TensorRT version: 8.5.2
[11/02/2023-03:56:54] [I] [TRT] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 27, GPU 1031 (MiB)
[11/02/2023-03:56:55] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +547, GPU +118, now: CPU 628, GPU 1149 (MiB)
[11/02/2023-03:56:55] [I] Start parsing network model
[11/02/2023-03:56:55] [I] [TRT] ----------------------------------------------------------------
[11/02/2023-03:56:55] [I] [TRT] Input filename: yolov8s-seg.onnx
[11/02/2023-03:56:55] [I] [TRT] ONNX IR version: 0.0.8
[11/02/2023-03:56:55] [I] [TRT] Opset version: 16
[11/02/2023-03:56:55] [I] [TRT] Producer name: pytorch
[11/02/2023-03:56:55] [I] [TRT] Producer version: 2.0.0
[11/02/2023-03:56:55] [I] [TRT] Domain:
[11/02/2023-03:56:55] [I] [TRT] Model version: 0
[11/02/2023-03:56:55] [I] [TRT] Doc string:
[11/02/2023-03:56:55] [I] [TRT] ----------------------------------------------------------------
[11/02/2023-03:56:55] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/02/2023-03:56:55] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[11/02/2023-03:56:55] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:56:55] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:56:56] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[11/02/2023-03:56:56] [I] Finish parsing network model
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[11/02/2023-03:56:56] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 691, GPU 1159 (MiB)
[11/02/2023-03:56:56] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 693, GPU 1169 (MiB)
[11/02/2023-03:56:56] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/02/2023-04:01:53] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[11/02/2023-04:01:53] [I] [TRT] Total Activation Memory: 10633833472
[11/02/2023-04:01:53] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[11/02/2023-04:01:53] [I] [TRT] Total Host Persistent Memory: 227680
[11/02/2023-04:01:53] [I] [TRT] Total Device Persistent Memory: 696832
[11/02/2023-04:01:53] [I] [TRT] Total Scratch Memory: 51212288
[11/02/2023-04:01:53] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 31 MiB, GPU 8350 MiB
[11/02/2023-04:01:53] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 215 steps to complete.
[11/02/2023-04:01:53] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 34.4314ms to assign 26 blocks to 215 nodes requiring 66220544 bytes.
[11/02/2023-04:01:53] [I] [TRT] Total Activation Memory: 66220544
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1495, GPU 1388 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1495, GPU 1396 (MiB)
[11/02/2023-04:01:53] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[11/02/2023-04:01:53] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[11/02/2023-04:01:53] [W] [TRT] Check verbose logs for the list of affected weights.
[11/02/2023-04:01:53] [W] [TRT] - 69 weights are affected by this issue: Detected subnormal FP16 values.
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +23, GPU +23, now: CPU 23, GPU 23 (MiB)
[11/02/2023-04:01:53] [I] Engine built in 299.491 sec.
[11/02/2023-04:01:53] [I] [TRT] Loaded engine size: 25 MiB
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 892, GPU 1258 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 892, GPU 1266 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +23, now: CPU 0, GPU 23 (MiB)
[11/02/2023-04:01:53] [I] Engine deserialized in 0.0231797 sec.
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 892, GPU 1258 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 892, GPU 1266 (MiB)
[11/02/2023-04:01:53] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +64, now: CPU 0, GPU 87 (MiB)
[11/02/2023-04:01:53] [I] Setting persistentCacheLimit to 0 bytes.
[11/02/2023-04:01:53] [I] Using random values for input input
[11/02/2023-04:01:53] [I] Created input binding for input with dimensions 1x3x640x640
[11/02/2023-04:01:53] [I] Using random values for output boxes
[11/02/2023-04:01:53] [I] Created output binding for boxes with dimensions 1x100x4
[11/02/2023-04:01:53] [I] Using random values for output scores
[11/02/2023-04:01:53] [I] Created output binding for scores with dimensions 1x100x1
[11/02/2023-04:01:53] [I] Using random values for output classes
[11/02/2023-04:01:53] [I] Created output binding for classes with dimensions 1x100x1
[11/02/2023-04:01:53] [I] Using random values for output masks
[11/02/2023-04:01:53] [I] Created output binding for masks with dimensions 1x100x160x160
[11/02/2023-04:01:53] [I] Starting inference
[11/02/2023-04:01:56] [I] Warmup completed 115 queries over 200 ms
[11/02/2023-04:01:56] [I] Timing trace has 1688 queries over 3.00264 s
[11/02/2023-04:01:56] [I]
[11/02/2023-04:01:56] [I] === Trace details ===
[11/02/2023-04:01:56] [I] Trace averages of 10 runs:
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56747 ms - Host latency: 2.21314 ms (enqueue 1.7873 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54307 ms - Host latency: 2.18427 ms (enqueue 1.73931 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.57757 ms - Host latency: 2.22854 ms (enqueue 1.81655 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56515 ms - Host latency: 2.21062 ms (enqueue 1.78457 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56338 ms - Host latency: 2.20528 ms (enqueue 1.7626 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.51584 ms - Host latency: 2.16234 ms (enqueue 1.75282 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54091 ms - Host latency: 2.18674 ms (enqueue 1.75943 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54344 ms - Host latency: 2.18922 ms (enqueue 1.76217 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54409 ms - Host latency: 2.1895 ms (enqueue 1.76221 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54753 ms - Host latency: 2.19339 ms (enqueue 1.76555 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50078 ms - Host latency: 2.14658 ms (enqueue 1.72012 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50508 ms - Host latency: 2.15064 ms (enqueue 1.72338 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50521 ms - Host latency: 2.15062 ms (enqueue 1.72423 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49931 ms - Host latency: 2.14468 ms (enqueue 1.71692 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5013 ms - Host latency: 2.14669 ms (enqueue 1.71977 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50251 ms - Host latency: 2.14856 ms (enqueue 1.72085 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50555 ms - Host latency: 2.15304 ms (enqueue 1.72458 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5004 ms - Host latency: 2.14631 ms (enqueue 1.7192 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50093 ms - Host latency: 2.14621 ms (enqueue 1.71902 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49962 ms - Host latency: 2.14537 ms (enqueue 1.71729 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50071 ms - Host latency: 2.14625 ms (enqueue 1.71885 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50026 ms - Host latency: 2.14611 ms (enqueue 1.71807 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50107 ms - Host latency: 2.14702 ms (enqueue 1.71903 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49901 ms - Host latency: 2.14471 ms (enqueue 1.71766 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53431 ms - Host latency: 2.18085 ms (enqueue 1.7536 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.57101 ms - Host latency: 2.2181 ms (enqueue 1.80222 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.55786 ms - Host latency: 2.19951 ms (enqueue 1.75612 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54619 ms - Host latency: 2.19189 ms (enqueue 1.7646 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54412 ms - Host latency: 2.18995 ms (enqueue 1.76409 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52368 ms - Host latency: 2.16932 ms (enqueue 1.75057 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.51898 ms - Host latency: 2.1686 ms (enqueue 1.75078 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.51598 ms - Host latency: 2.16191 ms (enqueue 1.73395 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49896 ms - Host latency: 2.14463 ms (enqueue 1.71724 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49713 ms - Host latency: 2.14265 ms (enqueue 1.71495 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50034 ms - Host latency: 2.14632 ms (enqueue 1.71839 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50103 ms - Host latency: 2.14669 ms (enqueue 1.72012 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.4991 ms - Host latency: 2.14467 ms (enqueue 1.7178 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49777 ms - Host latency: 2.14384 ms (enqueue 1.7158 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50784 ms - Host latency: 2.15565 ms (enqueue 1.72824 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50315 ms - Host latency: 2.14845 ms (enqueue 1.72173 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50368 ms - Host latency: 2.14942 ms (enqueue 1.72358 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50013 ms - Host latency: 2.14577 ms (enqueue 1.71794 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53855 ms - Host latency: 2.18411 ms (enqueue 1.75699 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52056 ms - Host latency: 2.17162 ms (enqueue 1.74729 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53444 ms - Host latency: 2.17687 ms (enqueue 1.73158 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54251 ms - Host latency: 2.18448 ms (enqueue 1.74099 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62142 ms - Host latency: 2.26287 ms (enqueue 1.81839 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53982 ms - Host latency: 2.18553 ms (enqueue 1.75852 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53818 ms - Host latency: 2.18375 ms (enqueue 1.75752 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54418 ms - Host latency: 2.1855 ms (enqueue 1.7389 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5459 ms - Host latency: 2.18713 ms (enqueue 1.74152 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54318 ms - Host latency: 2.18458 ms (enqueue 1.73838 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54521 ms - Host latency: 2.18658 ms (enqueue 1.74188 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54857 ms - Host latency: 2.19086 ms (enqueue 1.74408 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54332 ms - Host latency: 2.18469 ms (enqueue 1.73932 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53981 ms - Host latency: 2.18143 ms (enqueue 1.73535 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56869 ms - Host latency: 2.2075 ms (enqueue 1.74045 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49869 ms - Host latency: 2.14288 ms (enqueue 1.71851 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50074 ms - Host latency: 2.14641 ms (enqueue 1.71957 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50082 ms - Host latency: 2.14651 ms (enqueue 1.7193 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50818 ms - Host latency: 2.15349 ms (enqueue 1.72734 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50294 ms - Host latency: 2.14851 ms (enqueue 1.72297 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5004 ms - Host latency: 2.14589 ms (enqueue 1.71921 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49967 ms - Host latency: 2.14498 ms (enqueue 1.71813 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50178 ms - Host latency: 2.14749 ms (enqueue 1.72041 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50077 ms - Host latency: 2.14644 ms (enqueue 1.71936 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50031 ms - Host latency: 2.14594 ms (enqueue 1.71918 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49796 ms - Host latency: 2.14402 ms (enqueue 1.71858 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49933 ms - Host latency: 2.14478 ms (enqueue 1.71843 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49774 ms - Host latency: 2.14362 ms (enqueue 1.71624 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5022 ms - Host latency: 2.15302 ms (enqueue 1.72377 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50157 ms - Host latency: 2.14687 ms (enqueue 1.71948 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50197 ms - Host latency: 2.1476 ms (enqueue 1.72059 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50054 ms - Host latency: 2.14608 ms (enqueue 1.71959 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50226 ms - Host latency: 2.14783 ms (enqueue 1.71997 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50045 ms - Host latency: 2.14652 ms (enqueue 1.71779 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50327 ms - Host latency: 2.15275 ms (enqueue 1.72323 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50594 ms - Host latency: 2.15726 ms (enqueue 1.72666 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49948 ms - Host latency: 2.14498 ms (enqueue 1.71864 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.49852 ms - Host latency: 2.1442 ms (enqueue 1.71769 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.50143 ms - Host latency: 2.1468 ms (enqueue 1.72061 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.57386 ms - Host latency: 2.21927 ms (enqueue 1.79906 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.59067 ms - Host latency: 2.23188 ms (enqueue 1.78713 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.58927 ms - Host latency: 2.23066 ms (enqueue 1.79908 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.63063 ms - Host latency: 2.26729 ms (enqueue 1.81172 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61261 ms - Host latency: 2.25405 ms (enqueue 1.80938 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60544 ms - Host latency: 2.24674 ms (enqueue 1.80078 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61169 ms - Host latency: 2.25294 ms (enqueue 1.80745 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.63901 ms - Host latency: 2.28134 ms (enqueue 1.84021 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62869 ms - Host latency: 2.26555 ms (enqueue 1.80115 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60476 ms - Host latency: 2.24662 ms (enqueue 1.80186 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62632 ms - Host latency: 2.26927 ms (enqueue 1.8324 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61266 ms - Host latency: 2.27126 ms (enqueue 1.8548 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.58112 ms - Host latency: 2.22858 ms (enqueue 1.82648 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.58099 ms - Host latency: 2.2282 ms (enqueue 1.82659 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61832 ms - Host latency: 2.25531 ms (enqueue 1.79891 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60356 ms - Host latency: 2.24481 ms (enqueue 1.79991 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60837 ms - Host latency: 2.24979 ms (enqueue 1.80468 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.6455 ms - Host latency: 2.28683 ms (enqueue 1.84213 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60743 ms - Host latency: 2.24858 ms (enqueue 1.80402 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61591 ms - Host latency: 2.26296 ms (enqueue 1.83188 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.96003 ms - Host latency: 2.59756 ms (enqueue 2.13203 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.67695 ms - Host latency: 2.31396 ms (enqueue 1.85105 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62135 ms - Host latency: 2.26328 ms (enqueue 1.8183 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.63699 ms - Host latency: 2.27855 ms (enqueue 1.83374 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.56436 ms - Host latency: 2.20554 ms (enqueue 1.76013 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.83174 ms - Host latency: 2.47195 ms (enqueue 1.96165 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.70525 ms - Host latency: 2.3459 ms (enqueue 1.92439 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.69146 ms - Host latency: 2.33259 ms (enqueue 1.88896 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.70952 ms - Host latency: 2.35081 ms (enqueue 1.91487 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61265 ms - Host latency: 2.25796 ms (enqueue 1.84934 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61057 ms - Host latency: 2.25244 ms (enqueue 1.83516 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.61426 ms - Host latency: 2.25601 ms (enqueue 1.82349 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.64824 ms - Host latency: 2.28066 ms (enqueue 1.81912 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.59995 ms - Host latency: 2.24666 ms (enqueue 1.82932 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.60073 ms - Host latency: 2.24727 ms (enqueue 1.83508 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.62224 ms - Host latency: 2.26389 ms (enqueue 1.81895 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.72319 ms - Host latency: 2.36682 ms (enqueue 1.94055 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.65349 ms - Host latency: 2.29294 ms (enqueue 1.8688 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.64724 ms - Host latency: 2.28337 ms (enqueue 1.82039 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.6344 ms - Host latency: 2.27102 ms (enqueue 1.81765 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.65 ms - Host latency: 2.28628 ms (enqueue 1.82395 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.66318 ms - Host latency: 2.29998 ms (enqueue 1.83718 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.63496 ms - Host latency: 2.27722 ms (enqueue 1.84873 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.577 ms - Host latency: 2.21936 ms (enqueue 1.77939 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54934 ms - Host latency: 2.19055 ms (enqueue 1.74553 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52664 ms - Host latency: 2.17288 ms (enqueue 1.74536 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52461 ms - Host latency: 2.17046 ms (enqueue 1.74426 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52202 ms - Host latency: 2.16816 ms (enqueue 1.74009 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52278 ms - Host latency: 2.16921 ms (enqueue 1.74148 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52222 ms - Host latency: 2.16882 ms (enqueue 1.74148 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52485 ms - Host latency: 2.17097 ms (enqueue 1.7408 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52312 ms - Host latency: 2.16978 ms (enqueue 1.74146 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.55859 ms - Host latency: 2.2054 ms (enqueue 1.77649 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54426 ms - Host latency: 2.18984 ms (enqueue 1.76438 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.51702 ms - Host latency: 2.16289 ms (enqueue 1.73545 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52166 ms - Host latency: 2.16836 ms (enqueue 1.7407 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52517 ms - Host latency: 2.17185 ms (enqueue 1.7449 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52122 ms - Host latency: 2.16716 ms (enqueue 1.73938 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52109 ms - Host latency: 2.17024 ms (enqueue 1.74187 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52341 ms - Host latency: 2.16965 ms (enqueue 1.74268 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52168 ms - Host latency: 2.16826 ms (enqueue 1.74124 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5178 ms - Host latency: 2.16394 ms (enqueue 1.73645 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52031 ms - Host latency: 2.1676 ms (enqueue 1.73975 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52109 ms - Host latency: 2.16772 ms (enqueue 1.73977 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52126 ms - Host latency: 2.17004 ms (enqueue 1.73948 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52305 ms - Host latency: 2.16943 ms (enqueue 1.74248 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52263 ms - Host latency: 2.16824 ms (enqueue 1.74216 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.5217 ms - Host latency: 2.16743 ms (enqueue 1.7415 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52222 ms - Host latency: 2.16785 ms (enqueue 1.74114 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52517 ms - Host latency: 2.1709 ms (enqueue 1.74446 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52285 ms - Host latency: 2.16843 ms (enqueue 1.74292 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52021 ms - Host latency: 2.16594 ms (enqueue 1.73857 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52188 ms - Host latency: 2.16755 ms (enqueue 1.74229 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52378 ms - Host latency: 2.16938 ms (enqueue 1.74348 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52236 ms - Host latency: 2.17317 ms (enqueue 1.7488 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.54307 ms - Host latency: 2.19482 ms (enqueue 1.76543 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52966 ms - Host latency: 2.17607 ms (enqueue 1.74829 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52344 ms - Host latency: 2.17378 ms (enqueue 1.74241 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52468 ms - Host latency: 2.17317 ms (enqueue 1.74683 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52092 ms - Host latency: 2.16741 ms (enqueue 1.73943 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52339 ms - Host latency: 2.17039 ms (enqueue 1.74177 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52073 ms - Host latency: 2.16819 ms (enqueue 1.74124 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52385 ms - Host latency: 2.1698 ms (enqueue 1.74268 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.53267 ms - Host latency: 2.18018 ms (enqueue 1.75249 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52524 ms - Host latency: 2.1709 ms (enqueue 1.74404 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52498 ms - Host latency: 2.17024 ms (enqueue 1.7446 ms)
[11/02/2023-04:01:56] [I] Average on 10 runs - GPU latency: 1.52441 ms - Host latency: 2.16997 ms (enqueue 1.74385 ms)
[11/02/2023-04:01:56] [I]
[11/02/2023-04:01:56] [I] === Performance summary ===
[11/02/2023-04:01:56] [I] Throughput: 562.172 qps
[11/02/2023-04:01:56] [I] Latency: min = 2.1283 ms, max = 5.60583 ms, mean = 2.19671 ms, median = 2.159 ms, percentile(90%) = 2.28162 ms, percentile(95%) = 2.52026 ms, percentile(99%) = 2.75 ms
[11/02/2023-04:01:56] [I] Enqueue Time: min = 1.69995 ms, max = 4.97644 ms, mean = 1.76463 ms, median = 1.73242 ms, percentile(90%) = 1.87451 ms, percentile(95%) = 2.00977 ms, percentile(99%) = 2.26904 ms
[11/02/2023-04:01:56] [I] H2D Latency: min = 0.199707 ms, max = 0.266235 ms, mean = 0.217886 ms, median = 0.21814 ms, percentile(90%) = 0.218384 ms, percentile(95%) = 0.219727 ms, percentile(99%) = 0.223633 ms
[11/02/2023-04:01:56] [I] GPU Compute Time: min = 1.48901 ms, max = 4.98071 ms, mean = 1.55201 ms, median = 1.5135 ms, percentile(90%) = 1.63342 ms, percentile(95%) = 1.87793 ms, percentile(99%) = 2.13379 ms
[11/02/2023-04:01:56] [I] D2H Latency: min = 0.396973 ms, max = 0.484314 ms, mean = 0.426806 ms, median = 0.42749 ms, percentile(90%) = 0.428467 ms, percentile(95%) = 0.428955 ms, percentile(99%) = 0.438232 ms
[11/02/2023-04:01:56] [I] Total Host Walltime: 3.00264 s
[11/02/2023-04:01:56] [I] Total GPU Compute Time: 2.6198 s
[11/02/2023-04:01:56] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[11/02/2023-04:01:56] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[11/02/2023-04:01:56] [W] * GPU compute time is unstable, with coefficient of variance = 10.0717%.
[11/02/2023-04:01:56] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[11/02/2023-04:01:56] [I] Explanations of the performance metrics are printed in the verbose logs.
[11/02/2023-04:01:56] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --fp16 --onnx=yolov8s-seg.onnx --saveEngine=ds62.engine --minShapes=input:1x3x640x640 --optShapes=input:1x3x640x640 --maxShapes=input:1x3x640x640 --shapes=input:1x3x640x640 --workspace=10000