Onnx Model FAILED_ALLOCATION: basic_string::_S_construct null not valid

• Hardware Platform (Jetson / GPU) Nvidia T4 GPU
• DeepStream Version 5.0
• TensorRT Version 7.0
• NVIDIA GPU Driver Version (valid for GPU only) 460.56
• Issue Type( questions, new requirements, bugs) bug

I am trying to use an ONNX model as a secondary inference model in the pipeline. We want to extract the output tensor, which has size 2048. When the pipeline tries to create the engine file, I get the following errors.

INFO: ../nvdsinfer/nvdsinfer_func_utils.cpp:37 [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
INFO: ../nvdsinfer/nvdsinfer_func_utils.cpp:37 [TRT]: Detected 1 inputs and 1 output network tensors.
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: FAILED_ALLOCATION: basic_string::_S_construct null not valid
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1387 Serialize engine failed to file: /home/VpsSamples/VPService/Plugins/gstdynamic/models/Person_Reidentification/baseline_R101.onnx_b1_gpu0_fp16.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:685 [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT batched_inputs.1 3x384x128       min: 1x3x384x128     opt: 1x3x384x128     Max: 1x3x384x128
1   OUTPUT kFLOAT 1896            2048            min: 0               opt: 0               Max: 0

ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: ../rtSafe/cuda/cudaActivationRunner.cpp (96) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: FAILED_EXECUTION: std::exception
ERROR: nvdsinfer_backend.cpp:459 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1408 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: from element /GstPipeline:pipeline0/GstVpsDeepStream:vpsdeepstream0/GstNvInfer:person_reidentification: Failed to queue input batch for inferencing
Additional debug info:
gstnvinfer.cpp(1188): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline0/GstVpsDeepStream:vpsdeepstream0/GstNvInfer:person_reidentification
Execution ended after 0:00:01.901812224
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: ../rtSafe/cuda/cudaActivationRunner.cpp (96) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: FAILED_EXECUTION: std::exception
ERROR: nvdsinfer_backend.cpp:459 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1408 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
GPUassert: an illegal memory access was encountered src/modules/cuDCF/cudaCropScaleInTexture2D.cu 1254
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 700 (an illegal memory access was encountered)
ERROR: nvdsinfer_context_impl.cpp:1448 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: FAILED_EXECUTION: std::exception
ERROR: nvdsinfer_backend.cpp:459 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1408 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
[ERROR] 2021-07-25 14:16:10 CUDA error 700 (cudaErrorIllegalAddress): an illegal memory access was encountered
ERROR: nvdsinfer_context_impl.cpp:333 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1384 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
Cuda failure: status=700 in CreateTextureObj at line 2496
[ERROR] 2021-07-25 14:16:10 Error destroying cuda device: VPI_STATUS_INTERNAL
nvbufsurftransform.cpp(2369) : getLastCudaError() CUDA error : Recevied NvBufSurfTransformError_Execution_Error : (400) invalid resource handle.
(../inc/nvcudautils) Error ResourceError: e: all CUDA-capable devices are busy or unavailable (cudaErrorDevicesUnavailable) (propagating from /home/rlima/src/vpi/ext/nvcudautils/src/AllocMem.cpp, function freeMem(), line 283)
ERROR: nvdsinfer_context_impl.cpp:333 Failed to make stream wait on event, cuda err_no:46, err_str:cudaErrorDevicesUnavailable
(../inc/nvcudautils) Error ResourceError:  (propagating from /home/rlima/src/vpi/ext/nvcudautils/inc/nvcudautils/detail/../AllocMem.h, function operator()(), line 45)
ERROR: nvdsinfer_context_impl.cpp:1384 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
(../inc/nvcudautils) Error ResourceError: e: all CUDA-capable devices are busy or unavailable (cudaErrorDevicesUnavailable) (propagating from /home/rlima/src/vpi/ext/nvcudautils/src/AllocMem.cpp, function freeMem(), line 283)
(../inc/nvcudautils) Error ResourceError:  (propagating from /home/rlima/src/vpi/ext/nvcudautils/inc/nvcudautils/detail/../AllocMem.h, function operator()(), line 60)
(../inc/nvcudautils) Error ResourceError: e: all CUDA-capable devices are busy or unavailable (cudaErrorDevicesUnavailable) (propagating from /home/rlima/src/vpi/ext/nvcudautils/src/AllocMem.cpp, function freeMem(), line 283)
(../inc/nvcudautils) Error ResourceError:  (propagating from /home/rlima/src/vpi/ext/nvcudautils/inc/nvcudautils/detail/../AllocMem.h, function operator()(), line 60)
Caught SIGSEGV
ERROR: nvdsinfer_context_impl.cpp:333 Failed to make stream wait on event, cuda err_no:46, err_str:cudaErrorDevicesUnavailable
ERROR: nvdsinfer_context_impl.cpp:1384 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
exec gdb failed: No such file or directory
./fast_running.sh: line 13:   994 Segmentation fault      (core dumped) gst-launch-1.0 vpsdeepstream

Please find below a sample of the nvinfer configuration for the ONNX model:

[property]
gpu-id=0
#net-scale-factor=0.0174292
#offsets=123.675;116.28;103.53
onnx-file=../models/Person_Reidentification/baseline_R101.onnx
#labelfile-path=labels.txt
batch-size=1
process-mode=2
# work on first primary - people class
operate-on-gie-id=11
operate-on-class-ids=0
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=1
interval=0
gie-unique-id=17
model-engine-file=../models/Person_Reidentification/baseline_R101_FP16.engine
#baseline_R101.onnx_b1_gpu0_fp16.engine
network-type=100
output-tensor-meta=1
workspace-size=2000
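
For context on what the extracted tensor is used for: with network-type=100 and output-tensor-meta=1, gst-nvinfer skips its built-in post-processing and attaches the raw output tensor (the 2048-dim embedding here) to the object meta, so the application compares embeddings itself. A minimal, DeepStream-independent sketch of that comparison step (cosine similarity in plain Python; the vectors and names are illustrative, not from the model):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (e.g. the 2048-dim re-ID output)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative 4-dim stand-ins for the real 2048-dim embeddings
gallery    = [1.0, 0.0, 2.0, 0.0]
query_same = [2.0, 0.0, 4.0, 0.0]  # same direction as gallery
query_diff = [0.0, 3.0, 0.0, 1.0]  # orthogonal to gallery

print(round(cosine_similarity(gallery, query_same), 6))  # -> 1.0
print(round(cosine_similarity(gallery, query_diff), 6))  # -> 0.0
```

In a real pipeline the vectors would come from the NvDsInferTensorMeta attached by this SGIE; the similarity threshold for declaring a re-identification match is application-specific.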

Can you please assist with this?


Hi,
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: ../rtSafe/cuda/cudaActivationRunner.cpp (96) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)

Which cuDNN version are you using? And did you run directly on the host or within Docker?

I am running the code within the Docker container.

I believe the cuDNN version is 7.6.5.

From this error, could you try increasing “workspace-size” to 1024 or more?

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html
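
Per the gst-nvinfer documentation linked above, workspace-size is given in MiB and goes in the [property] group of the nvinfer config file, e.g. (value illustrative):

```ini
[property]
workspace-size=2048
```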

I tried setting it to different values (1024, 2048, 4096, 10000); same problem.

OK, could you use trtexec to run your model?
Or is it possible to share the model?

We used trtexec to run the model, but we got the same error.

Please find the link to the ONNX model here: baseline_R101.onnx - Google Drive

Your feedback is appreciated.

@mchi @amycao Hello, any updates on this issue? Best.

Hi @mohammad1 ,
Sorry for the delay!

I can see some warnings and errors like the ones below when using trtexec to run your ONNX model.

[08/20/2021-15:09:06] [E] [TRT] ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 2 (out of memory)
[08/20/2021-15:09:06] [E] [TRT] FAILED_EXECUTION: std::exception
[08/20/2021-15:09:06] [W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.
[08/20/2021-15:09:06] [E] [TRT] ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 2 (out of memory)
[08/20/2021-15:09:06] [E] [TRT] FAILED_EXECUTION: std::exception
[08/20/2021-15:09:06] [W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.
[08/20/2021-15:09:06] [E] [TRT] ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 2 (out of memory)
[08/20/2021-15:09:06] [E] [TRT] FAILED_EXECUTION: std::exception
[08/20/2021-15:09:06] [W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.
[08/20/2021-15:09:06] [E] [TRT] ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 2 (out of memory)
[08/20/2021-15:09:06] [E] [TRT] FAILED_EXECUTION: std::exception

I also tried TensorRT 7.2.x, and there are no errors or warnings, as shown below.
Since DeepStream 5.1 uses TensorRT 7.2.x, could you upgrade your DeepStream to DS 5.1?

root@eda66a52647c:~/TensorRT-7.2.2.3/bin# ./trtexec --onnx=baseline_R101.onnx --workspace=1024 --fp16
&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=baseline_R101.onnx --workspace=1024 --fp16
[08/20/2021-15:19:12] [I] === Model Options ===
[08/20/2021-15:19:12] [I] Format: ONNX
[08/20/2021-15:19:12] [I] Model: baseline_R101.onnx
[08/20/2021-15:19:12] [I] Output:
[08/20/2021-15:19:12] [I] === Build Options ===
[08/20/2021-15:19:12] [I] Max batch: explicit
[08/20/2021-15:19:12] [I] Workspace: 1024 MiB
[08/20/2021-15:19:12] [I] minTiming: 1
[08/20/2021-15:19:12] [I] avgTiming: 8
[08/20/2021-15:19:12] [I] Precision: FP32+FP16
[08/20/2021-15:19:12] [I] Calibration:
[08/20/2021-15:19:12] [I] Refit: Disabled
[08/20/2021-15:19:12] [I] Safe mode: Disabled
[08/20/2021-15:19:12] [I] Save engine:
[08/20/2021-15:19:12] [I] Load engine:
[08/20/2021-15:19:12] [I] Builder Cache: Enabled
[08/20/2021-15:19:12] [I] NVTX verbosity: 0
[08/20/2021-15:19:12] [I] Tactic sources: Using default tactic sources
[08/20/2021-15:19:12] [I] Input(s)s format: fp32:CHW
[08/20/2021-15:19:12] [I] Output(s)s format: fp32:CHW
[08/20/2021-15:19:12] [I] Input build shapes: model
[08/20/2021-15:19:12] [I] Input calibration shapes: model
[08/20/2021-15:19:12] [I] === System Options ===
[08/20/2021-15:19:12] [I] Device: 0
[08/20/2021-15:19:12] [I] DLACore:
[08/20/2021-15:19:12] [I] Plugins:
[08/20/2021-15:19:12] [I] === Inference Options ===
[08/20/2021-15:19:12] [I] Batch: Explicit
[08/20/2021-15:19:12] [I] Input inference shapes: model
[08/20/2021-15:19:12] [I] Iterations: 10
[08/20/2021-15:19:12] [I] Duration: 3s (+ 200ms warm up)
[08/20/2021-15:19:12] [I] Sleep time: 0ms
[08/20/2021-15:19:12] [I] Streams: 1
[08/20/2021-15:19:12] [I] ExposeDMA: Disabled
[08/20/2021-15:19:12] [I] Data transfers: Enabled
[08/20/2021-15:19:12] [I] Spin-wait: Disabled
[08/20/2021-15:19:12] [I] Multithreading: Disabled
[08/20/2021-15:19:12] [I] CUDA Graph: Disabled
[08/20/2021-15:19:12] [I] Separate profiling: Disabled
[08/20/2021-15:19:12] [I] Skip inference: Disabled
[08/20/2021-15:19:12] [I] Inputs:
[08/20/2021-15:19:12] [I] === Reporting Options ===
[08/20/2021-15:19:12] [I] Verbose: Disabled
[08/20/2021-15:19:12] [I] Averages: 10 inferences
[08/20/2021-15:19:12] [I] Percentile: 99
[08/20/2021-15:19:12] [I] Dump refittable layers:Disabled
[08/20/2021-15:19:12] [I] Dump output: Disabled
[08/20/2021-15:19:12] [I] Profile: Disabled
[08/20/2021-15:19:12] [I] Export timing to JSON file:
[08/20/2021-15:19:12] [I] Export output to JSON file:
[08/20/2021-15:19:12] [I] Export profile to JSON file:
[08/20/2021-15:19:12] [I]
[08/20/2021-15:19:12] [I] === Device Information ===
[08/20/2021-15:19:12] [I] Selected Device: Tesla T4
[08/20/2021-15:19:12] [I] Compute Capability: 7.5
[08/20/2021-15:19:12] [I] SMs: 40
[08/20/2021-15:19:12] [I] Compute Clock Rate: 1.59 GHz
[08/20/2021-15:19:12] [I] Device Global Memory: 15109 MiB
[08/20/2021-15:19:12] [I] Shared Memory per SM: 64 KiB
[08/20/2021-15:19:12] [I] Memory Bus Width: 256 bits (ECC enabled)
[08/20/2021-15:19:12] [I] Memory Clock Rate: 5.001 GHz
[08/20/2021-15:19:12] [I]
----------------------------------------------------------------
Input filename:   baseline_R101.onnx
ONNX IR version:  0.0.6
Opset version:    9
Producer name:    pytorch
Producer version: 1.8
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[08/20/2021-15:19:36] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/20/2021-15:20:45] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.


[08/20/2021-15:24:16] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[08/20/2021-15:24:16] [I] Engine built in 304.315 sec.
[08/20/2021-15:24:16] [I] Starting inference
[08/20/2021-15:24:20] [I] Warmup completed 0 queries over 200 ms
[08/20/2021-15:24:20] [I] Timing trace has 0 queries over 3.00937 s
[08/20/2021-15:24:20] [I] Trace averages of 10 runs:
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 4.00135 ms - Host latency: 4.06035 ms (end to end 4.41789 ms, enqueue 3.97348 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.99778 ms - Host latency: 4.05611 ms (end to end 4.4118 ms, enqueue 3.96982 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.51036 ms - Host latency: 3.567 ms (end to end 3.84879 ms, enqueue 3.48986 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46995 ms - Host latency: 3.52767 ms (end to end 3.79908 ms, enqueue 3.4435 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46776 ms - Host latency: 3.5238 ms (end to end 3.79459 ms, enqueue 3.43646 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44617 ms - Host latency: 3.50457 ms (end to end 3.77132 ms, enqueue 3.41883 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45239 ms - Host latency: 3.50988 ms (end to end 3.77225 ms, enqueue 3.424 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46645 ms - Host latency: 3.52231 ms (end to end 3.78731 ms, enqueue 3.43697 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45866 ms - Host latency: 3.51561 ms (end to end 3.7812 ms, enqueue 3.42833 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45136 ms - Host latency: 3.50894 ms (end to end 3.77388 ms, enqueue 3.42327 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44476 ms - Host latency: 3.5025 ms (end to end 3.76958 ms, enqueue 3.41596 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44251 ms - Host latency: 3.49929 ms (end to end 3.76674 ms, enqueue 3.41505 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44622 ms - Host latency: 3.50388 ms (end to end 3.76889 ms, enqueue 3.41829 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46769 ms - Host latency: 3.52604 ms (end to end 3.79064 ms, enqueue 3.44478 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46718 ms - Host latency: 3.52679 ms (end to end 3.78146 ms, enqueue 3.43151 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44435 ms - Host latency: 3.50246 ms (end to end 3.76851 ms, enqueue 3.41589 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44308 ms - Host latency: 3.50086 ms (end to end 3.76548 ms, enqueue 3.41355 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44459 ms - Host latency: 3.50162 ms (end to end 3.76924 ms, enqueue 3.41638 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45134 ms - Host latency: 3.5095 ms (end to end 3.77651 ms, enqueue 3.42298 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44346 ms - Host latency: 3.50156 ms (end to end 3.76579 ms, enqueue 3.41467 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44664 ms - Host latency: 3.50663 ms (end to end 3.77005 ms, enqueue 3.41871 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45175 ms - Host latency: 3.50883 ms (end to end 3.77444 ms, enqueue 3.42216 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44796 ms - Host latency: 3.50464 ms (end to end 3.77305 ms, enqueue 3.41919 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4465 ms - Host latency: 3.50477 ms (end to end 3.77224 ms, enqueue 3.41922 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44836 ms - Host latency: 3.50614 ms (end to end 3.77371 ms, enqueue 3.42007 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44287 ms - Host latency: 3.49956 ms (end to end 3.76313 ms, enqueue 3.4147 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44493 ms - Host latency: 3.50304 ms (end to end 3.76903 ms, enqueue 3.41642 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4524 ms - Host latency: 3.51078 ms (end to end 3.77753 ms, enqueue 3.4244 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44131 ms - Host latency: 3.50018 ms (end to end 3.76464 ms, enqueue 3.41384 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44385 ms - Host latency: 3.5017 ms (end to end 3.76737 ms, enqueue 3.41583 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45083 ms - Host latency: 3.51042 ms (end to end 3.77571 ms, enqueue 3.42408 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44357 ms - Host latency: 3.50276 ms (end to end 3.76998 ms, enqueue 3.41523 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44934 ms - Host latency: 3.5085 ms (end to end 3.77589 ms, enqueue 3.4217 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4395 ms - Host latency: 3.4983 ms (end to end 3.76572 ms, enqueue 3.41261 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42413 ms - Host latency: 3.48284 ms (end to end 3.75022 ms, enqueue 3.3963 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42428 ms - Host latency: 3.48291 ms (end to end 3.74928 ms, enqueue 3.39672 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4229 ms - Host latency: 3.48159 ms (end to end 3.74872 ms, enqueue 3.39501 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.43341 ms - Host latency: 3.49188 ms (end to end 3.75796 ms, enqueue 3.40552 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42021 ms - Host latency: 3.47891 ms (end to end 3.74631 ms, enqueue 3.39274 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42374 ms - Host latency: 3.48311 ms (end to end 3.74755 ms, enqueue 3.39591 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4296 ms - Host latency: 3.48815 ms (end to end 3.75398 ms, enqueue 3.40222 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42448 ms - Host latency: 3.4825 ms (end to end 3.7486 ms, enqueue 3.39642 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.43486 ms - Host latency: 3.49396 ms (end to end 3.76174 ms, enqueue 3.4079 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44043 ms - Host latency: 3.49948 ms (end to end 3.76447 ms, enqueue 3.41233 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44691 ms - Host latency: 3.50566 ms (end to end 3.77443 ms, enqueue 3.41991 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44276 ms - Host latency: 3.50132 ms (end to end 3.77156 ms, enqueue 3.4155 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44285 ms - Host latency: 3.5016 ms (end to end 3.76709 ms, enqueue 3.41322 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44987 ms - Host latency: 3.50875 ms (end to end 3.77667 ms, enqueue 3.42384 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44092 ms - Host latency: 3.50046 ms (end to end 3.76659 ms, enqueue 3.41187 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44626 ms - Host latency: 3.50549 ms (end to end 3.77377 ms, enqueue 3.41906 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4423 ms - Host latency: 3.50115 ms (end to end 3.7652 ms, enqueue 3.41444 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44398 ms - Host latency: 3.50303 ms (end to end 3.77057 ms, enqueue 3.41649 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.79803 ms - Host latency: 3.88043 ms (end to end 4.11862 ms, enqueue 3.76943 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45247 ms - Host latency: 3.54248 ms (end to end 3.7806 ms, enqueue 3.42371 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45388 ms - Host latency: 3.54192 ms (end to end 3.78384 ms, enqueue 3.42512 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44255 ms - Host latency: 3.53115 ms (end to end 3.77153 ms, enqueue 3.41582 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44795 ms - Host latency: 3.53709 ms (end to end 3.77661 ms, enqueue 3.41885 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44797 ms - Host latency: 3.53667 ms (end to end 3.77817 ms, enqueue 3.42085 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4449 ms - Host latency: 3.53418 ms (end to end 3.77153 ms, enqueue 3.41711 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4519 ms - Host latency: 3.53999 ms (end to end 3.78027 ms, enqueue 3.42419 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44717 ms - Host latency: 3.53633 ms (end to end 3.77695 ms, enqueue 3.4209 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44956 ms - Host latency: 3.53936 ms (end to end 3.7759 ms, enqueue 3.42041 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44614 ms - Host latency: 3.53594 ms (end to end 3.7717 ms, enqueue 3.41836 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44526 ms - Host latency: 3.53464 ms (end to end 3.77429 ms, enqueue 3.41794 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45327 ms - Host latency: 3.54202 ms (end to end 3.7843 ms, enqueue 3.42607 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44871 ms - Host latency: 3.53853 ms (end to end 3.77405 ms, enqueue 3.4196 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44419 ms - Host latency: 3.53369 ms (end to end 3.7717 ms, enqueue 3.41667 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46038 ms - Host latency: 3.55156 ms (end to end 3.79253 ms, enqueue 3.43284 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46885 ms - Host latency: 3.56018 ms (end to end 3.79839 ms, enqueue 3.43962 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44705 ms - Host latency: 3.5373 ms (end to end 3.77937 ms, enqueue 3.4199 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4395 ms - Host latency: 3.5303 ms (end to end 3.76873 ms, enqueue 3.41216 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44846 ms - Host latency: 3.53872 ms (end to end 3.77979 ms, enqueue 3.42109 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44646 ms - Host latency: 3.53643 ms (end to end 3.77808 ms, enqueue 3.41929 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44294 ms - Host latency: 3.53396 ms (end to end 3.77244 ms, enqueue 3.4156 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44421 ms - Host latency: 3.53481 ms (end to end 3.77354 ms, enqueue 3.41689 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44258 ms - Host latency: 3.53215 ms (end to end 3.77285 ms, enqueue 3.41482 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44424 ms - Host latency: 3.53337 ms (end to end 3.77344 ms, enqueue 3.41731 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44153 ms - Host latency: 3.53201 ms (end to end 3.77329 ms, enqueue 3.41228 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44143 ms - Host latency: 3.5322 ms (end to end 3.77083 ms, enqueue 3.41436 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44148 ms - Host latency: 3.53093 ms (end to end 3.77222 ms, enqueue 3.41467 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44146 ms - Host latency: 3.53203 ms (end to end 3.77231 ms, enqueue 3.41406 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.54531 ms - Host latency: 3.64521 ms (end to end 3.86665 ms, enqueue 3.51501 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45999 ms - Host latency: 3.56462 ms (end to end 3.7928 ms, enqueue 3.43899 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.48042 ms - Host latency: 3.5908 ms (end to end 3.80527 ms, enqueue 3.44353 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44473 ms - Host latency: 3.55784 ms (end to end 3.77639 ms, enqueue 3.41724 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44124 ms - Host latency: 3.55315 ms (end to end 3.77246 ms, enqueue 3.41392 ms)
[08/20/2021-15:24:20] [I] Host Latency
[08/20/2021-15:24:20] [I] min: 3.47107 ms (end to end 3.73413 ms)
[08/20/2021-15:24:20] [I] max: 6.76953 ms (end to end 7.0376 ms)
[08/20/2021-15:24:20] [I] mean: 3.53693 ms (end to end 3.79344 ms)
[08/20/2021-15:24:20] [I] median: 3.51984 ms (end to end 3.77124 ms)
[08/20/2021-15:24:20] [I] percentile: 4.05872 ms at 99% (end to end 4.4162 ms at 99%)
[08/20/2021-15:24:20] [I] throughput: 0 qps
[08/20/2021-15:24:20] [I] walltime: 3.00937 s
[08/20/2021-15:24:20] [I] Enqueue Time
[08/20/2021-15:24:20] [I] min: 3.38159 ms
[08/20/2021-15:24:20] [I] max: 6.74084 ms
[08/20/2021-15:24:20] [I] median: 3.41595 ms
[08/20/2021-15:24:20] [I] GPU Compute
[08/20/2021-15:24:20] [I] min: 3.41248 ms
[08/20/2021-15:24:20] [I] max: 6.71069 ms
[08/20/2021-15:24:20] [I] mean: 3.46488 ms
[08/20/2021-15:24:20] [I] median: 3.44322 ms
[08/20/2021-15:24:20] [I] percentile: 4.00134 ms at 99%
[08/20/2021-15:24:20] [I] total compute time: 3.00751 s
&&&& PASSED TensorRT.trtexec # ./trtexec --onnx=baseline_R101.onnx --workspace=1024 --fp16
root@eda66a52647c:~/TensorRT-7.2.2.3/bin#

Thanks @mchi, this solved the issue.