ONNX model FAILED_ALLOCATION: basic_string::_S_construct null not valid

• Hardware Platform (Jetson / GPU) Nvidia T4 GPU
• DeepStream Version 5.0
• TensorRT Version 7.0
• NVIDIA GPU Driver Version (valid for GPU only) 460.56
• Issue Type( questions, new requirements, bugs) bug

I am trying to use an ONNX model as a secondary (SGIE) model in the pipeline; we want to extract its output tensor, which has size 2048. When the pipeline tries to create the engine file, I get the following errors.

INFO: ../nvdsinfer/nvdsinfer_func_utils.cpp:37 [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
INFO: ../nvdsinfer/nvdsinfer_func_utils.cpp:37 [TRT]: Detected 1 inputs and 1 output network tensors.
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: FAILED_ALLOCATION: basic_string::_S_construct null not valid
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1387 Serialize engine failed to file: /home/VpsSamples/VPService/Plugins/gstdynamic/models/Person_Reidentification/baseline_R101.onnx_b1_gpu0_fp16.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:685 [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT batched_inputs.1 3x384x128       min: 1x3x384x128     opt: 1x3x384x128     Max: 1x3x384x128
1   OUTPUT kFLOAT 1896            2048            min: 0               opt: 0               Max: 0

ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: ../rtSafe/cuda/cudaActivationRunner.cpp (96) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: FAILED_EXECUTION: std::exception
ERROR: nvdsinfer_backend.cpp:459 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1408 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: from element /GstPipeline:pipeline0/GstVpsDeepStream:vpsdeepstream0/GstNvInfer:person_reidentification: Failed to queue input batch for inferencing
Additional debug info:
gstnvinfer.cpp(1188): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline0/GstVpsDeepStream:vpsdeepstream0/GstNvInfer:person_reidentification
Execution ended after 0:00:01.901812224
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: ../rtSafe/cuda/cudaActivationRunner.cpp (96) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: FAILED_EXECUTION: std::exception
ERROR: nvdsinfer_backend.cpp:459 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1408 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
GPUassert: an illegal memory access was encountered src/modules/cuDCF/cudaCropScaleInTexture2D.cu 1254
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 700 (an illegal memory access was encountered)
ERROR: nvdsinfer_context_impl.cpp:1448 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: FAILED_EXECUTION: std::exception
ERROR: nvdsinfer_backend.cpp:459 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1408 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
[ERROR] 2021-07-25 14:16:10 CUDA error 700 (cudaErrorIllegalAddress): an illegal memory access was encountered
ERROR: nvdsinfer_context_impl.cpp:333 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1384 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
Cuda failure: status=700 in CreateTextureObj at line 2496
[ERROR] 2021-07-25 14:16:10 Error destroying cuda device: VPI_STATUS_INTERNAL
nvbufsurftransform.cpp(2369) : getLastCudaError() CUDA error : Recevied NvBufSurfTransformError_Execution_Error : (400) invalid resource handle.
(../inc/nvcudautils) Error ResourceError: e: all CUDA-capable devices are busy or unavailable (cudaErrorDevicesUnavailable) (propagating from /home/rlima/src/vpi/ext/nvcudautils/src/AllocMem.cpp, function freeMem(), line 283)
ERROR: nvdsinfer_context_impl.cpp:333 Failed to make stream wait on event, cuda err_no:46, err_str:cudaErrorDevicesUnavailable
(../inc/nvcudautils) Error ResourceError:  (propagating from /home/rlima/src/vpi/ext/nvcudautils/inc/nvcudautils/detail/../AllocMem.h, function operator()(), line 45)
ERROR: nvdsinfer_context_impl.cpp:1384 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
(../inc/nvcudautils) Error ResourceError: e: all CUDA-capable devices are busy or unavailable (cudaErrorDevicesUnavailable) (propagating from /home/rlima/src/vpi/ext/nvcudautils/src/AllocMem.cpp, function freeMem(), line 283)
(../inc/nvcudautils) Error ResourceError:  (propagating from /home/rlima/src/vpi/ext/nvcudautils/inc/nvcudautils/detail/../AllocMem.h, function operator()(), line 60)
(../inc/nvcudautils) Error ResourceError: e: all CUDA-capable devices are busy or unavailable (cudaErrorDevicesUnavailable) (propagating from /home/rlima/src/vpi/ext/nvcudautils/src/AllocMem.cpp, function freeMem(), line 283)
(../inc/nvcudautils) Error ResourceError:  (propagating from /home/rlima/src/vpi/ext/nvcudautils/inc/nvcudautils/detail/../AllocMem.h, function operator()(), line 60)
Caught SIGSEGV
ERROR: nvdsinfer_context_impl.cpp:333 Failed to make stream wait on event, cuda err_no:46, err_str:cudaErrorDevicesUnavailable
ERROR: nvdsinfer_context_impl.cpp:1384 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
exec gdb failed: No such file or directory
./fast_running.sh: line 13:   994 Segmentation fault      (core dumped) gst-launch-1.0 vpsdeepstream

Attached is a sample of the configuration for the ONNX model:

[property]
gpu-id=0
#net-scale-factor=0.0174292
#offsets=123.675;116.28;103.53
onnx-file=../models/Person_Reidentification/baseline_R101.onnx
#labelfile-path=labels.txt
batch-size=1
process-mode=2
# work on first primary - people class
operate-on-gie-id=11
operate-on-class-ids=0
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=1
interval=0
gie-unique-id=17
model-engine-file=../models/Person_Reidentification/baseline_R101_FP16.engine
#baseline_R101.onnx_b1_gpu0_fp16.engine
network-type=100
output-tensor-meta=1
workspace-size=2000
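
For context, with network-type=100 and output-tensor-meta=1 we expect to read the 2048-d re-id embedding from the tensor meta that nvinfer attaches to each detected object. A simplified sketch of how that readout would look in a probe on the SGIE src pad (illustrative only, function and variable names are placeholders, error handling omitted):

#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

/* Probe on the person_reidentification (SGIE) src pad: walk the batch meta,
 * find the NvDsInferTensorMeta attached to each object, and read the single
 * 2048-float output layer from host memory. */
static GstPadProbeReturn
sgie_src_pad_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj; l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
      for (NvDsMetaList *l_user = obj_meta->obj_user_meta_list; l_user; l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
          continue;
        NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *) user_meta->user_meta_data;
        /* Single output layer: a 2048-element float embedding in host memory. */
        float *embedding = (float *) tensor_meta->out_buf_ptrs_host[0];
        (void) embedding; /* ...store / compare the feature vector here... */
      }
    }
  }
  return GST_PAD_PROBE_OK;
}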

Can you please assist with this?


Hi,
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: ../rtSafe/cuda/cudaActivationRunner.cpp (96) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)

Which cuDNN version are you using? And are you running directly on the host or inside Docker?
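
If you are not sure, one way to check it inside the container is something like the following (the header location can vary between cuDNN packages):

# cuDNN version from the header (cuDNN 7.x keeps the version macros in cudnn.h)
grep -A 2 "#define CUDNN_MAJOR" /usr/include/cudnn.h
# or, if cuDNN was installed from the Debian packages
dpkg -l | grep libcudnn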

I am running the code inside the Docker container.

I believe the cuDNN version is 7.6.5.

Based on this error, could you try increasing the “workspace-size” to 1024 or more?

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html
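
That is, in the [property] group of your nvinfer config (the value is in MB, per the gst-nvinfer documentation linked above), for example:

# requested TensorRT builder workspace, in MB
workspace-size=1024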

I tried setting it to different values (1024, 2048, 4096, 10000); same problem.

OK, could you use trtexec to run your model?
Or is it possible to share the model?
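
Something along these lines should be enough to reproduce the engine build and inference outside DeepStream (the trtexec path may differ; in the NVIDIA containers it usually lives under /usr/src/tensorrt/bin):

# build and run the ONNX model with an FP16 engine and a 1024 MB workspace
/usr/src/tensorrt/bin/trtexec --onnx=baseline_R101.onnx --fp16 --workspace=1024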

We used trtexec to run the model, but we get the same error.

Please find the link to the ONNX model here: baseline_R101.onnx - Google Drive

Your feedback is appreciated.

@mchi @Amycao Hello, any updates on this issue? Best.

Hi @mohammad1,
Sorry for the delay!

I can see some warnings and errors like those below when using trtexec to run your ONNX model.

[08/20/2021-15:09:06] [E] [TRT] ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 2 (out of memory)
[08/20/2021-15:09:06] [E] [TRT] FAILED_EXECUTION: std::exception
[08/20/2021-15:09:06] [W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.
[08/20/2021-15:09:06] [E] [TRT] ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 2 (out of memory)
[08/20/2021-15:09:06] [E] [TRT] FAILED_EXECUTION: std::exception
[08/20/2021-15:09:06] [W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.
[08/20/2021-15:09:06] [E] [TRT] ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 2 (out of memory)
[08/20/2021-15:09:06] [E] [TRT] FAILED_EXECUTION: std::exception
[08/20/2021-15:09:06] [W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.
[08/20/2021-15:09:06] [E] [TRT] ../rtSafe/cuda/cudaScaleRunner.cpp (114) - Cuda Error in execute: 2 (out of memory)
[08/20/2021-15:09:06] [E] [TRT] FAILED_EXECUTION: std::exception

I also tried TensorRT 7.2.x, and there are no errors or warnings, as shown below.
Since DeepStream 5.1 uses TensorRT 7.2.x, could you upgrade your DeepStream to DS 5.1?
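
For example, one way to get a DS 5.1 environment is to pull the DeepStream 5.1 container from NGC (the tag below is indicative; please check the NGC catalog for the exact current one):

# DeepStream 5.1 development container (ships TensorRT 7.2.x)
docker pull nvcr.io/nvidia/deepstream:5.1-21.02-devel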

root@eda66a52647c:~/TensorRT-7.2.2.3/bin# ./trtexec --onnx=baseline_R101.onnx --workspace=1024 --fp16
&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=baseline_R101.onnx --workspace=1024 --fp16
[08/20/2021-15:19:12] [I] === Model Options ===
[08/20/2021-15:19:12] [I] Format: ONNX
[08/20/2021-15:19:12] [I] Model: baseline_R101.onnx
[08/20/2021-15:19:12] [I] Output:
[08/20/2021-15:19:12] [I] === Build Options ===
[08/20/2021-15:19:12] [I] Max batch: explicit
[08/20/2021-15:19:12] [I] Workspace: 1024 MiB
[08/20/2021-15:19:12] [I] minTiming: 1
[08/20/2021-15:19:12] [I] avgTiming: 8
[08/20/2021-15:19:12] [I] Precision: FP32+FP16
[08/20/2021-15:19:12] [I] Calibration:
[08/20/2021-15:19:12] [I] Refit: Disabled
[08/20/2021-15:19:12] [I] Safe mode: Disabled
[08/20/2021-15:19:12] [I] Save engine:
[08/20/2021-15:19:12] [I] Load engine:
[08/20/2021-15:19:12] [I] Builder Cache: Enabled
[08/20/2021-15:19:12] [I] NVTX verbosity: 0
[08/20/2021-15:19:12] [I] Tactic sources: Using default tactic sources
[08/20/2021-15:19:12] [I] Input(s)s format: fp32:CHW
[08/20/2021-15:19:12] [I] Output(s)s format: fp32:CHW
[08/20/2021-15:19:12] [I] Input build shapes: model
[08/20/2021-15:19:12] [I] Input calibration shapes: model
[08/20/2021-15:19:12] [I] === System Options ===
[08/20/2021-15:19:12] [I] Device: 0
[08/20/2021-15:19:12] [I] DLACore:
[08/20/2021-15:19:12] [I] Plugins:
[08/20/2021-15:19:12] [I] === Inference Options ===
[08/20/2021-15:19:12] [I] Batch: Explicit
[08/20/2021-15:19:12] [I] Input inference shapes: model
[08/20/2021-15:19:12] [I] Iterations: 10
[08/20/2021-15:19:12] [I] Duration: 3s (+ 200ms warm up)
[08/20/2021-15:19:12] [I] Sleep time: 0ms
[08/20/2021-15:19:12] [I] Streams: 1
[08/20/2021-15:19:12] [I] ExposeDMA: Disabled
[08/20/2021-15:19:12] [I] Data transfers: Enabled
[08/20/2021-15:19:12] [I] Spin-wait: Disabled
[08/20/2021-15:19:12] [I] Multithreading: Disabled
[08/20/2021-15:19:12] [I] CUDA Graph: Disabled
[08/20/2021-15:19:12] [I] Separate profiling: Disabled
[08/20/2021-15:19:12] [I] Skip inference: Disabled
[08/20/2021-15:19:12] [I] Inputs:
[08/20/2021-15:19:12] [I] === Reporting Options ===
[08/20/2021-15:19:12] [I] Verbose: Disabled
[08/20/2021-15:19:12] [I] Averages: 10 inferences
[08/20/2021-15:19:12] [I] Percentile: 99
[08/20/2021-15:19:12] [I] Dump refittable layers:Disabled
[08/20/2021-15:19:12] [I] Dump output: Disabled
[08/20/2021-15:19:12] [I] Profile: Disabled
[08/20/2021-15:19:12] [I] Export timing to JSON file:
[08/20/2021-15:19:12] [I] Export output to JSON file:
[08/20/2021-15:19:12] [I] Export profile to JSON file:
[08/20/2021-15:19:12] [I]
[08/20/2021-15:19:12] [I] === Device Information ===
[08/20/2021-15:19:12] [I] Selected Device: Tesla T4
[08/20/2021-15:19:12] [I] Compute Capability: 7.5
[08/20/2021-15:19:12] [I] SMs: 40
[08/20/2021-15:19:12] [I] Compute Clock Rate: 1.59 GHz
[08/20/2021-15:19:12] [I] Device Global Memory: 15109 MiB
[08/20/2021-15:19:12] [I] Shared Memory per SM: 64 KiB
[08/20/2021-15:19:12] [I] Memory Bus Width: 256 bits (ECC enabled)
[08/20/2021-15:19:12] [I] Memory Clock Rate: 5.001 GHz
[08/20/2021-15:19:12] [I]
----------------------------------------------------------------
Input filename:   baseline_R101.onnx
ONNX IR version:  0.0.6
Opset version:    9
Producer name:    pytorch
Producer version: 1.8
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[08/20/2021-15:19:36] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/20/2021-15:20:45] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.


[08/20/2021-15:24:16] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[08/20/2021-15:24:16] [I] Engine built in 304.315 sec.
[08/20/2021-15:24:16] [I] Starting inference
[08/20/2021-15:24:20] [I] Warmup completed 0 queries over 200 ms
[08/20/2021-15:24:20] [I] Timing trace has 0 queries over 3.00937 s
[08/20/2021-15:24:20] [I] Trace averages of 10 runs:
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 4.00135 ms - Host latency: 4.06035 ms (end to end 4.41789 ms, enqueue 3.97348 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.99778 ms - Host latency: 4.05611 ms (end to end 4.4118 ms, enqueue 3.96982 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.51036 ms - Host latency: 3.567 ms (end to end 3.84879 ms, enqueue 3.48986 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46995 ms - Host latency: 3.52767 ms (end to end 3.79908 ms, enqueue 3.4435 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46776 ms - Host latency: 3.5238 ms (end to end 3.79459 ms, enqueue 3.43646 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44617 ms - Host latency: 3.50457 ms (end to end 3.77132 ms, enqueue 3.41883 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45239 ms - Host latency: 3.50988 ms (end to end 3.77225 ms, enqueue 3.424 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46645 ms - Host latency: 3.52231 ms (end to end 3.78731 ms, enqueue 3.43697 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45866 ms - Host latency: 3.51561 ms (end to end 3.7812 ms, enqueue 3.42833 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45136 ms - Host latency: 3.50894 ms (end to end 3.77388 ms, enqueue 3.42327 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44476 ms - Host latency: 3.5025 ms (end to end 3.76958 ms, enqueue 3.41596 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44251 ms - Host latency: 3.49929 ms (end to end 3.76674 ms, enqueue 3.41505 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44622 ms - Host latency: 3.50388 ms (end to end 3.76889 ms, enqueue 3.41829 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46769 ms - Host latency: 3.52604 ms (end to end 3.79064 ms, enqueue 3.44478 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46718 ms - Host latency: 3.52679 ms (end to end 3.78146 ms, enqueue 3.43151 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44435 ms - Host latency: 3.50246 ms (end to end 3.76851 ms, enqueue 3.41589 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44308 ms - Host latency: 3.50086 ms (end to end 3.76548 ms, enqueue 3.41355 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44459 ms - Host latency: 3.50162 ms (end to end 3.76924 ms, enqueue 3.41638 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45134 ms - Host latency: 3.5095 ms (end to end 3.77651 ms, enqueue 3.42298 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44346 ms - Host latency: 3.50156 ms (end to end 3.76579 ms, enqueue 3.41467 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44664 ms - Host latency: 3.50663 ms (end to end 3.77005 ms, enqueue 3.41871 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45175 ms - Host latency: 3.50883 ms (end to end 3.77444 ms, enqueue 3.42216 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44796 ms - Host latency: 3.50464 ms (end to end 3.77305 ms, enqueue 3.41919 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4465 ms - Host latency: 3.50477 ms (end to end 3.77224 ms, enqueue 3.41922 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44836 ms - Host latency: 3.50614 ms (end to end 3.77371 ms, enqueue 3.42007 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44287 ms - Host latency: 3.49956 ms (end to end 3.76313 ms, enqueue 3.4147 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44493 ms - Host latency: 3.50304 ms (end to end 3.76903 ms, enqueue 3.41642 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4524 ms - Host latency: 3.51078 ms (end to end 3.77753 ms, enqueue 3.4244 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44131 ms - Host latency: 3.50018 ms (end to end 3.76464 ms, enqueue 3.41384 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44385 ms - Host latency: 3.5017 ms (end to end 3.76737 ms, enqueue 3.41583 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45083 ms - Host latency: 3.51042 ms (end to end 3.77571 ms, enqueue 3.42408 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44357 ms - Host latency: 3.50276 ms (end to end 3.76998 ms, enqueue 3.41523 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44934 ms - Host latency: 3.5085 ms (end to end 3.77589 ms, enqueue 3.4217 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4395 ms - Host latency: 3.4983 ms (end to end 3.76572 ms, enqueue 3.41261 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42413 ms - Host latency: 3.48284 ms (end to end 3.75022 ms, enqueue 3.3963 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42428 ms - Host latency: 3.48291 ms (end to end 3.74928 ms, enqueue 3.39672 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4229 ms - Host latency: 3.48159 ms (end to end 3.74872 ms, enqueue 3.39501 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.43341 ms - Host latency: 3.49188 ms (end to end 3.75796 ms, enqueue 3.40552 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42021 ms - Host latency: 3.47891 ms (end to end 3.74631 ms, enqueue 3.39274 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42374 ms - Host latency: 3.48311 ms (end to end 3.74755 ms, enqueue 3.39591 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4296 ms - Host latency: 3.48815 ms (end to end 3.75398 ms, enqueue 3.40222 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.42448 ms - Host latency: 3.4825 ms (end to end 3.7486 ms, enqueue 3.39642 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.43486 ms - Host latency: 3.49396 ms (end to end 3.76174 ms, enqueue 3.4079 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44043 ms - Host latency: 3.49948 ms (end to end 3.76447 ms, enqueue 3.41233 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44691 ms - Host latency: 3.50566 ms (end to end 3.77443 ms, enqueue 3.41991 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44276 ms - Host latency: 3.50132 ms (end to end 3.77156 ms, enqueue 3.4155 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44285 ms - Host latency: 3.5016 ms (end to end 3.76709 ms, enqueue 3.41322 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44987 ms - Host latency: 3.50875 ms (end to end 3.77667 ms, enqueue 3.42384 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44092 ms - Host latency: 3.50046 ms (end to end 3.76659 ms, enqueue 3.41187 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44626 ms - Host latency: 3.50549 ms (end to end 3.77377 ms, enqueue 3.41906 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4423 ms - Host latency: 3.50115 ms (end to end 3.7652 ms, enqueue 3.41444 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44398 ms - Host latency: 3.50303 ms (end to end 3.77057 ms, enqueue 3.41649 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.79803 ms - Host latency: 3.88043 ms (end to end 4.11862 ms, enqueue 3.76943 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45247 ms - Host latency: 3.54248 ms (end to end 3.7806 ms, enqueue 3.42371 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45388 ms - Host latency: 3.54192 ms (end to end 3.78384 ms, enqueue 3.42512 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44255 ms - Host latency: 3.53115 ms (end to end 3.77153 ms, enqueue 3.41582 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44795 ms - Host latency: 3.53709 ms (end to end 3.77661 ms, enqueue 3.41885 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44797 ms - Host latency: 3.53667 ms (end to end 3.77817 ms, enqueue 3.42085 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4449 ms - Host latency: 3.53418 ms (end to end 3.77153 ms, enqueue 3.41711 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4519 ms - Host latency: 3.53999 ms (end to end 3.78027 ms, enqueue 3.42419 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44717 ms - Host latency: 3.53633 ms (end to end 3.77695 ms, enqueue 3.4209 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44956 ms - Host latency: 3.53936 ms (end to end 3.7759 ms, enqueue 3.42041 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44614 ms - Host latency: 3.53594 ms (end to end 3.7717 ms, enqueue 3.41836 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44526 ms - Host latency: 3.53464 ms (end to end 3.77429 ms, enqueue 3.41794 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45327 ms - Host latency: 3.54202 ms (end to end 3.7843 ms, enqueue 3.42607 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44871 ms - Host latency: 3.53853 ms (end to end 3.77405 ms, enqueue 3.4196 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44419 ms - Host latency: 3.53369 ms (end to end 3.7717 ms, enqueue 3.41667 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46038 ms - Host latency: 3.55156 ms (end to end 3.79253 ms, enqueue 3.43284 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.46885 ms - Host latency: 3.56018 ms (end to end 3.79839 ms, enqueue 3.43962 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44705 ms - Host latency: 3.5373 ms (end to end 3.77937 ms, enqueue 3.4199 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.4395 ms - Host latency: 3.5303 ms (end to end 3.76873 ms, enqueue 3.41216 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44846 ms - Host latency: 3.53872 ms (end to end 3.77979 ms, enqueue 3.42109 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44646 ms - Host latency: 3.53643 ms (end to end 3.77808 ms, enqueue 3.41929 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44294 ms - Host latency: 3.53396 ms (end to end 3.77244 ms, enqueue 3.4156 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44421 ms - Host latency: 3.53481 ms (end to end 3.77354 ms, enqueue 3.41689 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44258 ms - Host latency: 3.53215 ms (end to end 3.77285 ms, enqueue 3.41482 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44424 ms - Host latency: 3.53337 ms (end to end 3.77344 ms, enqueue 3.41731 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44153 ms - Host latency: 3.53201 ms (end to end 3.77329 ms, enqueue 3.41228 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44143 ms - Host latency: 3.5322 ms (end to end 3.77083 ms, enqueue 3.41436 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44148 ms - Host latency: 3.53093 ms (end to end 3.77222 ms, enqueue 3.41467 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44146 ms - Host latency: 3.53203 ms (end to end 3.77231 ms, enqueue 3.41406 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.54531 ms - Host latency: 3.64521 ms (end to end 3.86665 ms, enqueue 3.51501 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.45999 ms - Host latency: 3.56462 ms (end to end 3.7928 ms, enqueue 3.43899 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.48042 ms - Host latency: 3.5908 ms (end to end 3.80527 ms, enqueue 3.44353 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44473 ms - Host latency: 3.55784 ms (end to end 3.77639 ms, enqueue 3.41724 ms)
[08/20/2021-15:24:20] [I] Average on 10 runs - GPU latency: 3.44124 ms - Host latency: 3.55315 ms (end to end 3.77246 ms, enqueue 3.41392 ms)
[08/20/2021-15:24:20] [I] Host Latency
[08/20/2021-15:24:20] [I] min: 3.47107 ms (end to end 3.73413 ms)
[08/20/2021-15:24:20] [I] max: 6.76953 ms (end to end 7.0376 ms)
[08/20/2021-15:24:20] [I] mean: 3.53693 ms (end to end 3.79344 ms)
[08/20/2021-15:24:20] [I] median: 3.51984 ms (end to end 3.77124 ms)
[08/20/2021-15:24:20] [I] percentile: 4.05872 ms at 99% (end to end 4.4162 ms at 99%)
[08/20/2021-15:24:20] [I] throughput: 0 qps
[08/20/2021-15:24:20] [I] walltime: 3.00937 s
[08/20/2021-15:24:20] [I] Enqueue Time
[08/20/2021-15:24:20] [I] min: 3.38159 ms
[08/20/2021-15:24:20] [I] max: 6.74084 ms
[08/20/2021-15:24:20] [I] median: 3.41595 ms
[08/20/2021-15:24:20] [I] GPU Compute
[08/20/2021-15:24:20] [I] min: 3.41248 ms
[08/20/2021-15:24:20] [I] max: 6.71069 ms
[08/20/2021-15:24:20] [I] mean: 3.46488 ms
[08/20/2021-15:24:20] [I] median: 3.44322 ms
[08/20/2021-15:24:20] [I] percentile: 4.00134 ms at 99%
[08/20/2021-15:24:20] [I] total compute time: 3.00751 s
&&&& PASSED TensorRT.trtexec # ./trtexec --onnx=baseline_R101.onnx --workspace=1024 --fp16
root@eda66a52647c:~/TensorRT-7.2.2.3/bin#

Thanks @mchi, this solved the issue.