Run yolov3_tiny.engine from Python

• Hardware Platform (Jetson / GPU) AGX Xavier
• DeepStream Version 5.0
• JetPack Version (valid for Jetson only) 4.4
• TensorRT Version 7.1.3.0

Hi!

I converted a Darknet yolov3-tiny model to a TensorRT engine using libnvdsinfer_custom_impl_Yolo.so.
Now I want to run this engine using the Python API:

with open("yolov3-tiny.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
        context = engine.create_execution_context()

And I am getting the error:

[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin YoloLayer_TRT version 1
[TensorRT] ERROR: safeDeserializationUtils.cpp (322) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
engine
Traceback (most recent call last):
File "trt_yolo.py", line 65, in <module>
inference = TrtInference()
File "trt_yolo.py", line 19, in __init__
self.context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

Can I run this engine using the Python API?

Hi,

The implementation of the YoloLayer_TRT plugin is in libnvdsinfer_custom_impl_Yolo.so.
So please load that library in your TensorRT application.

For example:

/usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine --plugins=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

For the Python interface, the .so file can be loaded via the ctypes module.
You can find an example in the file below:

/usr/src/tensorrt/samples/python/uff_custom_plugin/sample.py
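
For example, a minimal sketch of that approach (the library and engine paths are just examples; adjust them to your setup):

import ctypes
import tensorrt as trt

# Load the plugin library first so YoloLayer_TRT is registered in the
# TensorRT plugin registry before the engine is deserialized.
ctypes.CDLL("./libnvdsinfer_custom_impl_Yolo.so")

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
with open("yolov3-tiny.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()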

Thanks.

Hi @AastaLLL, thanks for the reply!

I added ctypes.CDLL("./libnvdsinfer_custom_impl_Yolo.so") in my script and YoloLayer_TRT was found.
But now I get another error:

Deserialize yoloLayerV3 plugin: yolo_17
Deserialize yoloLayerV3 plugin: yolo_24
[TensorRT] ERROR: engine.cpp (986) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
Traceback (most recent call last):
File "trt_yolo.py", line 56, in <module>
inference.inference(image_)
File "trt_yolo.py", line 41, in inference
cuda.memcpy_dtoh(self.out_cpu, self.out_gpu)
pycuda._driver.LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered

I use this script with another network's engine file and it works.

Hi,

The error suggests that the model cannot be run with TensorRT.
Could you first check whether trtexec works?

/usr/src/tensorrt/bin/trtexec --loadEngine=[your/engine/file] --plugins=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

Thanks.

I got

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=yolo.engine → plugins=/home/nvidia/yoloToTrt/libnvdsinfer_custom_impl_Yolo.so
[11/05/2020-11:13:51] [I] === Model Options ===
[11/05/2020-11:13:51] [I] Format: *
[11/05/2020-11:13:51] [I] Model:
[11/05/2020-11:13:51] [I] Output:
[11/05/2020-11:13:51] [I] === Build Options ===
[11/05/2020-11:13:51] [I] Max batch: 1
[11/05/2020-11:13:51] [I] Workspace: 16 MB
[11/05/2020-11:13:51] [I] minTiming: 1
[11/05/2020-11:13:51] [I] avgTiming: 8
[11/05/2020-11:13:51] [I] Precision: FP32
[11/05/2020-11:13:51] [I] Calibration:
[11/05/2020-11:13:51] [I] Safe mode: Disabled
[11/05/2020-11:13:51] [I] Save engine:
[11/05/2020-11:13:51] [I] Load engine: yolo.engine
[11/05/2020-11:13:51] [I] Builder Cache: Enabled
[11/05/2020-11:13:51] [I] NVTX verbosity: 0
[11/05/2020-11:13:51] [I] Inputs format: fp32:CHW
[11/05/2020-11:13:51] [I] Outputs format: fp32:CHW
[11/05/2020-11:13:51] [I] Input build shapes: model
[11/05/2020-11:13:51] [I] Input calibration shapes: model
[11/05/2020-11:13:51] [I] === System Options ===
[11/05/2020-11:13:51] [I] Device: 0
[11/05/2020-11:13:51] [I] DLACore:
[11/05/2020-11:13:51] [I] Plugins: /home/nvidia/yoloToTrt/libnvdsinfer_custom_impl_Yolo.so
[11/05/2020-11:13:51] [I] === Inference Options ===
[11/05/2020-11:13:51] [I] Batch: 1
[11/05/2020-11:13:51] [I] Input inference shapes: model
[11/05/2020-11:13:51] [I] Iterations: 10
[11/05/2020-11:13:51] [I] Duration: 3s (+ 200ms warm up)
[11/05/2020-11:13:51] [I] Sleep time: 0ms
[11/05/2020-11:13:51] [I] Streams: 1
[11/05/2020-11:13:51] [I] ExposeDMA: Disabled
[11/05/2020-11:13:51] [I] Spin-wait: Disabled
[11/05/2020-11:13:51] [I] Multithreading: Disabled
[11/05/2020-11:13:51] [I] CUDA Graph: Disabled
[11/05/2020-11:13:51] [I] Skip inference: Disabled
[11/05/2020-11:13:51] [I] Inputs:
[11/05/2020-11:13:51] [I] === Reporting Options ===
[11/05/2020-11:13:51] [I] Verbose: Disabled
[11/05/2020-11:13:51] [I] Averages: 10 inferences
[11/05/2020-11:13:51] [I] Percentile: 99
[11/05/2020-11:13:51] [I] Dump output: Disabled
[11/05/2020-11:13:51] [I] Profile: Disabled
[11/05/2020-11:13:51] [I] Export timing to JSON file:
[11/05/2020-11:13:51] [I] Export output to JSON file:
[11/05/2020-11:13:51] [I] Export profile to JSON file:
[11/05/2020-11:13:51] [I]
[11/05/2020-11:13:51] [I] Loading supplied plugin library: /home/nvidia/yoloToTrt/libnvdsinfer_custom_impl_Yolo.so
Deserialize yoloLayerV3 plugin: yolo_17
Deserialize yoloLayerV3 plugin: yolo_24
[11/05/2020-11:13:54] [I] Starting inference threads
[11/05/2020-11:13:58] [I] Warmup completed 5 queries over 200 ms
[11/05/2020-11:13:58] [I] Timing trace has 70 queries over 3.09888 s
[11/05/2020-11:13:58] [I] Trace averages of 10 runs:
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 43.2709 ms - Host latency: 43.8051 ms (end to end 43.818 ms, enqueue 1.22207 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 43.2599 ms - Host latency: 43.7939 ms (end to end 43.8081 ms, enqueue 1.08314 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 44.3734 ms - Host latency: 44.9074 ms (end to end 44.9209 ms, enqueue 0.945178 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 44.0091 ms - Host latency: 44.5429 ms (end to end 44.5571 ms, enqueue 0.881604 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 44.0428 ms - Host latency: 44.5768 ms (end to end 44.592 ms, enqueue 0.949219 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 43.6259 ms - Host latency: 44.1593 ms (end to end 44.1721 ms, enqueue 0.897559 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 43.4646 ms - Host latency: 44.0046 ms (end to end 44.018 ms, enqueue 0.837988 ms)
[11/05/2020-11:13:58] [I] Host Latency
[11/05/2020-11:13:58] [I] min: 41.7312 ms (end to end 41.7419 ms)
[11/05/2020-11:13:58] [I] max: 45.7786 ms (end to end 45.7908 ms)
[11/05/2020-11:13:58] [I] mean: 44.2557 ms (end to end 44.2695 ms)
[11/05/2020-11:13:58] [I] median: 45.5899 ms (end to end 45.6002 ms)
[11/05/2020-11:13:58] [I] percentile: 45.7786 ms at 99% (end to end 45.7908 ms at 99%)
[11/05/2020-11:13:58] [I] throughput: 22.5888 qps
[11/05/2020-11:13:58] [I] walltime: 3.09888 s
[11/05/2020-11:13:58] [I] Enqueue Time
[11/05/2020-11:13:58] [I] min: 0.749268 ms
[11/05/2020-11:13:58] [I] max: 1.48499 ms
[11/05/2020-11:13:58] [I] median: 0.946655 ms
[11/05/2020-11:13:58] [I] GPU Compute
[11/05/2020-11:13:58] [I] min: 41.1997 ms
[11/05/2020-11:13:58] [I] max: 45.2434 ms
[11/05/2020-11:13:58] [I] mean: 43.721 ms
[11/05/2020-11:13:58] [I] median: 45.0555 ms
[11/05/2020-11:13:58] [I] percentile: 45.2434 ms at 99%
[11/05/2020-11:13:58] [I] total compute time: 3.06047 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=yolo.engine --plugins=/home/nvidia/yoloToTrt/libnvdsinfer_custom_impl_Yolo.so

I’m serializing the model with:

#include "yolo.h"
#include "NvInferRuntimeCommon.h"
#include <iostream>
#include <fstream>

class Logger : public nvinfer1::ILogger
{
    // Suppress info-level messages.
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

int main(){
    // Network description for the DeepStream Yolo builder.
    NetworkInfo info;
    info.networkType = "yolov3-tiny";
    info.configFilePath = "tiny/yolov3-tiny.cfg";
    info.wtsFilePath = "tiny/yolov3-tiny.weights";
    info.deviceType = "kGPU";
    info.inputBlobName = "yolov3-tiny";

    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
    builder->setMaxBatchSize(1);

    // Builder configuration: 1 GB workspace, FP16 enabled.
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1 << 30);
    config->setFlag(nvinfer1::BuilderFlag::kFP16);

    Yolo yolo(info);

    // Build the engine from the Darknet cfg/weights.
    nvinfer1::ICudaEngine* engine = yolo.createEngine(builder);

    // Serialize the engine and write it to disk.
    nvinfer1::IHostMemory* serializedModel = engine->serialize();
    engine->destroy();
    builder->destroy();

    std::ofstream ofs("yolo.engine", std::ios::out | std::ios::binary);
    ofs.write((char*)(serializedModel->data()), serializedModel->size());
    ofs.close();

    serializedModel->destroy();

    return 0;
}

I generated model_b1_gpu0_fp16.engine with deepstream-app -c ./deepstream_app_config_yoloV3_tiny.txt.
I’m trying to run my Python app with this engine file, but I get the same error again:

Deserialize yoloLayerV3 plugin: yolo_17
Deserialize yoloLayerV3 plugin: yolo_24
[TensorRT] ERROR: engine.cpp (986) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
Traceback (most recent call last):
File "trt_yolo.py", line 56, in <module>
inference.inference(image_)
File "trt_yolo.py", line 41, in inference
cuda.memcpy_dtoh(self.out_cpu, self.out_gpu)
pycuda._driver.LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered

Hi,

Thanks for your testing.
We are checking this and will share more information later.

Hi,

We can run the YOLO engine with the script below without issue.
Could you give it a try?

import ctypes
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
import numpy as np

ctypes.CDLL("/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so")
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

with open("model_b1_gpu0_fp32.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    bindings = []
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        bindings.append(int(cuda.mem_alloc(4*size)))

    stream = cuda.Stream()
    context = engine.create_execution_context()
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    stream.synchronize()

Thanks.

Hi, thanks for the reply!
I had allocated memory for only two bindings, but three are needed. Now everything works.
I get outputs with shapes (6498,) and (25992,).
Can I use libnvdsinfer_custom_impl_Yolo.so to parse the outputs and get bounding boxes?
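
Roughly, the allocation I ended up with looks like this (a sketch of my script; input_image is a placeholder for the preprocessed float32 input, and I assume FP32 bindings with the input as the first binding):

import ctypes
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

ctypes.CDLL("./libnvdsinfer_custom_impl_Yolo.so")
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

with open("model_b1_gpu0_fp16.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    # One device buffer per binding; pagelocked host buffers for the outputs.
    bindings, device_bufs, host_outputs = [], [], []
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dev = cuda.mem_alloc(size * np.dtype(np.float32).itemsize)  # assumes FP32 bindings
        device_bufs.append(dev)
        bindings.append(int(dev))
        if not engine.binding_is_input(binding):
            host_outputs.append(cuda.pagelocked_empty(size, dtype=np.float32))

    stream = cuda.Stream()
    # Copy the preprocessed image in, run inference, copy each output back.
    cuda.memcpy_htod_async(device_bufs[0], input_image, stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    for host_out, dev_out in zip(host_outputs, device_bufs[1:]):
        cuda.memcpy_dtoh_async(host_out, dev_out, stream)
    stream.synchronize()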

[TensorRT] WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
conv1/convolution + activation_1/Relu6: 1.89542ms
block_1a_conv_1/convolution + activation_2/Relu6: 1.45514ms
block_1a_conv_2/convolution: 2.88618ms
block_1a_conv_shortcut/convolution + add_1/add + activation_3/Relu6: 0.554784ms
block_1b_conv_1/convolution + activation_4/Relu6: 3.05424ms
block_1b_conv_2/convolution: 2.25149ms
block_1b_conv_shortcut/convolution + add_2/add + activation_5/Relu6: 0.61792ms
block_2a_conv_1/convolution + activation_6/Relu6: 1.5256ms
block_2a_conv_2/convolution: 2.2432ms
block_2a_conv_shortcut/convolution + add_3/add + activation_7/Relu6: 0.336288ms
block_2b_conv_1/convolution + activation_8/Relu6: 1103.6ms
block_2b_conv_2/convolution: 2.32346ms
block_2b_conv_shortcut/convolution + add_4/add + activation_9/Relu6: 0.447488ms
block_3a_conv_1/convolution + activation_10/Relu6: 1.10899ms
block_3a_conv_2/convolution: 1.46125ms
block_3a_conv_shortcut/convolution + add_5/add + activation_11/Relu6: 0.238592ms
block_3b_conv_1/convolution + activation_12/Relu6: 2.00397ms
block_3b_conv_2/convolution: 1.71011ms
block_3b_conv_shortcut/convolution + add_6/add + activation_13/Relu6: 0.290784ms
block_4a_conv_1/convolution + activation_14/Relu6: 2.27453ms
block_4a_conv_2/convolution: 2.06624ms
block_4a_conv_shortcut/convolution + add_7/add + activation_15/Relu6: 0.308384ms
block_4b_conv_1/convolution + activation_16/Relu6: 1.16208ms
block_4b_conv_2/convolution: 0.805856ms
block_4b_conv_shortcut/convolution + add_8/add + activation_17/Relu6: 0.17408ms
output_cov/convolution: 0.058688ms
[TensorRT] ERROR: engine.cpp (725) - Cuda Error in reportTimes: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: INTERNAL_ERROR: std::exception
[TensorRT] ERROR: engine.cpp (986) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
[TensorRT] ERROR: engine.cpp (179) - Cuda Error in ~ExecutionContext: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: INTERNAL_ERROR: std::exception
[TensorRT] ERROR: Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::155, condition: cudnnDestroy(context.cudnn) failure.
[TensorRT] ERROR: Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::165, condition: cudaEventDestroy(context.start) failure.
[TensorRT] ERROR: Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::170, condition: cudaEventDestroy(context.stop) failure.
[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 700 (an illegal memory access was encountered)
terminate called after throwing an instance of 'nvinfer1::CudaError'
what(): std::exception
Aborted (core dumped)
@AastaLLL @kayccc
I am getting a similar kind of error when trying to run pre-trained TLT models after converting them to TRT and running them with a PyCUDA script on a Jetson NX. I was using PeopleNet (ResNet18).
I followed the PyCUDA script from here: Speeding Up Deep Learning Inference Using TensorFlow, ONNX, and NVIDIA TensorRT | NVIDIA Technical Blog
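
For reference, this is roughly how I check the engine bindings before allocating buffers (peoplenet.engine is just a placeholder for my engine file name):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
with open("peoplenet.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    # Print every binding with its direction, shape and dtype so the buffer
    # allocations in the PyCUDA script can be matched against them.
    for i, binding in enumerate(engine):
        print(i, binding,
              "input" if engine.binding_is_input(binding) else "output",
              engine.get_binding_shape(binding),
              engine.get_binding_dtype(binding))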

It looks like you created the engine file on one machine but are trying to run it on another.

@nazarov-alexey
No, I created the engine on the Jetson and am trying to run it on the same device.

Did you try to run

/usr/src/tensorrt/bin/trtexec --loadEngine=your.engine

@nazarov-alexey Yes, that was running perfectly.

And when you run this, do you get the error?