Run yolov3_tiny.engine from Python

• Hardware Platform (Jetson / GPU) AGX Xavier
• DeepStream Version 5.0
• JetPack Version (valid for Jetson only) 4.4
• TensorRT Version 7.1.3.0

Hi!

I converted a Darknet yolov3-tiny model to a TensorRT engine using libnvdsinfer_custom_impl_Yolo.so.
Now I want to run this engine using the Python API:

with open("yolov3-tiny.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
        context = engine.create_execution_context()

And I am getting the error:

[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin YoloLayer_TRT version 1
[TensorRT] ERROR: safeDeserializationUtils.cpp (322) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
engine
Traceback (most recent call last):
File "trt_yolo.py", line 65, in <module>
inference = TrtInference()
File "trt_yolo.py", line 19, in __init__
self.context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

Can I run this engine using the Python API?

Hi,

The implementation of the YoloLayer_TRT plugin is in libnvdsinfer_custom_impl_Yolo.so.
So please load that library in your TensorRT application.

For example:

/usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine --plugins=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

For the Python interface, the .so file can be loaded via the ctypes module.
You can find an example in the file below:

/usr/src/tensorrt/samples/python/uff_custom_plugin/sample.py
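
For example, a minimal sketch of that approach (the library and engine paths are just examples; adjust them to your setup):

import ctypes
import tensorrt as trt

# Load the plugin library first so YoloLayer_TRT is registered in the
# TensorRT plugin registry before the engine is deserialized.
ctypes.CDLL("./libnvdsinfer_custom_impl_Yolo.so")

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
with open("yolov3-tiny.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()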

Thanks.

Hi @AastaLLL, thanks for the reply!

I added ctypes.CDLL("./libnvdsinfer_custom_impl_Yolo.so") in my script and YoloLayer_TRT was found.
But now I get another error:

Deserialize yoloLayerV3 plugin: yolo_17
Deserialize yoloLayerV3 plugin: yolo_24
[TensorRT] ERROR: engine.cpp (986) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
Traceback (most recent call last):
File "trt_yolo.py", line 56, in <module>
inference.inference(image_)
File "trt_yolo.py", line 41, in inference
cuda.memcpy_dtoh(self.out_cpu, self.out_gpu)
pycuda._driver.LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered

I use this script with another network's engine file and it works.

Hi,

The error suggests that the model cannot be run with TensorRT.
Could you first check whether trtexec works?

/usr/src/tensorrt/bin/trtexec --loadEngine=[your/engine/file] --plugins=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

Thanks.

I got

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=yolo.engine → plugins=/home/nvidia/yoloToTrt/libnvdsinfer_custom_impl_Yolo.so
[11/05/2020-11:13:51] [I] === Model Options ===
[11/05/2020-11:13:51] [I] Format: *
[11/05/2020-11:13:51] [I] Model:
[11/05/2020-11:13:51] [I] Output:
[11/05/2020-11:13:51] [I] === Build Options ===
[11/05/2020-11:13:51] [I] Max batch: 1
[11/05/2020-11:13:51] [I] Workspace: 16 MB
[11/05/2020-11:13:51] [I] minTiming: 1
[11/05/2020-11:13:51] [I] avgTiming: 8
[11/05/2020-11:13:51] [I] Precision: FP32
[11/05/2020-11:13:51] [I] Calibration:
[11/05/2020-11:13:51] [I] Safe mode: Disabled
[11/05/2020-11:13:51] [I] Save engine:
[11/05/2020-11:13:51] [I] Load engine: yolo.engine
[11/05/2020-11:13:51] [I] Builder Cache: Enabled
[11/05/2020-11:13:51] [I] NVTX verbosity: 0
[11/05/2020-11:13:51] [I] Inputs format: fp32:CHW
[11/05/2020-11:13:51] [I] Outputs format: fp32:CHW
[11/05/2020-11:13:51] [I] Input build shapes: model
[11/05/2020-11:13:51] [I] Input calibration shapes: model
[11/05/2020-11:13:51] [I] === System Options ===
[11/05/2020-11:13:51] [I] Device: 0
[11/05/2020-11:13:51] [I] DLACore:
[11/05/2020-11:13:51] [I] Plugins: /home/nvidia/yoloToTrt/libnvdsinfer_custom_impl_Yolo.so
[11/05/2020-11:13:51] [I] === Inference Options ===
[11/05/2020-11:13:51] [I] Batch: 1
[11/05/2020-11:13:51] [I] Input inference shapes: model
[11/05/2020-11:13:51] [I] Iterations: 10
[11/05/2020-11:13:51] [I] Duration: 3s (+ 200ms warm up)
[11/05/2020-11:13:51] [I] Sleep time: 0ms
[11/05/2020-11:13:51] [I] Streams: 1
[11/05/2020-11:13:51] [I] ExposeDMA: Disabled
[11/05/2020-11:13:51] [I] Spin-wait: Disabled
[11/05/2020-11:13:51] [I] Multithreading: Disabled
[11/05/2020-11:13:51] [I] CUDA Graph: Disabled
[11/05/2020-11:13:51] [I] Skip inference: Disabled
[11/05/2020-11:13:51] [I] Inputs:
[11/05/2020-11:13:51] [I] === Reporting Options ===
[11/05/2020-11:13:51] [I] Verbose: Disabled
[11/05/2020-11:13:51] [I] Averages: 10 inferences
[11/05/2020-11:13:51] [I] Percentile: 99
[11/05/2020-11:13:51] [I] Dump output: Disabled
[11/05/2020-11:13:51] [I] Profile: Disabled
[11/05/2020-11:13:51] [I] Export timing to JSON file:
[11/05/2020-11:13:51] [I] Export output to JSON file:
[11/05/2020-11:13:51] [I] Export profile to JSON file:
[11/05/2020-11:13:51] [I]
[11/05/2020-11:13:51] [I] Loading supplied plugin library: /home/nvidia/yoloToTrt/libnvdsinfer_custom_impl_Yolo.so
Deserialize yoloLayerV3 plugin: yolo_17
Deserialize yoloLayerV3 plugin: yolo_24
[11/05/2020-11:13:54] [I] Starting inference threads
[11/05/2020-11:13:58] [I] Warmup completed 5 queries over 200 ms
[11/05/2020-11:13:58] [I] Timing trace has 70 queries over 3.09888 s
[11/05/2020-11:13:58] [I] Trace averages of 10 runs:
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 43.2709 ms - Host latency: 43.8051 ms (end to end 43.818 ms, enqueue 1.22207 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 43.2599 ms - Host latency: 43.7939 ms (end to end 43.8081 ms, enqueue 1.08314 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 44.3734 ms - Host latency: 44.9074 ms (end to end 44.9209 ms, enqueue 0.945178 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 44.0091 ms - Host latency: 44.5429 ms (end to end 44.5571 ms, enqueue 0.881604 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 44.0428 ms - Host latency: 44.5768 ms (end to end 44.592 ms, enqueue 0.949219 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 43.6259 ms - Host latency: 44.1593 ms (end to end 44.1721 ms, enqueue 0.897559 ms)
[11/05/2020-11:13:58] [I] Average on 10 runs - GPU latency: 43.4646 ms - Host latency: 44.0046 ms (end to end 44.018 ms, enqueue 0.837988 ms)
[11/05/2020-11:13:58] [I] Host Latency
[11/05/2020-11:13:58] [I] min: 41.7312 ms (end to end 41.7419 ms)
[11/05/2020-11:13:58] [I] max: 45.7786 ms (end to end 45.7908 ms)
[11/05/2020-11:13:58] [I] mean: 44.2557 ms (end to end 44.2695 ms)
[11/05/2020-11:13:58] [I] median: 45.5899 ms (end to end 45.6002 ms)
[11/05/2020-11:13:58] [I] percentile: 45.7786 ms at 99% (end to end 45.7908 ms at 99%)
[11/05/2020-11:13:58] [I] throughput: 22.5888 qps
[11/05/2020-11:13:58] [I] walltime: 3.09888 s
[11/05/2020-11:13:58] [I] Enqueue Time
[11/05/2020-11:13:58] [I] min: 0.749268 ms
[11/05/2020-11:13:58] [I] max: 1.48499 ms
[11/05/2020-11:13:58] [I] median: 0.946655 ms
[11/05/2020-11:13:58] [I] GPU Compute
[11/05/2020-11:13:58] [I] min: 41.1997 ms
[11/05/2020-11:13:58] [I] max: 45.2434 ms
[11/05/2020-11:13:58] [I] mean: 43.721 ms
[11/05/2020-11:13:58] [I] median: 45.0555 ms
[11/05/2020-11:13:58] [I] percentile: 45.2434 ms at 99%
[11/05/2020-11:13:58] [I] total compute time: 3.06047 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=yolo.engine --plugins=/home/nvidia/yoloToTrt/libnvdsinfer_custom_impl_Yolo.so

I’m serializing the model with:

#include "yolo.h"
#include "NvInferRuntimeCommon.h"
#include <iostream>
#include <fstream>

class Logger : public nvinfer1::ILogger
{
    // Suppress info-level messages.
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

int main(){
    // Network description for the DeepStream Yolo builder.
    NetworkInfo info;
    info.networkType = "yolov3-tiny";
    info.configFilePath = "tiny/yolov3-tiny.cfg";
    info.wtsFilePath = "tiny/yolov3-tiny.weights";
    info.deviceType = "kGPU";
    info.inputBlobName = "yolov3-tiny";

    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
    builder->setMaxBatchSize(1);

    // Builder configuration: 1 GB workspace, FP16 enabled.
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1 << 30);
    config->setFlag(nvinfer1::BuilderFlag::kFP16);

    Yolo yolo(info);

    // Build the engine from the Darknet cfg/weights.
    nvinfer1::ICudaEngine* engine = yolo.createEngine(builder);

    // Serialize the engine and write it to disk.
    nvinfer1::IHostMemory* serializedModel = engine->serialize();
    engine->destroy();
    builder->destroy();

    std::ofstream ofs("yolo.engine", std::ios::out | std::ios::binary);
    ofs.write((char*)(serializedModel->data()), serializedModel->size());
    ofs.close();

    serializedModel->destroy();

    return 0;
}

I generated model_b1_gpu0_fp16.engine with deepstream-app -c ./deepstream_app_config_yoloV3_tiny.txt.
I’m trying to run my Python app with this engine file, but I get the same error again:

Deserialize yoloLayerV3 plugin: yolo_17
Deserialize yoloLayerV3 plugin: yolo_24
[TensorRT] ERROR: engine.cpp (986) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
Traceback (most recent call last):
File "trt_yolo.py", line 56, in <module>
inference.inference(image_)
File "trt_yolo.py", line 41, in inference
cuda.memcpy_dtoh(self.out_cpu, self.out_gpu)
pycuda._driver.LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered

Hi,

Thanks for your testing.
We are checking this and will share more information later.

Hi,

We can run the YOLO engine with the script below without issue.
Could you give it a try?

import ctypes
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
import numpy as np

ctypes.CDLL("/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so")
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

with open("model_b1_gpu0_fp32.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    bindings = []
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        bindings.append(int(cuda.mem_alloc(4*size)))

    stream = cuda.Stream()
    context = engine.create_execution_context()
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    stream.synchronize()

Thanks.

Hi, thanks for the reply!
I had allocated memory for only two bindings, but three are needed. Now everything works.
I get outputs with shapes (6498,) and (25992,).
Can I use libnvdsinfer_custom_impl_Yolo.so to parse the outputs and get bounding boxes?
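
Roughly, the allocation I ended up with looks like this (a sketch of my script; input_image is a placeholder for the preprocessed float32 input, and I assume FP32 bindings with the input as the first binding):

import ctypes
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

ctypes.CDLL("./libnvdsinfer_custom_impl_Yolo.so")
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

with open("model_b1_gpu0_fp16.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    # One device buffer per binding; pagelocked host buffers for the outputs.
    bindings, device_bufs, host_outputs = [], [], []
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dev = cuda.mem_alloc(size * np.dtype(np.float32).itemsize)  # assumes FP32 bindings
        device_bufs.append(dev)
        bindings.append(int(dev))
        if not engine.binding_is_input(binding):
            host_outputs.append(cuda.pagelocked_empty(size, dtype=np.float32))

    stream = cuda.Stream()
    # Copy the preprocessed image in, run inference, copy each output back.
    cuda.memcpy_htod_async(device_bufs[0], input_image, stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    for host_out, dev_out in zip(host_outputs, device_bufs[1:]):
        cuda.memcpy_dtoh_async(host_out, dev_out, stream)
    stream.synchronize()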

[TensorRT] WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
conv1/convolution + activation_1/Relu6: 1.89542ms
block_1a_conv_1/convolution + activation_2/Relu6: 1.45514ms
block_1a_conv_2/convolution: 2.88618ms
block_1a_conv_shortcut/convolution + add_1/add + activation_3/Relu6: 0.554784ms
block_1b_conv_1/convolution + activation_4/Relu6: 3.05424ms
block_1b_conv_2/convolution: 2.25149ms
block_1b_conv_shortcut/convolution + add_2/add + activation_5/Relu6: 0.61792ms
block_2a_conv_1/convolution + activation_6/Relu6: 1.5256ms
block_2a_conv_2/convolution: 2.2432ms
block_2a_conv_shortcut/convolution + add_3/add + activation_7/Relu6: 0.336288ms
block_2b_conv_1/convolution + activation_8/Relu6: 1103.6ms
block_2b_conv_2/convolution: 2.32346ms
block_2b_conv_shortcut/convolution + add_4/add + activation_9/Relu6: 0.447488ms
block_3a_conv_1/convolution + activation_10/Relu6: 1.10899ms
block_3a_conv_2/convolution: 1.46125ms
block_3a_conv_shortcut/convolution + add_5/add + activation_11/Relu6: 0.238592ms
block_3b_conv_1/convolution + activation_12/Relu6: 2.00397ms
block_3b_conv_2/convolution: 1.71011ms
block_3b_conv_shortcut/convolution + add_6/add + activation_13/Relu6: 0.290784ms
block_4a_conv_1/convolution + activation_14/Relu6: 2.27453ms
block_4a_conv_2/convolution: 2.06624ms
block_4a_conv_shortcut/convolution + add_7/add + activation_15/Relu6: 0.308384ms
block_4b_conv_1/convolution + activation_16/Relu6: 1.16208ms
block_4b_conv_2/convolution: 0.805856ms
block_4b_conv_shortcut/convolution + add_8/add + activation_17/Relu6: 0.17408ms
output_cov/convolution: 0.058688ms
[TensorRT] ERROR: engine.cpp (725) - Cuda Error in reportTimes: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: INTERNAL_ERROR: std::exception
[TensorRT] ERROR: engine.cpp (986) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
[TensorRT] ERROR: engine.cpp (179) - Cuda Error in ~ExecutionContext: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: INTERNAL_ERROR: std::exception
[TensorRT] ERROR: Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::155, condition: cudnnDestroy(context.cudnn) failure.
[TensorRT] ERROR: Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::165, condition: cudaEventDestroy(context.start) failure.
[TensorRT] ERROR: Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::170, condition: cudaEventDestroy(context.stop) failure.
[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 700 (an illegal memory access was encountered)
terminate called after throwing an instance of 'nvinfer1::CudaError'
what(): std::exception
Aborted (core dumped)
@AastaLLL @kayccc
I am getting a similar kind of error when trying to run pre-trained TLT models after converting them to TRT and running them with a PyCUDA script on a Jetson NX. I was using PeopleNet (ResNet18).
I followed the PyCUDA script from here: Speeding Up Deep Learning Inference Using TensorFlow, ONNX, and NVIDIA TensorRT | NVIDIA Technical Blog
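
For reference, this is roughly how I check the engine bindings before allocating buffers (peoplenet.engine is just a placeholder for my engine file name):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
with open("peoplenet.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    # Print every binding with its direction, shape and dtype so the buffer
    # allocations in the PyCUDA script can be matched against them.
    for i, binding in enumerate(engine):
        print(i, binding,
              "input" if engine.binding_is_input(binding) else "output",
              engine.get_binding_shape(binding),
              engine.get_binding_dtype(binding))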

It looks like you created the engine file on one machine but are trying to run it on another.

@nazarov-alexey
No, I created the engine on the Jetson and am trying to run it on the same device.

Did you try to run

/usr/src/tensorrt/bin/trtexec --loadEngine=your.engine

@nazarov-alexey Yes, that was running perfectly.

And when you run this, do you get the error?