• Hardware Platform (Jetson / GPU): AGX Xavier
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): 4.4
• TensorRT Version: 7.1.3.0
Hi!
I converted a darknet yolov3-tiny model to a TensorRT engine using libnvdsinfer_custom_impl_Yolo.so.
Now I want to run this engine using the Python API:
with open("yolov3-tiny.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
And I am getting the error:
[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin YoloLayer_TRT version 1
[TensorRT] ERROR: safeDeserializationUtils.cpp (322) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
Traceback (most recent call last):
  File "trt_yolo.py", line 65, in <module>
    inference = TrtInference()
  File "trt_yolo.py", line 19, in __init__
    self.context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'
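(Note: deserialize_cuda_engine() returns None on failure rather than raising, so the AttributeError is a downstream symptom of the plugin errors above. A minimal guard, a sketch reusing the snippet above, fails fast instead:)

with open("yolov3-tiny.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    # deserialize_cuda_engine() returns None on failure, so check before use
    if engine is None:
        raise RuntimeError("engine deserialization failed; see the TensorRT log")
    context = engine.create_execution_context()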
Can I run this engine using the Python API?
Hi,
The implementation of the YoloLayer_TRT plugin is in libnvdsinfer_custom_impl_Yolo.so, so please link the library into the TensorRT application.
For example:
/usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine --plugins=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
For the Python interface, the .so file can be loaded via the ctypes module.
You can find an example in the file below:
/usr/src/tensorrt/samples/python/uff_custom_plugin/sample.py
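For example, a minimal sketch (the .so path here is an assumption, adjust it to your build):

import ctypes

# Loading the library registers the YoloLayer_TRT creator in TensorRT's
# plugin registry; do this before deserializing the engine.
ctypes.CDLL("nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so")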
Thanks.
Hi @AastaLLL, thanks for the reply!
I added ctypes.CDLL("./libnvdsinfer_custom_impl_Yolo.so") to my script and YoloLayer_TRT was found.
But now I get another error:
Deserialize yoloLayerV3 plugin: yolo_17
Deserialize yoloLayerV3 plugin: yolo_24
[TensorRT] ERROR: engine.cpp (986) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
Traceback (most recent call last):
  File "trt_yolo.py", line 56, in <module>
    inference.inference(image_)
  File "trt_yolo.py", line 41, in inference
    cuda.memcpy_dtoh(self.out_cpu, self.out_gpu)
pycuda._driver.LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
I use this script with another network's engine file and it works.
Hi,
The error suggests that your model cannot run inference with TensorRT.
Could you first check whether trtexec works?
/usr/src/tensorrt/bin/trtexec --loadEngine=[your/engine/file] --plugins=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
Thanks.
I’m serializing the model with:
#include "yolo.h"
#include "NvInferRuntimeCommon.h"
#include <iostream>
#include <fstream>
class Logger : public nvinfer1::ILogger
{
void log(Severity severity, const char* msg) override
{
// suppress info-level messages
if (severity != Severity::kINFO)
std::cout << msg << std::endl;
}
} gLogger;
int main(){
NetworkInfo info;
info.networkType = "yolov3-tiny";
info.configFilePath = "tiny/yolov3-tiny.cfg";
info.wtsFilePath = "tiny/yolov3-tiny.weights";
info.deviceType = "kGPU";
info.inputBlobName = "yolov3-tiny";
nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
builder->setMaxBatchSize(1);
nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
config->setMaxWorkspaceSize(1 << 30);
config->setFlag(nvinfer1::BuilderFlag::kFP16);
Yolo yolo(info);
nvinfer1::ICudaEngine* engine = yolo.createEngine(builder);
nvinfer1::IHostMemory *serializedModel = engine->serialize();
engine->destroy();
builder->destroy();
std::ofstream ofs("yolo.engine", std::ios::out | std::ios::binary);
ofs.write((char*)(serializedModel->data()), serializedModel->size());
ofs.close();
serializedModel->destroy();
return 0;
}
I got model_b1_gpu0_fp16.engine using deepstream-app -c ./deepstream_app_config_yoloV3_tiny.txt.
I’m trying to run my Python app with this engine file, but I get the same error again:
Deserialize yoloLayerV3 plugin: yolo_17
Deserialize yoloLayerV3 plugin: yolo_24
[TensorRT] ERROR: engine.cpp (986) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
Traceback (most recent call last):
  File "trt_yolo.py", line 56, in <module>
    inference.inference(image_)
  File "trt_yolo.py", line 41, in inference
    cuda.memcpy_dtoh(self.out_cpu, self.out_gpu)
pycuda._driver.LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
Hi,
Thanks for your testing.
We are checking this and will share more information later.
Hi,
We can run the YOLO engine with the script below without issue.
Could you give it a try?
import ctypes
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
import numpy as np

ctypes.CDLL("/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so")

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
with open("model_b1_gpu0_fp32.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

# Allocate one device buffer per binding (4 bytes per FP32 element).
bindings = []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    bindings.append(int(cuda.mem_alloc(4 * size)))

stream = cuda.Stream()
context.execute_async(bindings=bindings, stream_handle=stream.handle)
stream.synchronize()
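To read the outputs back, a possible extension (a sketch, assuming FP32 outputs and the device buffers allocated above):

outputs = []
for i, binding in enumerate(engine):
    if not engine.binding_is_input(binding):
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        # copy the device binding back into page-locked host memory
        host_buf = cuda.pagelocked_empty(size, dtype=np.float32)
        cuda.memcpy_dtoh_async(host_buf, bindings[i], stream)
        outputs.append(host_buf)
stream.synchronize()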
Thanks.
Hi, thanks for the reply!
I had allocated memory for only two bindings, but three are needed. Now everything works.
I get outputs with shapes (6498,) and (25992,).
Can I use libnvdsinfer_custom_impl_Yolo.so to parse the outputs and get bounding boxes?
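(For reference: the bbox-parsing logic for these tensors lives in nvdsparsebbox_Yolo.cpp in that same plugin directory. The shapes are also consistent with a 608×608 input and one class: 19·19·3·(4+1+1) = 6498 and 38·38·3·(4+1+1) = 25992. Under that assumption, a hypothetical reshape for post-processing; out_small and out_large are placeholder names, and the channel-first layout is a guess to verify against nvdsparsebbox_Yolo.cpp:)

import numpy as np

# Placeholder host buffers standing in for the two outputs.
out_small = np.zeros(6498, dtype=np.float32)
out_large = np.zeros(25992, dtype=np.float32)

# 3 anchors x (4 box + 1 objectness + 1 class score) per grid cell.
head_small = out_small.reshape(3, 6, 19, 19)
head_large = out_large.reshape(3, 6, 38, 38)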
It looks like you created the engine file on one machine but want to run it on another.
@nazarov-alexey
No, I created the engine on the Jetson and am trying to run it on the same device.
@nazarov-alexey Yes, that was running perfectly.
And if you run this, do you get the error?