Hi,
We trained a custom resnet18 SSD network with TLT and exported it to the .etlt format.
Using the DeepStream SDK on a TX2, we were able to run this network and have DeepStream build a TensorRT engine file from it.
The engine file works correctly when used through the DeepStream SDK.
However, when we try to use it directly with the TensorRT Python API, we get an error.
For reference, the DeepStream nvinfer configuration is:
[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
labelfile-path=labels.txt
model-engine-file=/home/deep/ssd_resnet18_epoch_030_fp32.etlt_b1_fp32.engine
input-dims=3;306;544;0 # where c = number of channels, h = height of the model input, w = width of model input, 0: implies CHW format.
uff-input-blob-name=Input
batch-size=1
network-mode=0
num-detected-classes=1
interval=0
gie-unique-id=1
output-blob-names=NMS
parse-bbox-func-name=NvDsInferParseCustomSSDUff
custom-lib-path=</path/to/libnvds_infercustomparser_ssd_uff.so>
[class-attrs-all]
threshold=0.3
eps=0.2
group-threshold=0
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
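For completeness: when we eventually feed a real image to the engine from Python, we plan to mirror the preprocessing implied by the nvinfer settings above (BGR channel order, the per-channel offsets, net-scale-factor of 1.0, CHW layout, 544x306 input). A rough sketch of what we assume this should look like (the OpenCV load/resize and the image path are placeholders; the failing test below still feeds zeros):

import cv2
import numpy as np

def preprocess(image_path):
    # Mirror nvinfer: BGR input, per-channel mean subtraction, scale factor 1.0, CHW layout
    img = cv2.imread(image_path)                                    # OpenCV loads images as BGR
    img = cv2.resize(img, (544, 306)).astype(np.float32)            # (w, h) from input-dims
    img -= np.array([103.939, 116.779, 123.68], dtype=np.float32)   # offsets from the config
    img = img.transpose(2, 0, 1)                                    # HWC -> CHW
    return np.ascontiguousarray(img.ravel())                        # flat float32 host buffer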
The TensorRT Python code we are using is:
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

engine_file = 'ssd_resnet18_epoch_030_fp32.etlt_b1_fp32.engine'

# Deserialize the engine that DeepStream built
with open(engine_file, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Allocate page-locked host buffers and device buffers for bindings 0 and 1
h_input = cuda.pagelocked_zeros(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
stream = cuda.Stream()

with engine.create_execution_context() as context:
    # Copy the (zero-filled) input to the device and run inference synchronously
    # cuda.memcpy_htod_async(d_input, h_input, stream)
    cuda.memcpy_htod(d_input, h_input)
    # context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    context.execute(bindings=[int(d_input), int(d_output)])
    print('execution complete')

    # Transfer predictions back from the GPU
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()
    print('h_output', h_output.shape)
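We also assume it may be worth double-checking how many bindings the deserialized engine actually exposes, since the code above only allocates buffers for bindings 0 and 1. A quick check like this (standard TensorRT Python API calls) should list them:

for i in range(engine.num_bindings):
    # Print index, direction, name and shape of every binding the engine exposes
    kind = 'input' if engine.binding_is_input(i) else 'output'
    print(i, kind, engine.get_binding_name(i), engine.get_binding_shape(i))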
The error we get is:
#assertion/home/deep/TensorRT/plugin/nmsPlugin/nmsPlugin.cpp,118
Aborted (core dumped)
No other warnings are seen.
TensorRT version: 6.0.1.10
Do you know what the issue might be?
Do you know how to get more detailed information about the problem from the logger?
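Is raising the logger severity the right way to do that? Something like the following is what we assume, but we have not confirmed it surfaces more detail about this assertion:

# Assumed: construct the logger with VERBOSE severity instead of INFO
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')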