Migrated from DeepStream 4 to DeepStream 5 and got errors

Hello, any help? Please.

Hello. I have set the RTSP camera count to 4 and raised the batch-size of all the CNNs to 4, and now the ONNX model, even with force-implicit-batch-dim=0, does not compile into a TRT engine. Same error.
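
For context, the relevant part of my nvinfer config looks roughly like this (a sketch; the file name is a placeholder, the other values are the ones mentioned above):

[property]
onnx-file=model.onnx
batch-size=4
force-implicit-batch-dim=0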

Hi,

Sorry for the late update.

May I know which model format you are using? Is it caffemodel or ONNX?
It's recommended to run your model with the trtexec binary first, to see if the issue comes from TensorRT.

/usr/src/tensorrt/bin/trtexec --onnx=[your/model]

Thanks.

Hello, AastaLLL.
I am using resnet50-caffe2-v1-9.onnx. Thanks for the advice, I will try it.

Hello. I have tested it as you suggested. I also picked up the --saveEngine option from another thread.

[08/28/2020-18:21:23] [I] === Model Options ===
[08/28/2020-18:21:23] [I] Format: ONNX
[08/28/2020-18:21:23] [I] Model: /home/bigbrother/Desktop/Work/Release/models/Secondary_Spec_Trans/resnet50-caffe2-v1-9.onnx
[08/28/2020-18:21:23] [I] Output:
[08/28/2020-18:21:23] [I] === Build Options ===
[08/28/2020-18:21:23] [I] Max batch: 4
[08/28/2020-18:21:23] [I] Workspace: 16 MB
[08/28/2020-18:21:23] [I] minTiming: 1
[08/28/2020-18:21:23] [I] avgTiming: 8
[08/28/2020-18:21:23] [I] Precision: FP32
[08/28/2020-18:21:23] [I] Calibration:
[08/28/2020-18:21:23] [I] Safe mode: Disabled
[08/28/2020-18:21:23] [I] Save engine: /home/bigbrother/Desktop/Work/Release/models/Secondary_Spec_Trans/resnet50-caffe2-v1-9.engine
[08/28/2020-18:21:23] [I] Load engine:
[08/28/2020-18:21:23] [I] Builder Cache: Enabled
[08/28/2020-18:21:23] [I] NVTX verbosity: 0
[08/28/2020-18:21:23] [I] Inputs format: fp32:CHW
[08/28/2020-18:21:23] [I] Outputs format: fp32:CHW
[08/28/2020-18:21:23] [I] Input build shapes: model
[08/28/2020-18:21:23] [I] Input calibration shapes: model
[08/28/2020-18:21:23] [I] === System Options ===
[08/28/2020-18:21:23] [I] Device: 0
[08/28/2020-18:21:23] [I] DLACore:
[08/28/2020-18:21:23] [I] Plugins:
[08/28/2020-18:21:23] [I] === Inference Options ===
[08/28/2020-18:21:23] [I] Batch: 4
[08/28/2020-18:21:23] [I] Input inference shapes: model
[08/28/2020-18:21:23] [I] Iterations: 10
[08/28/2020-18:21:23] [I] Duration: 3s (+ 200ms warm up)
[08/28/2020-18:21:23] [I] Sleep time: 0ms
[08/28/2020-18:21:23] [I] Streams: 1
[08/28/2020-18:21:23] [I] ExposeDMA: Disabled
[08/28/2020-18:21:23] [I] Spin-wait: Disabled
[08/28/2020-18:21:23] [I] Multithreading: Disabled
[08/28/2020-18:21:23] [I] CUDA Graph: Disabled
[08/28/2020-18:21:23] [I] Skip inference: Disabled
[08/28/2020-18:21:23] [I] Inputs:
[08/28/2020-18:21:23] [I] === Reporting Options ===
[08/28/2020-18:21:23] [I] Verbose: Disabled
[08/28/2020-18:21:23] [I] Averages: 10 inferences
[08/28/2020-18:21:23] [I] Percentile: 99
[08/28/2020-18:21:23] [I] Dump output: Disabled
[08/28/2020-18:21:23] [I] Profile: Disabled
[08/28/2020-18:21:23] [I] Export timing to JSON file:
[08/28/2020-18:21:23] [I] Export output to JSON file:
[08/28/2020-18:21:23] [I] Export profile to JSON file:
[08/28/2020-18:21:23] [I]

Input filename: /home/bigbrother/Desktop/Work/Release/models/Secondary_Spec_Trans/resnet50-caffe2-v1-9.onnx
ONNX IR version: 0.0.3
Opset version: 9
Producer name: onnx-caffe2
Producer version:
Domain:
Model version: 0
Doc string:

[08/28/2020-18:21:26] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/28/2020-18:21:37] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[08/28/2020-18:21:58] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[08/28/2020-18:22:02] [I] Starting inference threads
[08/28/2020-18:22:05] [I] Warmup completed 4 queries over 200 ms
[08/28/2020-18:22:05] [I] Timing trace has 156 queries over 3.17335 s
[08/28/2020-18:22:05] [I] Trace averages of 10 runs:
[08/28/2020-18:22:05] [I] Average on 10 runs - GPU latency: 81.1181 ms - Host latency: 81.4097 ms (end to end 81.4248 ms, enqueue 7.55378 ms)
[08/28/2020-18:22:05] [I] Average on 10 runs - GPU latency: 81.0408 ms - Host latency: 81.3324 ms (end to end 81.3476 ms, enqueue 8.09339 ms)
[08/28/2020-18:22:05] [I] Average on 10 runs - GPU latency: 80.8898 ms - Host latency: 81.1816 ms (end to end 81.1971 ms, enqueue 7.97445 ms)
[08/28/2020-18:22:05] [I] Host Latency
[08/28/2020-18:22:05] [I] min: 80.325 ms (end to end 80.3409 ms)
[08/28/2020-18:22:05] [I] max: 83.3179 ms (end to end 83.3274 ms)
[08/28/2020-18:22:05] [I] mean: 81.352 ms (end to end 81.3671 ms)
[08/28/2020-18:22:05] [I] median: 81.3025 ms (end to end 81.3181 ms)
[08/28/2020-18:22:05] [I] percentile: 83.3179 ms at 99% (end to end 83.3274 ms at 99%)
[08/28/2020-18:22:05] [I] throughput: 49.1594 qps
[08/28/2020-18:22:05] [I] walltime: 3.17335 s
[08/28/2020-18:22:05] [I] Enqueue Time
[08/28/2020-18:22:05] [I] min: 3.09851 ms
[08/28/2020-18:22:05] [I] max: 13.0354 ms
[08/28/2020-18:22:05] [I] median: 7.99149 ms
[08/28/2020-18:22:05] [I] GPU Compute
[08/28/2020-18:22:05] [I] min: 80.0336 ms
[08/28/2020-18:22:05] [I] max: 83.0281 ms
[08/28/2020-18:22:05] [I] mean: 81.0604 ms
[08/28/2020-18:22:05] [I] median: 81.0103 ms
[08/28/2020-18:22:05] [I] percentile: 83.0281 ms at 99%
[08/28/2020-18:22:05] [I] total compute time: 3.16136 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/bigbrother/Desktop/Work/Release/models/Secondary_Spec_Trans/resnet50-caffe2-v1-9.onnx --saveEngine=/home/bigbrother/Desktop/Work/Release/models/Secondary_Spec_Trans/resnet50-caffe2-v1-9.engine --batch=4

TensorRT passed on the ResNet50 ONNX model with a batch size of 4.

But when I try to build the engine file with DeepStream, it fails, just like in this thread: Preprocessing between pgie and sgie.

And even if I set --maxBatch=4 and build the engine, I get this:
0:00:11.282954490 8398 0x55999dfb30 WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger: NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1642> [UID = 2]: Backend has maxBatchSize 1 whereas 4 has been requested

And this:

ERROR: [TRT]: …/builder/cudnnBuilderBlockChooser.cpp (117) - Assertion Error in buildMemGraph: 0 (mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size)
ERROR: Build engine failed from config file
ERROR: failed to build trt engine.
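
For reference, one way to check which maxBatchSize a serialized engine was actually built with is to deserialize it with the TensorRT Python API (a minimal sketch; the engine path is a placeholder):

import tensorrt as trt

ENGINE_PATH = "resnet50-caffe2-v1-9.engine"  # placeholder path

# Deserialize the engine and report the batch size it was built for.
logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Implicit-batch engines report the builder's max batch size here;
# explicit-batch engines always report 1.
print("maxBatchSize:", engine.max_batch_size)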

Hello.
Small update: I removed the batch check and set everything to 1, and it does its job.
But it still crashes with the same error.

Hello again :-)

I found the culprit of my error: it is the image fetching from NvBuf. When I remove it, the program runs until I shut it down. So I reworked its usage, basically making OpenCV release all the buffer images used for the conversion from NvBuf to the OCR code.

Here it is:
static GstFlowReturn
Convert_Buf_Into_Image (NvBufSurface *input_buf, gint idx,
    NvOSD_RectParams *crop_rect_params, gdouble Make_Ratio, gint input_width,
    gint input_height, char *filename_Adder) //NvDsObjectMeta *object_meta)
{
  NvBufSurfTransform_Error err;
  NvBufSurfTransformConfigParams transform_config_params;
  NvBufSurfTransformParams transform_params;
  NvBufSurfTransformRect src_rect;
  NvBufSurfTransformRect dst_rect;
  NvBufSurface ip_surf;
  cv::Mat in_mat, buf_mat;
  ip_surf = *input_buf;

  // Wrap only the surface of the current frame as a single-image batch.
  ip_surf.numFilled = ip_surf.batchSize = 1;
  ip_surf.surfaceList = &(input_buf->surfaceList[idx]);

  /* gint src_left = GST_ROUND_UP_2 (crop_rect_params->left);
     gint src_top = GST_ROUND_UP_2 (crop_rect_params->top);
     gint src_width = GST_ROUND_DOWN_2 (crop_rect_params->width);
     gint src_height = GST_ROUND_DOWN_2 (crop_rect_params->height); */

  gint src_left = (int) crop_rect_params->left;
  gint src_top = (int) crop_rect_params->top;
  gint src_width = (int) crop_rect_params->width;
  gint src_height = (int) crop_rect_params->height;
  //g_print ("ltwh = %d %d %d %d \n", src_left, src_top, src_width, src_height);

  guint dest_width, dest_height;
  gdouble Ratio = 1.0;
  dest_width = src_width;
  dest_height = src_height;

  NvBufSurface *nvbuf;
  NvBufSurfaceCreateParams create_params;
  //dsexample->gpu_id = DEFAULT_GPU_ID;

  create_params.gpuId = DEFAULT_GPU_ID;
  create_params.width = dest_width;
  create_params.height = dest_height;
  create_params.size = 0;
  create_params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
  create_params.layout = NVBUF_LAYOUT_PITCH;
#ifdef __aarch64__
  create_params.memType = NVBUF_MEM_DEFAULT;
#else
  create_params.memType = NVBUF_MEM_CUDA_UNIFIED;
#endif
  NvBufSurfaceCreate (&nvbuf, 1, &create_params);

  // Configure transform session parameters for the transformation
  transform_config_params.compute_mode = NvBufSurfTransformCompute_Default;
  transform_config_params.gpu_id = DEFAULT_GPU_ID;

  // Set the transform session parameters for the conversions executed in this
  // thread.
  err = NvBufSurfTransformSetSessionParams (&transform_config_params);
  if (err != NvBufSurfTransformError_Success) {
    goto error;
  }

  // Calculate scaling ratio while maintaining aspect ratio
  Ratio = MIN (Make_Ratio * dest_width / src_width,
      Make_Ratio * dest_height / src_height);

  if ((crop_rect_params->width == 0) || (crop_rect_params->height == 0)) {
    goto error;
  }

#ifdef __aarch64__
  if (Ratio <= 1.0 / 16 || Ratio >= 16.0) {
    // Currently cannot scale by ratio > 16 or < 1/16 for Jetson
    goto error;
  }
#endif
  // Set the transform ROIs for source and destination
  src_rect = {(guint) src_top, (guint) src_left, (guint) src_width, (guint) src_height};
  dst_rect = {0, 0, (guint) dest_width, (guint) dest_height};

  // Set the transform parameters
  transform_params.src_rect = &src_rect;
  transform_params.dst_rect = &dst_rect;
  transform_params.transform_flag =
      NVBUFSURF_TRANSFORM_FILTER | NVBUFSURF_TRANSFORM_CROP_SRC |
      NVBUFSURF_TRANSFORM_CROP_DST;
  transform_params.transform_filter = NvBufSurfTransformInter_Default;

  // Memset the memory
  NvBufSurfaceMemSet (nvbuf, 0, 0, 0);

  // Transformation: scaling + format conversion, if any.
  err = NvBufSurfTransform (&ip_surf, nvbuf, &transform_params);
  if (err != NvBufSurfTransformError_Success) {
    goto error;
  }

  // Map the buffer so that it can be accessed by CPU
  if (NvBufSurfaceMap (nvbuf, 0, 0, NVBUF_MAP_READ) != 0) {
    goto error;
  }

  // Cache the mapped data for CPU access
  NvBufSurfaceSyncForCpu (nvbuf, 0, 0);

  // Use OpenCV to remove padding and convert RGBA to BGR. Can be skipped if
  // the algorithm can handle padded RGBA data.
  in_mat = cv::Mat (dest_height, dest_width, CV_8UC4,
      nvbuf->surfaceList[0].mappedAddr.addr[0], nvbuf->surfaceList[0].pitch);
  buf_mat = cv::Mat (cv::Size (dest_width * Make_Ratio, dest_height * Make_Ratio), CV_8UC4);
  Detected_Vehicle_Image =
      cv::Mat (cv::Size (dest_width * Make_Ratio, dest_height * Make_Ratio), CV_8UC3);
  cv::cvtColor (in_mat, buf_mat, cv::COLOR_RGBA2BGR);
  //cv::resize (in_mat, Detected_Vehicle_Image, cv::Size (), Make_Ratio, Make_Ratio, cv::INTER_LINEAR);
  cv::resize (buf_mat, Detected_Vehicle_Image, cv::Size (), Make_Ratio, Make_Ratio,
      cv::INTER_LINEAR);

  char filename[120];
  snprintf (filename, 120, "./lastframes/pic_%s.jpg", filename_Adder);
  if (!(File_Exists (filename)))
    cv::imwrite (filename, Detected_Vehicle_Image);
  // Release the Mats built from the mapped NvBuf memory before unmapping.
  in_mat.release ();
  buf_mat.release ();

  if (NvBufSurfaceUnMap (nvbuf, 0, 0)) {
    goto error;
  }
  NvBufSurfaceDestroy (nvbuf);

  return GST_FLOW_OK;

error:
  // nvbuf is always created before the first goto, so free it here too.
  NvBufSurfaceDestroy (nvbuf);
  return GST_FLOW_ERROR;
}

But now I am facing more errors! I will open a new thread for them.

Hi,

Sorry for the late update, and thanks for the detailed update.

It seems that the model can run correctly with TensorRT separately.
Would you mind sharing a reproducible source with us so we can check it for you?

Thanks.

Hello, thanks for the reply.
No, I have found the MAIN error; right now I need to find a better way to save images from NvBuf into an OpenCV Mat, or just save the images to jpeg/png/bmp files. Yes, I have checked the deepstream-test5 sample, but adapting its code has given me nothing; I will try it again, though.
All this is because NvBuf and the CPU share the same RAM, and sometimes a bug can overwrite the wrong sectors.
I have pasted my code for saving images above; I got the main part of it from this forum :-)

Please also help with another bug, regarding the NvTracker initial ID value; I have already opened a new thread.
Thank you.

Hi,

So is the original cuDNN error solved?
Just want to confirm it.

Thanks.

Hello.
The original issue, which is ONNX on DeepStream 5, is partially resolved: with batch-size=1, DeepStream 5 can build and use the engine by itself, but with a batch size bigger than 1 it crashes. I can build an engine with batch-size > 1 with TensorRT, but when DeepStream 5 loads it, it issues the error described in Migrated from DeepStream 4 to DeepStream 5 and got errors.
So I can use ONLY a batch size of 1 with ONNX models.

Because of all this, the problem is only partially solved.

Hi,

Have you re-generated the engine file?
It is possible that DeepStream is using the existing engine file, which was created with batchsize=1 and causes this error.

Thanks.

Hello.
Yes, it has been re-generated. The problem is with batch-size > 1; batch-size=1 works OK. And I do re-generate it in DeepStream 5.

Best regards.

Hello.
I am sure that there will eventually be a solution for DeepStream 5 + ONNX with batch size > 1.
You can close this thread.

Best regards.

Hi,

Not sure which solution you found.
Here is our suggestion for your reference.

Assertion Error in buildMemGraph: 0 (mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size)

Based on the above log, the error occurs because the ONNX model was not generated with the correct batch size.
Since you are trying to use batchsize=2, the model needs to be generated with batch size 2 or a dynamic batch size.

We can reproduce this error with our /usr/src/tensorrt/data/resnet50/ResNet50.onnx model.
To solve this issue, we re-generated the ONNX file for batch size 2.
This can be achieved via our ONNX GraphSurgeon API:
https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon

1. Install

$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/onnx-graphsurgeon/
$ make install

2. Generate your own convert.py.
Here is our sample for resnet50.
In general, we change the input batch, the output batch, and the reshape operation right before the output layer.

import onnx_graphsurgeon as gs
import onnx

batch = 2

graph = gs.import_onnx(onnx.load("ResNet50.onnx"))
for inp in graph.inputs:
    inp.shape[0] = batch
for out in graph.outputs:
    out.shape[0] = batch

# update reshape from [1, 2048] to [2, 2048]
reshape = [node for node in graph.nodes if node.op == "Reshape"]
reshape[0].inputs[1].values[0] = batch

onnx.save(gs.export_onnx(graph), "ResNet50_dynamic.onnx")

Then run:

$ python3 convert.py

3.
Then you can replace the ONNX model with the dynamic one.
We have confirmed that DeepStream can run ResNet50_dynamic.onnx without issue in our environment.
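
As a quick sanity check (a minimal sketch, assuming ResNet50_dynamic.onnx is in the current directory), you can print the rewritten input and output shapes with the onnx Python package before handing the model to DeepStream:

import onnx

model = onnx.load("ResNet50_dynamic.onnx")
onnx.checker.check_model(model)

# Print each graph input/output with its shape; dim 0 should now be 2.
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = [d.dim_param or d.dim_value
            for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, dims)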

Thanks.

Thank you, I will try this ASAP.

Hello.
I am stuck at $ make install from step 1.
Python cannot find setuptools, even though setuptools is installed.
Please help; I have been trying for four days :-(

Hi,

Sorry for the late update.

Please try the following to see if it helps:

$ sudo apt-get update
$ sudo apt-get install python3-pip
$ sudo pip3 install -U pip testresources setuptools
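
If make install still fails after that, installing the prebuilt wheel directly may also work (an alternative based on the onnx-graphsurgeon README, which points to NVIDIA's package index):

$ python3 -m pip install onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com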

Thanks.