"cudaMemset2DAsync failed with error cudaErrorInvalidValue while converting buffer" when running YoloV3_tiny on Jetson Nano

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
Jetson Nano
• DeepStream Version
5.1
• JetPack Version (valid for Jetson only)
4.5.1
• Issue Type( questions, new requirements, bugs)
Bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
Running the YoloV3_tiny sample with the “deepstream-app -c deepstream_app_config_yoloV3_tiny.txt” command loads the model successfully, then fails with “nvinfer gstnvinfer.cpp:1111:get_converted_buffer:<primary_gie> cudaMemset2DAsync failed with error cudaErrorInvalidValue while converting buffer”.

Full log is below:

Unknown or legacy key specified 'is-classifier' for group [property]
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_mot_klt.so
gstnvtracker: Optional NvMOT_RemoveStreams not implemented
gstnvtracker: Batch processing is OFF
gstnvtracker: Past frame output is OFF
0:00:00.293626655  8499     0x31963200 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
Loading pre-trained weights...
Loading weights of yolov3-tiny complete!
Total Number of weights read : 8858734
Loading pre-trained weights...
Loading weights of yolov3-tiny complete!
Total Number of weights read : 8858734
Building Yolo network...
  layer               inp_size            out_size       weightPtr
(0)   conv-bn-leaky     3 x 416 x 416      16 x 416 x 416    496   
(1)   maxpool          16 x 416 x 416      16 x 208 x 208    496   
(2)   conv-bn-leaky    16 x 208 x 208      32 x 208 x 208    5232  
(3)   maxpool          32 x 208 x 208      32 x 104 x 104    5232  
(4)   conv-bn-leaky    32 x 104 x 104      64 x 104 x 104    23920 
(5)   maxpool          64 x 104 x 104      64 x  52 x  52    23920 
(6)   conv-bn-leaky    64 x  52 x  52     128 x  52 x  52    98160 
(7)   maxpool         128 x  52 x  52     128 x  26 x  26    98160 
(8)   conv-bn-leaky   128 x  26 x  26     256 x  26 x  26    394096
(9)   maxpool         256 x  26 x  26     256 x  13 x  13    394096
(10)  conv-bn-leaky   256 x  13 x  13     512 x  13 x  13    1575792
(11)  maxpool         512 x  13 x  13     512 x  13 x  13    1575792
(12)  conv-bn-leaky   512 x  13 x  13    1024 x  13 x  13    6298480
(13)  conv-bn-leaky  1024 x  13 x  13     256 x  13 x  13    6561648
(14)  conv-bn-leaky   256 x  13 x  13     512 x  13 x  13    7743344
(15)  conv-linear     512 x  13 x  13     255 x  13 x  13    7874159
(16)  yolo            255 x  13 x  13     255 x  13 x  13    7874159
(17)  route                  -            256 x  13 x  13    7874159
(18)  conv-bn-leaky   256 x  13 x  13     128 x  13 x  13    7907439
INFO: [TRT]: mm1_19: broadcasting input0 to make tensors conform, dims(input0)=[1,26,13][NONE] dims(input1)=[128,13,13][NONE].
INFO: [TRT]: mm2_19: broadcasting input1 to make tensors conform, dims(input0)=[128,26,13][NONE] dims(input1)=[1,13,26][NONE].
(19)  upsample        128 x  13 x  13     128 x  26 x  26        - 
(20)  route                  -            384 x  26 x  26    7907439
(21)  conv-bn-leaky   384 x  26 x  26     256 x  26 x  26    8793199
(22)  conv-linear     256 x  26 x  26     255 x  26 x  26    8858734
(23)  yolo            255 x  26 x  26     255 x  26 x  26    8858734
Output yolo blob names :
yolo_17
yolo_24
Total number of yolo layers: 49
Building yolo network complete!
Building the TensorRT Engine...
INFO: [TRT]: mm1_19: broadcasting input0 to make tensors conform, dims(input0)=[1,26,13][NONE] dims(input1)=[128,13,13][NONE].
INFO: [TRT]: mm2_19: broadcasting input1 to make tensors conform, dims(input0)=[128,26,13][NONE] dims(input1)=[1,13,26][NONE].
INFO: [TRT]: 
INFO: [TRT]: --------------- Layers running on DLA: 
INFO: [TRT]: 
INFO: [TRT]: --------------- Layers running on GPU: 
INFO: [TRT]: conv_1, leaky_1, maxpool_2, conv_3, leaky_3, maxpool_4, conv_5, leaky_5, maxpool_6, conv_7, leaky_7, maxpool_8, conv_9, leaky_9, maxpool_10, conv_11, leaky_11, maxpool_12, conv_13, leaky_13, conv_14, leaky_14, conv_19, conv_15, postMul_19, leaky_19, preMul_19, mm1_19, mm2_19, (Unnamed Layer* 42) [Matrix Multiply]_output copy, leaky_15, conv_16, yolo_17, conv_22, leaky_22, conv_23, yolo_24, 
INFO: [TRT]: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
INFO: [TRT]: Detected 1 inputs and 2 output network tensors.
Building complete!
0:01:03.844424604  8499     0x31963200 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1749> [UID = 1]: serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-5.1/sources/objectDetector_Yolo/model_b1_gpu0_fp32.engine successfully
INFO: [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT data            3x416x416       
1   OUTPUT kFLOAT yolo_17         255x13x13       
2   OUTPUT kFLOAT yolo_24         255x26x26       

0:01:03.860735227  8499     0x31963200 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-5.1/sources/objectDetector_Yolo/config_infer_primary_yoloV3_tiny.txt sucessfully

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
  To go back to the tiled display, right-click anywhere on the window.


**PERF:  FPS 0 (Avg)	
**PERF:  0.00 (0.00)	
** INFO: <bus_callback:181>: Pipeline ready

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
** INFO: <bus_callback:167>: Pipeline running

0:01:06.380942236  8499     0x311b4370 ERROR                nvinfer gstnvinfer.cpp:1111:get_converted_buffer:<primary_gie> cudaMemset2DAsync failed with error cudaErrorInvalidValue while converting buffer
0:01:06.381060732  8499     0x311b4370 WARN                 nvinfer gstnvinfer.cpp:1372:gst_nvinfer_process_full_frame:<primary_gie> error: Buffer conversion failed
ERROR from primary_gie: Buffer conversion failed
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1372): gst_nvinfer_process_full_frame (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
ERROR from qtdemux0: Internal data stream error.
Debug info: qtdemux.c(6073): gst_qtdemux_loop (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin0/GstURIDecodeBin:src_elem/GstDecodeBin:decodebin0/GstQTDemux:qtdemux0:
streaming stopped, reason error (-5)
Quitting
App run failed

Hi @traional ,
Have you made any changes to the Yolov3_tiny sample?
I don’t believe the released Yolov3_tiny sample has this issue.

It turned out there was a mismatch between the installed DeepStream and JetPack versions. Thanks for the reply.
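
For anyone hitting the same error, a quick sanity check is to confirm that the installed DeepStream release matches the JetPack/L4T release on the device (DeepStream 5.1 expects JetPack 4.5.1 / L4T 32.5.1). A minimal sketch, assuming the default install path on a Jetson:

```shell
# Print the DeepStream version reported by the app itself
deepstream-app --version-all

# Or read the version file from the default install location
cat /opt/nvidia/deepstream/deepstream/version

# Check the L4T release (maps to a specific JetPack version) on a Jetson
cat /etc/nv_tegra_release
```

If the reported DeepStream version and the L4T release don't correspond to the same JetPack line in NVIDIA's compatibility table, reinstalling the matching pair is the fix.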