queueInputBatch(): cudaMemcpyAsync for output buffers failed (cudaErrorLaunchFailure)

Hi mchi,

  1. can you try this on a more powerful platform, e.g. Xavier or a dGPU+x86 system?

It will take some time; I’ll try and post the results.

  2. can you set all three models to batch==1 and check if this issue is still reproducible?

Batch size is set to 1 for all the models and I receive the same error.
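For reference, the batch size is set per model in each nvinfer config file's [property] group, so the change has to be made in all three configs. A minimal sketch of the relevant entry (the actual config file names were not posted):

[property]
batch-size=1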

  3. run cuda-memcheck to check CUDA memory access
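The output below was presumably produced by wrapping the usual launch command with cuda-memcheck, along these lines (the config file name is a placeholder, since the exact command line was not posted):

cuda-memcheck deepstream-app -c deepstream_app_config.txt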

Output:

========= CUDA-MEMCHECK   

Using winsys: x11 
========= Internal Memcheck Error: Initialization failed
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuDevicePrimaryCtxRetain + 0x154) [0x1fda6c]
=========     Host Frame:/usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_multistream.so [0x42724]
=========
Creating LL OSD context new
0:00:03.549280537 32736   0x5584b09490 INFO                 nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<secondary_gie_1> NvDsInferContext[UID 3]:initialize(): Trying to create engine from model files
0:00:42.518084575 32736   0x5584b09490 INFO                 nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<secondary_gie_1> NvDsInferContext[UID 3]:generateTRTModel(): Storing the serialized cuda engine to file at /opt/nvidia/deepstream/deepstream-4.0/sources/apps/sample_apps/anpr-test1/char_rec_model.uff_b1_fp32.engine
0:00:43.514035446 32736   0x5584b09490 INFO                 nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 2]:initialize(): Trying to create engine from model files
Loading pre-trained weights...
Loading complete!
Total Number of weights read : 291836
Output blob names :
yolo_13
Total number of layers: 31
Total number of layers on DLA: 0
Building the TensorRT Engine...
Building complete!
0:00:55.120270441 32736   0x5584b09490 INFO                 nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 2]:generateTRTModel(): Storing the serialized cuda engine to file at /opt/nvidia/deepstream/deepstream-4.0/sources/apps/sample_apps/deepstream-app/model_b1_fp32.engine
Deserialize yoloLayerV3 plugin: yolo_13
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_klt.so
gstnvtracker: Optional NvMOT_RemoveStreams not implemented
gstnvtracker: Batch processing is OFF
Deserialize yoloLayerV3 plugin: yolo_15
Deserialize yoloLayerV3 plugin: yolo_22

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.


**PERF: FPS 0 (Avg)	
**PERF: 0.00 (0.00)	
** INFO: <bus_callback:189>: Pipeline ready

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
** INFO: <bus_callback:175>: Pipeline running

Creating LL OSD context new
0:00:57.739293883 32736   0x558470ac00 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:queueInputBatch(): cudaMemcpyAsync for output buffers failed (cudaErrorLaunchFailure)
0:00:57.739381687 32736   0x558470ac00 WARN                 nvinfer gstnvinfer.cpp:1098:gst_nvinfer_input_queue_loop:<primary_gie_classifier> error: Failed to queue input batch for inferencing
0:00:57.739578739 32736   0x558470ac00 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:queueInputBatch(): Failed to make stream wait on event(cudaErrorLaunchFailure)
0:00:57.740077886 32736   0x558470ac00 WARN                 nvinfer gstnvinfer.cpp:1098:gst_nvinfer_input_queue_loop:<primary_gie_classifier> error: Failed to queue input batch for inferencing
0:00:57.740187342 32736   0x558470ac00 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:queueInputBatch(): Failed to make stream wait on event(cudaErrorLaunchFailure)
0:00:57.740240668 32736   0x558470ac00 WARN                 nvinfer gstnvinfer.cpp:1098:gst_nvinfer_input_queue_loop:<primary_gie_classifier> error: Failed to queue input batch for inferencing
ERROR from primary_gie_classifier: Failed to queue input batch for inferencing
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1098): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie_classifier
ERROR from primary_gie_classifier: Failed to queue input batch for inferencing
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1098): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie_classifier
ERROR from primary_gie_classifier: Failed to queue input batch for inferencing
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1098): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie_classifier
KLT Tracker Init
========= Error: process didn't terminate successfully
=========        The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= No CUDA-MEMCHECK results found

back-to-back-detectors.gz (217.0 KB)

Could you give this sample a try?

cd /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps
tar xpf back-to-back-detectors.tgz
cd back-to-back-detectors
./prebuild.sh
make
./back-to-back-detectors file:///opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.h264

If you run on DeepStream 4.0, make the change below in the Makefile:

NVDS_VERSION:=5.0

to

NVDS_VERSION:=4.0
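Alternatively, if you prefer not to edit the file, overriding the variable on the make command line should also work, since command-line assignments take precedence over the Makefile's := assignment:

make NVDS_VERSION=4.0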

And, in your config file, please try changing

network-mode=0

to

network-mode=2

to use FP16 instead of FP32.
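For context, network-mode in the nvinfer [property] group selects the inference precision: 0 is FP32, 1 is INT8, and 2 is FP16. So the change looks like:

[property]
# 0=FP32, 1=INT8, 2=FP16
network-mode=2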

Hi mchi,

I was trying to post but was unable to, because it said I can’t post in read mode; however, I tried the sample.

Error:

0:00:19.772169864 11797   0x5585c5f050 WARN                 nvinfer gstnvinfer.cpp:1830:gst_nvinfer_output_loop:<char-recognition> error: Internal data stream error.
0:00:19.772249140 11797   0x5585c5f050 WARN                 nvinfer gstnvinfer.cpp:1830:gst_nvinfer_output_loop:<char-recognition> error: streaming stopped, reason error (-5)
ERROR from element char-recognition: Internal data stream error.
Error details: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1830): gst_nvinfer_output_loop (): /GstPipeline:anprtest1-pipeline/GstNvInfer:char-recognition:
streaming stopped, reason error (-5)
Returned, stopping playback
Deleting pipeline

Which sample? How did you try it? Could you share more details?

The back-to-back-detectors sample that you provided; I tried it on a Jetson Nano with the batch size set to 1.