What is good practice with videoSource and videoOutput? e.g. 'unspecified launch failure (error 719)'

Hi again,
I am having no end of trouble trying to tame videoSource and videoOutput, with innumerable errors such as:

[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/codec/gstEncoder.cpp:575

The errors start at different places each run, so I can't find a pattern.
If I keep it simple it doesn't complain, but what I am after isn't that complicated.
I assume I am stretching what these libraries can do but I have no rules of thumb to go by.

In one thread, running about 20 times/sec, it uses one videoSource ('csi://0') and two videoOutputs.
One is an MP4, frame for frame from the source; this 'swaps' output files every minute or so with outVid.Close() followed by a new jetson.utils.videoOutput with a different file name (roughly sketched after this description).
The second is a periodic snapshot JPG.
The videoSource puts each frame into a CUDA buffer in global space for other things to read.

Meanwhile there is a second thread, running about 10 times/sec, that catches the image from global space,
then runs a segNet analysis on it and creates a third videoOutput of the overlay after the analysis.
It also annotates the image via jetson.utils.cudaFont OverlayText.
This one also swaps files every minute or so.
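
A rough sketch of the rotation pattern described above, with an illustrative helper and file names based on the log paths below (not the actual code):

    import time
    import jetson.utils

    def new_output(base="/media/jc/ROBO-USB/Robo_Video_trace/robo_wander_base"):
        # hypothetical helper: timestamped file name, fresh encoder
        return jetson.utils.videoOutput(time.strftime(base + "_%Y%m%d-%H%M%S.mp4"))

    camera = jetson.utils.videoSource("csi://0")
    outVid = new_output()
    swap_at = time.time() + 60                # rotate roughly every minute

    while True:
        image = camera.Capture()              # raises if the capture fails or times out
        outVid.Render(image)
        if time.time() >= swap_at:
            outVid.Close()                    # finalize the current mp4
            outVid = new_output()             # reopen with a new file name
            swap_at = time.time() + 60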

Once the program gets upset it cascades errors everywhere. It often requires a reboot to get any sense back.
And every now and again it works properly!

So, is it reasonable to use three videoOutputs?
Is it reasonable to put them in different threads?
Why does videoSource have to capture one frame before any attempt is made to declare or use a videoOutput?

I hope you can help.
thanks
JC

DEBUG:(general_timer) ** Starting general_timer()
DEBUG:(get_camera_frame) ** Starting get_camera_frame()
DEBUG:(show_image_in_window) ** Starting show_image_in_window()
DEBUG:(analyse_images) ** Starting 
DEBUG:(main) after launching all threads
DEBUG:(main) waiting for thread start

segNet -- loading segmentation network model from:
       -- prototxt:   (null)
       -- model:      /home/jc/jcCode/trained_models/jc_floor_alone_640x480/fcn_resnet18.onnx
       -- labels:     /home/jc/jcCode/trained_models/jc_floor_alone_640x480/classes.txt
       -- colors:     /home/jc/jcCode/trained_models/jc_floor_alone_640x480/colors.txt
       -- input_blob  'input_0'
       -- output_blob 'output_0'
       -- batch_size  1

[TRT]    TensorRT version 8.0.1
[TRT]    Could not register plugin creator -  ::FlattenConcat_TRT version 1
[TRT]    requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]    [MemUsageChange] Init CUDA: CPU +198, GPU -1, now: CPU 234, GPU 3847 (MiB)
[TRT]    loading network plan from engine cache... /home/jc/jcCode/trained_models/jc_floor_alone_640x480/fcn_resnet18.onnx.1.1.8001.GPU.FP16.engine
[TRT]    device GPU, loaded /home/jc/jcCode/trained_models/jc_floor_alone_640x480/fcn_resnet18.onnx
[TRT]    [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 291, GPU 3855 (MiB)
[TRT]    Loaded engine size: 57 MB
[TRT]    [MemUsageSnapshot] deserializeCudaEngine begin: CPU 291 MiB, GPU 3855 MiB
[TRT]    [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU -94, now: CPU 449, GPU 3757 (MiB)
[TRT]    [MemUsageChange] Init cuDNN: CPU +241, GPU -64, now: CPU 690, GPU 3693 (MiB)
[TRT]    [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 690, GPU 3694 (MiB)
[TRT]    [MemUsageSnapshot] deserializeCudaEngine end: CPU 690 MiB, GPU 3694 MiB
[TRT]    [MemUsageSnapshot] ExecutionContext creation begin: CPU 690 MiB, GPU 3685 MiB
[TRT]    [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU -3, now: CPU 690, GPU 3682 (MiB)
[TRT]    [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 690, GPU 3682 (MiB)
[TRT]    [MemUsageSnapshot] ExecutionContext creation end: CPU 690 MiB, GPU 3663 MiB
[TRT]    
[TRT]    CUDA engine context initialized on device GPU:
[TRT]       -- layers       25
[TRT]       -- maxBatchSize 1
[TRT]       -- deviceMemory 14745600
[TRT]       -- bindings     2
[TRT]       binding 0
                -- index   0
                -- name    'input_0'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  4
                -- dim #0  1
                -- dim #1  3
                -- dim #2  480
                -- dim #3  640
[TRT]       binding 1
                -- index   1
                -- name    'output_0'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  4
                -- dim #0  1
                -- dim #1  7
                -- dim #2  15
                -- dim #3  20
[TRT]    
[TRT]    
[TRT]    device GPU, /home/jc/jcCode/trained_models/jc_floor_alone_640x480/fcn_resnet18.onnx initialized.
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3264 x 2464 FR = 21.000000 fps Duration = 47619048 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 3264 x 1848 FR = 28.000001 fps Duration = 35714284 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1640 x 1232 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1280 x 720 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1280 x 720 FR = 120.000005 fps Duration = 8333333 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 0 
   Camera mode  = 5 
   Output Stream W = 1280 H = 720 
   seconds to Run    = 0 
   Frame Rate = 120.000005 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
DEBUG:(main) waiting for thread start
DEBUG:(analyse_images) Segnet model loaded and buffers allocated
DEBUG:(get_camera_frame) camera is live
DEBUG:(analyse_images) Network Num classes = 7
DEBUG:(analyse_images) Network grid width = 20 height = 15
DEBUG:(show_image_in_window)  after opening outvid = display
DEBUG:(main) waiting for thread start
DEBUG:(get_camera_frame) video file_name=/media/jc/ROBO-USB/Robo_Video_trace/robo_wander_base_20220727-130907_0000.mp4
DEBUG:(analyse_images) video file_name=/media/jc/ROBO-USB/Robo_Video_trace/robo_wander_dir_20220727-130908_0000.mp4
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/codec/gstEncoder.cpp:575
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/python/bindings/PyCUDA.cpp:1102
Framerate set to : 30 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 4 
===== NVMEDIA: NVENC =====
DEBUG:(main) waiting for thread start
DEBUG:(analyse_images)  after opening outvid = vector mp4
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/python/bindings/PyCUDA.cpp:1102
NvMMLiteBlockCreate : Block : BlockType = 4 
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/codec/gstBufferManager.cpp:319
[gstreamer] gstDecoder -- failed to retrieve next image buffer
H264: Profile = 66, Level = 40 
NVMEDIA_ENC: bBlitMode is set to TRUE 
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jc/jcCode/car_controller_P6_0v083/car_controller_P6_images.py", line 162, in get_camera_frame
    image = camera.Capture()
Exception: jetson.utils -- videoSource failed to capture image

CONSUMER: Done Success
GST_ARGUS: Cleaning up
GST_ARGUS: Done Success
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/cuda/cudaFont.cu:98
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/cuda/cudaFont.cu:106
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/cuda/cudaFont.cu:114
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/python/bindings/PyCUDA.cpp:69
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/build/aarch64/include/jetson-utils/RingBuffer.inl:119
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jc/jcCode/car_controller_P6_0v083/car_controller_P6_images.py", line 507, in analyse_images
    jetson.utils.cudaMemcpy(TempCudaImageToUse, rwsg.G_CudaImageToUse)
TypeError: jetson.utils -- cudaMemcpy() failed to copy memory

[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/cuda/cudaFont.cu:98
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/cuda/cudaFont.cu:106
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/cuda/cudaFont.cu:114
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/python/bindings/PyCUDA.cpp:69
[TRT]    [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (unspecified launch failure)
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/python/bindings/PyCUDA.cpp:69
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/python/bindings/PyCUDA.cpp:69
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/python/bindings/PyCUDA.cpp:69
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/python/bindings/PyCUDA.cpp:69
DEBUG:(main) all threads report 'running'
Exception in thread Thread-7:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jc/jcCode/car_controller_P6_0v083/car_controller_P6_images.py", line 305, in show_image_in_window
    jetson.utils.cudaMemcpy(TempCudaImageToUse, rwsg.G_CudaImageToUse)      # 0v81
TypeError: jetson.utils -- cudaMemcpy() failed to copy memory

[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/cuda/cudaFont.cu:98
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/cuda/cudaFont.cu:106
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/cuda/cudaFont.cu:114
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/python/bindings/PyCUDA.cpp:69
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)
[cuda]      /home/jc/SW_Components2021/jetson-inference/utils/display/glTexture.cpp:246
DEBUG:(messaging.initialisation) after Mqtt client loop start
INFO:(main) Starting Main Loop at time=13:09:30
DEBUG:Connected With Result Code 0
DEBUG:(do_subscribe) after client.subscribe
DEBUG:(general_timer) rwsg.G_event_10units triggered at time=13:09:40
INFO:(things_to_do_every_10sec) Main Loop count=4579 Image count=1 (0fps) analysed_image_count=1 image_in_window_count=3 at increment time=13:09:40 thats 468 loops/sec
INFO:(main) Main Loop count=5000 Image count=1 (0fps) analysed_image_count=1 image_in_window_count=3 at increment time=13:09:41 thats 467 loops/sec
DEBUG:(general_timer) rwsg.G_event_10units triggered at time=13:09:50
INFO:(things_to_do_every_10sec) Main Loop count=9337 Image count=1 (0fps) analysed_image_count=1 image_in_window_count=3 at increment time=13:09:50 thats 469 loops/sec
DEBUG: regular status check.   G_autowander_mode =False
INFO:(main) Main Loop count=10000 Image count=1 (0fps) analysed_image_count=1 image_in_window_count=3 at increment time=13:09:51 thats 469 loops/sec
DEBUG:(general_timer) rwsg.G_event_10units triggered at time=13:10:00
INFO:(things_to_do_every_10sec) Main Loop count=14117 Image count=1 (0fps) analysed_image_count=1 image_in_window_count=3 at increment time=13:10:00 thats 470 loops/sec
INFO:(main) Main Loop count=15000 Image count=1 (0fps) analysed_image_count=1 image_in_window_count=3 at increment time=13:10:02 thats 471 loops/sec

Hi @jc5p, it looks like these are Python threads - are they running in different processes? CUDA memory isn’t shared across processes.

videoSource and videoOutput are already threaded inside their C++ implementations. You can call videoSource.Capture(timeout=0) to poll and return immediately if a frame isn't ready.
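
For illustration, a minimal polling loop built on that suggestion might look like the sketch below; as confirmed later in the thread, this build signals a timeout by raising, so both outcomes are handled:

    import time
    import jetson.utils

    camera = jetson.utils.videoSource("csi://0")

    while camera.IsStreaming():
        try:
            image = camera.Capture(timeout=0)   # poll: return at once if no frame is ready
        except Exception:
            image = None                        # this build reports a timeout by raising
        if image is None:
            time.sleep(0.005)                   # nothing ready yet; back off briefly
            continue
        # ... hand the frame on to the rest of the program ...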

Thanks @dusty_nv, yes they are Python threads and no, they are not in different processes. I am aware that CUDA memory isn't shared across processes.
The threads are there for my program's needs - dealing with these images and videos is only a fraction of the overall functionality of the program.

Any thoughts on what might cause the apparently random explosion of
[cuda] unspecified launch failure (error 719) (hex 0x2CF)
error messages?
JC

Not 100% sure for your case, but I think this error code can indicate running out of resources.
You could build /usr/local/cuda/samples/1_Utilities/deviceQuery and run it to get details about CUDA cores and threads/blocks.
Also note that debug builds may involve extra registers.

That error can also happen when there are memory faults inside a kernel, and in this case it strikes me that it could occur if you are running multiple threads that operate on the same image buffer simultaneously without synchronization. I also don't really have support for multiple CUDA streams implemented across the entirety of jetson.inference/jetson.utils, as typically the use cases for this library aren't heavily multithreaded.

What I would do is try temporarily disabling some of the threads to deduce what is triggering the errors, or make a copy or copies of the image buffer (using jetson.utils.cudaMemcpy) before you distribute it to your threads.
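
For illustration, a minimal sketch of that copy-per-consumer idea; the helper name is made up, while cudaAllocMapped and cudaMemcpy are the jetson.utils calls referred to above:

    import jetson.utils

    def copy_frame(src):
        # give each consumer thread its own private CUDA buffer
        dst = jetson.utils.cudaAllocMapped(width=src.width,
                                           height=src.height,
                                           format=src.format)
        jetson.utils.cudaMemcpy(dst, src)
        return dst

    # capture thread (sketch)
    camera = jetson.utils.videoSource("csi://0")
    image = camera.Capture()
    frame_for_encoder = copy_frame(image)   # private copy for the mp4/jpg thread
    frame_for_segnet  = copy_frame(image)   # private copy for the segNet thread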

Thanks @Honey_patouceul, I tried that utility as you suggested but I can't see anything obvious - I am not sure it was even displaying dynamic data.

@dusty_nv, for a minute I thought you had cracked the problem! The hint was "without synchronization".
I reviewed where I had placed this and tinkered a bit.
Then it worked, but only for 267 images, and on the next run 569 images :(
A slightly different trigger point each time, but then the same cascading errors.

Can you please just confirm that the synchronize call is correctly placed? It is after the only command that modifies the shared CUDA image,
and it's inside a threading.Lock block:

    with rwsg.G_global_lock:
        rwsg.G_image_available = False
        rwsg.G_imageCounter += 1
        jetson.utils.cudaMemcpy(rwsg.G_CudaImageToUse, image)   # publish the latest frame
        jetson.utils.cudaDeviceSynchronize()                    # wait for the copy to finish
        rwsg.G_image_available = True
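
(For context, the matching read in the consumer threads would also need to hold the same lock; roughly, using the buffer names from the tracebacks above:)

    # consumer thread (sketch) -- same rwsg globals, same lock
    with rwsg.G_global_lock:
        if rwsg.G_image_available:
            jetson.utils.cudaMemcpy(TempCudaImageToUse, rwsg.G_CudaImageToUse)
            jetson.utils.cudaDeviceSynchronize()
    # work on TempCudaImageToUse outside the lock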

Thanks
JC

latest error position…

NVMEDIA_ENC: bBlitMode is set to TRUE 
[gstreamer] gstBufferManager -- failed to retrieve timestamp buffer (default to 0)
INFO:(main) Main Loop count=10000 Image count=1311 (22fps) analysed_image_count=267 image_in_window_count=104 at increment time=13:56:16 thats 173 loops/sec
[gstreamer] gstDecoder -- failed to retrieve next image buffer
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jc/jcCode/car_controller_P6_0v083/car_controller_P6_images.py", line 164, in get_camera_frame
    image = camera.Capture()
Exception: jetson.utils -- videoSource failed to capture image

GST_ARGUS: Cleaning up
CONSUMER: Done Success
GST_ARGUS: Done Success
[cuda]      unspecified launch failure (error 719) (hex 0x2CF)

Are you sure this is not coming from the video stream timing out?

No, I am not sure - how could I tell? I have tried with the videoSource.Capture(timeout=0) that you suggested, but it didn't make a difference.
I can't see that Capture() returns anything other than an image.
I also tried starting the threads one by one in case they were getting tangled 'opening' things. Same result.

OK gotcha - the Python version of videoSource.Capture() only returns an image, or raises an exception if a timeout occurred.

Unfortunately I'm unable to debug what the exact issue is here (and I'm by no means an expert on Python multithreading). Does the behavior change if you increase the number of buffers in the videoSource/videoOutput with the --num-buffers argument (the default is 4)? Or you can change it here and recompile/reinstall - it would be useful to know whether it's related to the problem.
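
For illustration, and assuming --num-buffers is accepted through the argv list like the other videoOptions flags, the setting could also be passed programmatically (16 is just an arbitrary value to try):

    import jetson.utils

    # assumption: the flag is parsed from the argv list of each stream object
    camera = jetson.utils.videoSource("csi://0", argv=["--num-buffers=16"])
    outVid = jetson.utils.videoOutput("test.mp4", argv=["--num-buffers=16"])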
