Multithreaded CUDA processing on Jetson AGX Xavier causes segmentation fault

I have created the following image processing application with Jetson AGX Xavier.

Each image processing thread performs steps (1) to (5) below.
(1) Average of all pixels
(2) 3x3 median filter
(3) Difference processing
(4) Binarization
(5) Image Copying
First, I implemented it on the ARM CPU only and confirmed that it runs on four threads simultaneously.

Next, I ported the simplest part to CUDA: (5) Image Copying.
This is a simple process of copying the image data to another memory buffer.
The port was easy and it works.
However, a segmentation fault occurs in about one run out of ten.
According to the core file, the fault occurs at the point where the CUDA kernel is called.
What are the possible causes of this problem?
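
For reference, a minimal sketch of what a copy step like (5) might look like, with the launch result checked so that a failing launch is reported instead of going unnoticed. This is not the original code: the names `copy_image_kernel` and `copy_image_async`, the 8-bit pixel type, and the launch configuration are all assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical copy kernel: every GPU thread copies bytes in a grid-stride
// loop, so any grid size covers the whole image.
__global__ void copy_image_kernel(const unsigned char* src,
                                  unsigned char* dst, size_t n)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n; i += stride) {
        dst[i] = src[i];
    }
}

// Enqueue the copy on `stream` and return the launch status.
// The copy is asynchronous: dst is only valid after the stream
// has been synchronized.
cudaError_t copy_image_async(const unsigned char* src, unsigned char* dst,
                             size_t n, cudaStream_t stream)
{
    if (n == 0) return cudaSuccess;            // nothing to copy
    const int block = 256;                     // assumed block size
    size_t needed = (n + block - 1) / block;
    const int grid = (int)(needed < 1024 ? needed : 1024);
    copy_image_kernel<<<grid, block, 0, stream>>>(src, dst, n);
    return cudaGetLastError();                 // reports launch errors
}
```

The returned status only covers the launch itself. Errors raised while the kernel runs show up on a later call such as `cudaStreamSynchronize`.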

To run in multi-threaded mode, I paid attention to the following points.
* Use CUDA streams.
* Give each thread its own CUDA stream and its own unified memory buffers.
* Pass “--default-stream per-thread” as an option to nvcc.


Please note that Jetson doesn’t support concurrent access.
So it’s essential to make sure that two processors don’t access the same memory buffer simultaneously.
(The two processors are the CPU and the GPU in your use case.)

Have you added a synchronization call to ensure that all GPU tasks (e.g. the memcpy) have finished before accessing the buffer from the CPU?
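
As a rough sketch of the ordering meant here, each worker thread below owns one stream and its own managed buffers, and synchronizes that stream before the CPU touches the buffers again. It also attaches the buffers to the stream with `cudaStreamAttachMemAsync`, which is an assumption about what the application may need on a device without concurrent managed access, not something confirmed from the original code. The names `copy_kernel`, `worker`, and `run_workers` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <thread>
#include <vector>

__global__ void copy_kernel(const unsigned char* src, unsigned char* dst, size_t n)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        dst[i] = src[i];
}

// One worker: private stream, private managed buffers.
static bool worker(size_t n, unsigned char fill)
{
    cudaStream_t stream = nullptr;
    unsigned char* src = nullptr;
    unsigned char* dst = nullptr;
    bool ok = cudaStreamCreate(&stream) == cudaSuccess
        // Allocate host-attached so no other stream's kernel can own the buffers.
        && cudaMallocManaged(&src, n, cudaMemAttachHost) == cudaSuccess
        && cudaMallocManaged(&dst, n, cudaMemAttachHost) == cudaSuccess
        // Assumption: tie each buffer to this thread's stream only.
        && cudaStreamAttachMemAsync(stream, src, 0, cudaMemAttachSingle) == cudaSuccess
        && cudaStreamAttachMemAsync(stream, dst, 0, cudaMemAttachSingle) == cudaSuccess
        && cudaStreamSynchronize(stream) == cudaSuccess;
    if (ok) {
        // CPU write: safe because this stream is idle.
        for (size_t i = 0; i < n; ++i) src[i] = fill;
        copy_kernel<<<64, 256, 0, stream>>>(src, dst, n);
        // Wait for the GPU before the CPU reads dst.
        ok = cudaGetLastError() == cudaSuccess
          && cudaStreamSynchronize(stream) == cudaSuccess;
        for (size_t i = 0; ok && i < n; ++i) ok = (dst[i] == fill);
    }
    cudaFree(src);
    cudaFree(dst);
    if (stream) cudaStreamDestroy(stream);
    return ok;
}

// Run `num_threads` workers at once; true if every copy verified.
bool run_workers(int num_threads, size_t n)
{
    std::vector<char> results(num_threads, 0);   // one slot per thread
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back([&results, t, n] {
            results[t] = worker(n, (unsigned char)(t + 1)) ? 1 : 0;
        });
    for (auto& th : threads) th.join();
    for (char r : results) if (!r) return false;
    return true;
}
```

Calling `run_workers(4, 640 * 480)` would mirror the four-thread setup from the question. A build line such as `nvcc --default-stream per-thread -Xcompiler -pthread` should work, though the exact flags depend on the toolchain.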