I have created an image processing application on a Jetson AGX Xavier.
Each image processing thread performs the following steps, (1) to (5):
(1) Average of all pixels
(2) 3x3 median filter
(3) Difference processing
(4) Binarization
(5) Image Copying
First, I implemented it using only the ARM CPU and confirmed that it runs on four threads simultaneously.
Next, I ported the simplest part to CUDA: (5) Image Copying.
This step simply copies the image data to another memory buffer.
It was easy to port to CUDA, and it worked.
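For reference, the copy step has roughly the shape of the minimal sketch below. This is not the actual application code: the names `copyImage` and `copyOnce`, the 8-bit single-channel image layout, and the use of `cudaMallocManaged` on the default stream are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical copy kernel: a grid-stride loop that copies n bytes.
__global__ void copyImage(const unsigned char* src, unsigned char* dst, size_t n)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        dst[i] = src[i];
}

// Copies a width*height 8-bit image once; returns true if dst matches src.
bool copyOnce(int width, int height)
{
    size_t n = (size_t)width * height;
    unsigned char *src = nullptr, *dst = nullptr;
    if (cudaMallocManaged(&src, n) != cudaSuccess) return false;
    if (cudaMallocManaged(&dst, n) != cudaSuccess) { cudaFree(src); return false; }

    // Fill the source image on the CPU (no kernel is running at this point).
    for (size_t i = 0; i < n; ++i) src[i] = (unsigned char)(i % 251);

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    copyImage<<<blocks, threads>>>(src, dst, n);

    // Wait for the kernel before the CPU touches managed memory again.
    bool ok = cudaGetLastError() == cudaSuccess
           && cudaDeviceSynchronize() == cudaSuccess;
    ok = ok && memcmp(src, dst, n) == 0;

    cudaFree(src);
    cudaFree(dst);
    return ok;
}

int main()
{
    bool ok = copyOnce(1920, 1080);
    printf("%s\n", ok ? "copy ok" : "copy failed");
    return ok ? 0 : 1;
}
```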
However, a segmentation fault occurs about one time in ten.
According to the core file, the fault occurs at the point where the CUDA kernel is called.
What are the possible causes of this problem?
To run in multi-threaded mode, I have paid attention to the following points:
* Use CUDA streams.
* Each thread has its own CUDA stream and its own unified memory.
* Pass "--default-stream per-thread" as an option to nvcc.
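The setup described in the points above can be sketched as follows. This is an illustrative sketch, not the actual application: `worker`, `runThreads`, the image size, and the iteration count are hypothetical. It also makes one assumption that is not stated above, namely that each thread allocates its managed buffers with `cudaMemAttachHost` and attaches them to its own stream with `cudaStreamAttachMemAsync`, so that the CPU side of one thread can touch its buffers while another thread's kernel is running.

```cuda
#include <cstdio>
#include <cstring>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical copy kernel: a grid-stride loop that copies n bytes.
__global__ void copyImage(const unsigned char* src, unsigned char* dst, size_t n)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        dst[i] = src[i];
}

// One worker per CPU thread: private stream, private managed buffers.
void worker(int id, size_t n, int iterations, int* result)
{
    cudaStream_t stream;
    unsigned char *src = nullptr, *dst = nullptr;
    bool haveStream = (cudaStreamCreate(&stream) == cudaSuccess);
    bool ok = haveStream;

    // Assumption: host-attached allocation, then attached to this thread's stream.
    ok = ok && cudaMallocManaged(&src, n, cudaMemAttachHost) == cudaSuccess;
    ok = ok && cudaMallocManaged(&dst, n, cudaMemAttachHost) == cudaSuccess;
    ok = ok && cudaStreamAttachMemAsync(stream, src, 0, cudaMemAttachSingle) == cudaSuccess;
    ok = ok && cudaStreamAttachMemAsync(stream, dst, 0, cudaMemAttachSingle) == cudaSuccess;
    ok = ok && cudaStreamSynchronize(stream) == cudaSuccess;

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    for (int it = 0; ok && it < iterations; ++it) {
        unsigned char value = (unsigned char)(id * 16 + it);
        memset(src, value, n);  // CPU writes the source image
        copyImage<<<blocks, threads, 0, stream>>>(src, dst, n);
        // Check the launch, wait on this stream only, then verify on the CPU.
        ok = cudaGetLastError() == cudaSuccess
          && cudaStreamSynchronize(stream) == cudaSuccess
          && dst[0] == value && dst[n - 1] == value;
    }

    cudaFree(src);
    cudaFree(dst);
    if (haveStream) cudaStreamDestroy(stream);
    *result = ok ? 1 : 0;
}

// Runs numThreads workers concurrently; returns true if all of them succeeded.
bool runThreads(int numThreads, size_t n, int iterations)
{
    std::vector<int> results(numThreads, 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; ++t)
        pool.emplace_back(worker, t, n, iterations, &results[t]);
    for (auto& th : pool) th.join();
    for (int r : results)
        if (!r) return false;
    return true;
}

int main()
{
    bool ok = runThreads(4, (size_t)1920 * 1080, 100);
    printf("%s\n", ok ? "all threads ok" : "failure in at least one thread");
    return ok ? 0 : 1;
}
```

Checking the return value of every CUDA call, as done here, may also help narrow down which call fails before the segmentation fault. On Linux, building a file that uses std::thread with nvcc typically also needs "-Xcompiler -pthread".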