OpenACC-based CUDA application for GPU utilization

I am working on an application for the Jetson AGX Xavier in which I aim to use the GPU for continuous data processing with OpenACC. I have two functions that I plan to offload to the GPU; each will continuously wait for an event before processing data. In addition, I have a dedicated GPU thread that manages buffer sharing between the CPU and the GPU.

  • GPU Thread: Continuously waits for an event from the CPU. When the CPU fills the buffer and sends an event, the GPU Thread copies the data from the CPU buffer to another buffer on the GPU and triggers an event for Function 1.
  • Function 1: Waits for a data buffer from the GPU Thread via an event. Once triggered, it copies the data from the buffer, processes it, and sends an event to Function 2 on the GPU.
  • Function 2: Waits for the event from Function 1. Upon receiving the event, it processes the data and, after completing the processing, sends the buffer back to the CPU using an event.

My questions are:

  1. Can the events between the host (CPU) and the GPU Thread, and between Function 2 and the CPU, be handled with the CUDA events API?
  2. Can event handling between the GPU Thread and Function 1, and between Function 1 and Function 2 (both offloaded with OpenACC kernels pragmas), be implemented effectively?
  3. Furthermore, I want all processing, including the event handling between the GPU Thread, Function 1, and Function 2, to remain entirely on the GPU, with no CPU involvement once the data has been transferred. Does this approach guarantee that?

Does this approach seem correct?