Using CUDA Inter-Process Communication Between Multiple Applications

Hello, this might not be the best place to ask: I've been tasked with investigating using CUDA to accomplish multiple tasks across multiple applications, and yes, these applications will all run inside an NGC CUDA container. In my travels I have stumbled across CUDA IPC and CUDA streams, but these seem to be used only between parent and child processes. That all makes sense to me and fits with what I have read about CUDA. However, I was wondering if it is possible to break things up, like writing data to the card, running multiple different kernels on that data, and reading the results back, across different applications?
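For context, here is roughly what I've pieced together from the IPC samples: one process exports a device allocation as a handle, and a second process opens that handle and sees the same memory. This is only a sketch of my understanding, and the temp file used to pass the handle between processes is a hypothetical placeholder for whatever transport would actually be used (pipe, socket, etc.).

```
// Process A (producer): allocate device memory and export an IPC handle.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    float *d_buf = nullptr;
    cudaMalloc((void **)&d_buf, 1024 * sizeof(float));

    cudaIpcMemHandle_t handle;
    cudaIpcGetMemHandle(&handle, d_buf);

    // The handle is opaque bytes; ship it to the other process over any
    // ordinary channel. A temp file stands in for that here (hypothetical path).
    FILE *f = fopen("/tmp/cuda_ipc_handle", "wb");
    fwrite(&handle, sizeof(handle), 1, f);
    fclose(f);

    // Keep this process (and therefore the allocation) alive while
    // consumers are using the memory.
    getchar();
    cudaFree(d_buf);
    return 0;
}
```

```
// Process B (consumer): open the handle and use the same device memory.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaIpcMemHandle_t handle;
    FILE *f = fopen("/tmp/cuda_ipc_handle", "rb");  // hypothetical path
    fread(&handle, sizeof(handle), 1, f);
    fclose(f);

    float *d_buf = nullptr;
    cudaIpcOpenMemHandle((void **)&d_buf, handle,
                         cudaIpcMemLazyEnablePeerAccess);

    // Launch kernels on d_buf here, as if it were our own allocation.

    cudaIpcCloseMemHandle(d_buf);
    return 0;
}
```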

This ask is more of an artifact of the development process of my team. My thought is it would be best to follow the examples of CUDA streams and IPC that I have found, and ask that developers write a single application consisting of multiple kernels, and the logic to send data to the card running asynchronously with a barrier to make sure all the kernels are finished processing the data before more is moved to the card. As far as reading results out to other applications that would be a task for some messaging framework. Thanks in advance for helping a CUDA new guy.