Sharing CUDA memory between containers on x86 and Jetson

Hi,

At my company we deploy our software in a very decoupled manner, as several Docker containers communicating with one another. Part of the workflow involves video decoding, video resizing, and inference with a Triton server.
This looks like a perfect use case for DeepStream, but we actually need a service to manage frames from decoding through to inference. So far we simply use shared memory, which of course creates a big overhead because of all the copies involved.

We need something that meets these requirements:

  • it should be GPU shared memory
  • one writer, several readers
  • it should scale to video streams (up to 60 fps)
  • it should be readable from different containers
  • it should work on both GPU servers and Jetson devices
  • bonus: DeepStream could write frames into it directly

What would be the best approach?
So far, this is the information I have gathered:

  • nvIPC is not available on Jetson
  • NvSci could be a solution?
  • cuMemExportToShareableHandle and cuMemImportFromShareableHandle could be used?

But honestly, the lack of usage examples for the last two options makes them really hard to grasp.
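
For reference, here is the rough shape I have pieced together for the allocation/export side from the driver API documentation. This is an untested sketch, not working code: the device ordinal 0, the 4 MiB pool size, and the error-check macro are all my own assumptions.

```c
// export_frames.c -- writer side: allocate device memory with an exportable
// handle type and turn it into a POSIX file descriptor.
// Sketch only; error cleanup is omitted.
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call) do { CUresult r_ = (call); if (r_ != CUDA_SUCCESS) { \
    fprintf(stderr, "%s failed: %d\n", #call, (int)r_); exit(1); } } while (0)

int main(void) {
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));          /* device ordinal 0 is an assumption */
    CUcontext ctx;
    CHECK(cuCtxCreate(&ctx, 0, dev));

    /* The allocation must be created with an exportable handle type. */
    CUmemAllocationProp prop = {0};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = 0;
    prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

    /* The size must be a multiple of the allocation granularity. */
    size_t gran = 0;
    CHECK(cuMemGetAllocationGranularity(&gran, &prop,
          CU_MEM_ALLOC_GRANULARITY_MINIMUM));
    size_t size = ((4 * 1024 * 1024 + gran - 1) / gran) * gran; /* 4 MiB pool (assumption) */

    CUmemGenericAllocationHandle h;
    CHECK(cuMemCreate(&h, size, &prop, 0));

    int fd = -1;
    CHECK(cuMemExportToShareableHandle(&fd, h,
          CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0));

    /* 'fd' can now be passed to another process/container, e.g. over a
       Unix domain socket with SCM_RIGHTS (not shown). The writer would also
       map the allocation (cuMemAddressReserve/cuMemMap) before writing. */
    printf("exported fd = %d, size = %zu\n", fd, size);
    return 0;
}
```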

Thanks in advance!

Some of this is Jetson-specific, of course. There are a lot of knowledgeable people on the Jetson forums, so you may get better help there. An example for cuMemExportToShareableHandle is available in the CUDA samples (I believe the relevant one is memMapIPCDrv). I believe support for this on Jetson requires JetPack 5.0 or newer.
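
To give an idea of what the other half looks like, here is a minimal sketch of the import side, assuming the exporter passes the file descriptor over a Unix domain socket on a volume both containers mount. The socket path and the 4 MiB size are assumptions and must match your exporter.

```c
// import_frames.c -- reader side: receive the exported fd over a Unix domain
// socket (SCM_RIGHTS), then map the allocation into this process's VA space.
// Sketch only; the socket path and size are assumptions.
#include <cuda.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/un.h>
#include <unistd.h>

#define CHECK(call) do { CUresult r_ = (call); if (r_ != CUDA_SUCCESS) { \
    fprintf(stderr, "%s failed: %d\n", #call, (int)r_); exit(1); } } while (0)

/* Receive one file descriptor over a connected Unix socket (SCM_RIGHTS). */
static int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);
    if (recvmsg(sock, &msg, 0) <= 0) return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_type != SCM_RIGHTS) return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(fd));
    return fd;
}

int main(void) {
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuCtxCreate(&ctx, 0, dev));

    /* Connect to the exporter's socket; the path lives on a volume that
       both containers mount ("/shared/frames.sock" is an assumption). */
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/shared/frames.sock", sizeof(addr.sun_path) - 1);
    if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        perror("connect"); return 1;
    }
    int fd = recv_fd(sock);

    /* Turn the fd back into an allocation handle. */
    CUmemGenericAllocationHandle h;
    CHECK(cuMemImportFromShareableHandle(&h, (void *)(uintptr_t)fd,
          CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR));

    /* Reserve a VA range, map the imported allocation, enable access. */
    size_t size = 4 * 1024 * 1024; /* must match the exporter's rounded size */
    CUdeviceptr ptr;
    CHECK(cuMemAddressReserve(&ptr, size, 0, 0, 0));
    CHECK(cuMemMap(ptr, size, 0, h, 0));

    CUmemAccessDesc acc = {0};
    acc.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    acc.location.id = 0;
    acc.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(ptr, size, &acc, 1));

    /* 'ptr' now aliases the exporter's device memory: kernels launched in
       this container can read the frames with no extra copy. */
    printf("mapped shared frames at %p\n", (void *)ptr);
    return 0;
}
```

The fd-based handles are what make this work across containers: unlike the legacy CUDA IPC handles, the only thing the two sides need is some channel to pass a file descriptor, and a Unix socket on a bind-mounted volume provides that.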
