Why does the simpleIPC example use sharedMemoryCreate before cudaMalloc?

I recently needed to use CUDA IPC to transfer image data between ROS2 nodes, so I am trying to understand it by reading the simpleIPC example. One part of the code confuses me: why is sharedMemoryCreate needed before cudaMalloc? Doesn’t shm_open allocate memory in host memory?

The CUDA IPC mechanism allows for sharing of a device memory allocation from one process to another. The steps needed are approximately as follows:

  1. Process A allocates device memory.
  2. Process A gets a CUDA IPC handle for the allocation from step 1.
  3. Process A creates a host IPC instance, so that the handle from step 2 can be communicated to process B.
  4. Process A puts the handle into the host IPC mechanism.
  5. Process B picks up the handle from the host IPC mechanism.
  6. Process B uses the handle to “request access” to the underlying allocation.

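So the host IPC mechanism is needed to communicate the device/CUDA IPC handle from process A to process B. And yes, shm_open does allocate host memory, but that host memory only carries the small handle (plus whatever synchronization state the processes need). The data itself stays in the device allocation.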
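The six steps above can be sketched in a single file. This is a minimal illustration, not the simpleIPC code itself: the `Mailbox` struct, the segment name `/cuda_ipc_sketch`, the `CHECK` macro, and the polling loop are all made up for this sketch, and it uses `fork()` (called before any CUDA call) to get two processes where simpleIPC launches separate ones. It assumes Linux and a GPU that supports CUDA IPC.

```cuda
// Assumed build line: nvcc ipc_sketch.cu -o ipc_sketch   (add -lrt on older glibc)
#include <cuda_runtime.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical helper: abort the calling process on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical layout of the host shared-memory segment.
struct Mailbox {
    std::atomic<int> ready;     // 0 = empty, 1 = handle published
    cudaIpcMemHandle_t handle;  // small opaque handle, not the data
};

static const char *kShmName = "/cuda_ipc_sketch";  // made-up name
static const int kCount = 256;

int main() {
    // Step 3: host IPC. The segment is only sizeof(Mailbox) bytes.
    int fd = shm_open(kShmName, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, sizeof(Mailbox)) != 0) { perror("shm"); return 1; }
    void *mem = mmap(nullptr, sizeof(Mailbox), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }
    Mailbox *box = new (mem) Mailbox();
    box->ready.store(0);

    // Fork before touching CUDA so each process creates its own context.
    // Unrelated processes would instead each shm_open() the same name.
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {
        // Process B. Step 5: wait (up to ~10 s) for the handle to appear.
        for (int i = 0; i < 10000 && box->ready.load() == 0; ++i) usleep(1000);
        if (box->ready.load() != 1) return 1;

        // Step 6: map process A's device allocation into this process.
        int *d_view = nullptr;
        CHECK(cudaIpcOpenMemHandle((void **)&d_view, box->handle,
                                   cudaIpcMemLazyEnablePeerAccess));
        int first = 0;
        CHECK(cudaMemcpy(&first, d_view, sizeof(int), cudaMemcpyDeviceToHost));
        std::printf("process B read %d from process A's device buffer\n", first);
        CHECK(cudaIpcCloseMemHandle(d_view));
        return 0;
    }

    // Process A. Step 1: allocate and fill device memory.
    int host[kCount];
    for (int i = 0; i < kCount; ++i) host[i] = 42;
    int *d_buf = nullptr;
    CHECK(cudaMalloc(&d_buf, sizeof(host)));
    CHECK(cudaMemcpy(d_buf, host, sizeof(host), cudaMemcpyHostToDevice));

    // Steps 2 and 4: get the IPC handle and publish it in host shared memory.
    CHECK(cudaIpcGetMemHandle(&box->handle, d_buf));
    box->ready.store(1);

    // Keep the allocation alive until process B has finished with it.
    int status = 0;
    waitpid(pid, &status, 0);
    CHECK(cudaFree(d_buf));
    munmap(mem, sizeof(Mailbox));
    close(fd);
    shm_unlink(kShmName);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```

Note that the image-sized buffer only ever exists on the device. The host segment stays a few dozen bytes no matter how large the `cudaMalloc` allocation is, which is why the order of `sharedMemoryCreate` and `cudaMalloc` in the example is not significant by itself; the handle just has to be published after the allocation exists.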
