GPU Inter-Process Communications (IPC) question

Robert_Crovella · December 11, 2014, 1:21am

The CUDA IPC sample code demonstrates the use of mmap() to pass IPC handles between processes.
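
To make the mmap() idea concrete, here is a minimal host-only sketch of that kind of hand-off. It is not the sample's code, just an illustration under some assumptions: `fake_handle_t` is a hypothetical 64-byte stand-in for `cudaIpcMemHandle_t` so it builds without CUDA, `shared_slot_t` and `demo_mmap_handoff` are names I made up, and the two processes are related by fork() so they can share an anonymous mapping.

```c
// Sketch only: pass a fixed-size opaque handle from parent to child through
// an anonymous shared mapping. No CUDA calls are made here.
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct { char reserved[64]; } fake_handle_t;  // hypothetical stand-in

typedef struct {
    volatile int ready;    // set by the parent once the handle is valid
    fake_handle_t handle;  // real code would store a cudaIpcMemHandle_t
} shared_slot_t;

// Returns 0 if the child saw exactly the bytes the parent published.
int demo_mmap_handoff(void) {
    shared_slot_t *slot = (shared_slot_t *)mmap(NULL, sizeof(*slot),
        PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (slot == MAP_FAILED) return -1;
    memset(slot, 0, sizeof(*slot));

    pid_t pid = fork();
    if (pid < 0) { munmap(slot, sizeof(*slot)); return -1; }

    if (pid == 0) {
        // Child: poll until the parent has published the handle.
        while (!slot->ready) usleep(1000);
        __sync_synchronize();
        fake_handle_t local = slot->handle;
        // Real code would call cudaIpcOpenMemHandle() with the handle here.
        int ok = 1;
        for (size_t i = 0; i < sizeof(local.reserved); i++)
            if (local.reserved[i] != (char)i) ok = 0;
        _exit(ok ? 0 : 1);
    }

    // Parent: real code would fill the handle with cudaIpcGetMemHandle().
    for (size_t i = 0; i < sizeof(slot->handle.reserved); i++)
        slot->handle.reserved[i] = (char)i;
    __sync_synchronize();  // make the handle visible before raising the flag
    slot->ready = 1;

    int status = 0;
    waitpid(pid, &status, 0);
    munmap(slot, sizeof(*slot));
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Calling `demo_mmap_handoff()` from `main` should return 0 when the child receives the bytes intact. The shared mapping only works between related processes, which is one reason a named pipe is attractive for two independently launched applications.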

For amusement purposes, I tried implementing the FIFO (named pipe) method that I referenced above, to demonstrate a different approach. The following is the code for the two independent applications:

app1.cu:

// app 1, part of a 2-part IPC example
#include <stdio.h>
#include <stdlib.h>   // exit, malloc, free, system
#include <string.h>   // memset, memcpy
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#define DSIZE 1

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

int main(){
  system("rm -f testfifo");  // remove any debris
  int ret = mkfifo("testfifo", 0600); // create fifo
  if (ret != 0) {printf("mkfifo error: %d\n",ret); return 1;}
  int *data;
  cudaMalloc(&data, DSIZE*sizeof(int));
  cudaCheckErrors("malloc fail");
  cudaMemset(data, 0, DSIZE*sizeof(int));
  cudaCheckErrors("memset fail");
  cudaIpcMemHandle_t my_handle;
  cudaIpcGetMemHandle(&my_handle, data);
  cudaCheckErrors("get IPC handle fail");
  // copy the opaque handle into a byte buffer for transmission
  unsigned char handle_buffer[sizeof(my_handle)+1];
  memset(handle_buffer, 0, sizeof(my_handle)+1);
  memcpy(handle_buffer, (unsigned char *)(&my_handle), sizeof(my_handle));
  FILE *fp;
  printf("waiting for app2\n");
  fp = fopen("testfifo", "w");  // blocks until app2 opens the fifo for reading
  if (fp == NULL) {printf("fifo open fail \n"); return 1;}
  // send the handle as raw bytes
  size_t sent = fwrite(handle_buffer, 1, sizeof(my_handle), fp);
  if (sent != sizeof(my_handle)) printf("short write: %zu bytes\n", sent);
  fclose(fp);
  sleep(2);  // crude wait for app 2 to modify data (not real synchronization)
  int *result = (int *)malloc(DSIZE*sizeof(int));
  cudaMemcpy(result, data, DSIZE*sizeof(int), cudaMemcpyDeviceToHost);
  cudaCheckErrors("memcpy fail");
  if (!(*result)) printf("Fail!\n");
  else printf("Success!\n");
  free(result);
  system("rm testfifo");
  return 0;
}

app2.cu:

// app 2, part of a 2-part IPC example
#include <stdio.h>
#include <stdlib.h>   // exit
#include <string.h>   // memset, memcpy
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#define DSIZE 1

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

__global__ void set_kernel(volatile int *d, int val){
  *d = val;
}

int main(){
  int *data;
  cudaIpcMemHandle_t my_handle;
  unsigned char handle_buffer[sizeof(my_handle)+1];
  memset(handle_buffer, 0, sizeof(my_handle)+1);
  FILE *fp;
  fp = fopen("testfifo", "r");  // blocks until app1 opens the fifo for writing
  if (fp == NULL) {printf("fifo open fail \n"); return 1;}
  // receive the handle as raw bytes
  size_t got = fread(handle_buffer, 1, sizeof(my_handle), fp);
  fclose(fp);
  if (got != sizeof(my_handle)) {printf("short read: %zu bytes\n", got); return 1;}
  memcpy((unsigned char *)(&my_handle), handle_buffer, sizeof(my_handle));
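  cudaIpcOpenMemHandle((void **)&data, my_handle, cudaIpcMemLazyEnablePeerAccess);
  cudaCheckErrors("IPC handle fail");
  set_kernel<<<1,1>>>(data, 1);
  cudaDeviceSynchronize();
  cudaCheckErrors("kernel fail");
  cudaIpcCloseMemHandle(data);  // unmap app1's allocation from this process
  cudaCheckErrors("IPC close fail");
  return 0;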
}

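If you run app1, it will start up, create the FIFO, and then wait for app2 to start (opening a FIFO for writing blocks until another process opens it for reading). Then, when you start app2 from the same directory, app1 will send the IPC handle to app2. app2 will use that handle to modify some memory allocated by app1, and then exit. app1 sleeps for 2 seconds, then checks whether app2 made the modification. The sleep is only a crude stand-in for real synchronization between the two processes.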
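
If the fixed wait bothers you, one option is a second FIFO in the reverse direction that carries a one-byte acknowledgement. The following is a host-only sketch of that handshake, not a tested replacement for the code above. Assumptions: `fake_handle_t` is a hypothetical stand-in for `cudaIpcMemHandle_t`, the helper names are mine, and fork() plays the role of launching app2 so the whole thing fits in one file.

```c
// Sketch only: send a handle over one FIFO, wait for an ack on a second FIFO.
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct { char reserved[64]; } fake_handle_t;  // hypothetical stand-in

static int read_full(int fd, void *buf, size_t n) {   // read exactly n bytes
    char *p = (char *)buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0) return -1;
        p += r; n -= (size_t)r;
    }
    return 0;
}

static int write_full(int fd, const void *buf, size_t n) {  // write exactly n
    const char *p = (const char *)buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0) return -1;
        p += w; n -= (size_t)w;
    }
    return 0;
}

// The "app2" role: receive the handle, do the work, then acknowledge.
static int consumer(const char *handle_fifo, const char *ack_fifo) {
    fake_handle_t h;
    int fd = open(handle_fifo, O_RDONLY);
    if (fd < 0) return 1;
    int rc = read_full(fd, &h, sizeof(h));
    close(fd);
    if (rc != 0) return 1;
    // Real code: cudaIpcOpenMemHandle, launch kernel, cudaDeviceSynchronize,
    // cudaIpcCloseMemHandle. Here we only verify the received bytes.
    char ack = 1;
    for (size_t i = 0; i < sizeof(h.reserved); i++)
        if (h.reserved[i] != 0x5A) ack = 0;
    fd = open(ack_fifo, O_WRONLY);
    if (fd < 0) return 1;
    rc = write_full(fd, &ack, 1);
    close(fd);
    return rc != 0;
}

// The "app1" role. Returns 0 if the consumer acknowledged a good handle.
int demo_fifo_ack(const char *handle_fifo, const char *ack_fifo) {
    unlink(handle_fifo); unlink(ack_fifo);            // remove any debris
    if (mkfifo(handle_fifo, 0600) != 0 || mkfifo(ack_fifo, 0600) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(consumer(handle_fifo, ack_fifo));

    fake_handle_t h;
    memset(&h, 0x5A, sizeof(h));        // real code: cudaIpcGetMemHandle
    char ack = 0;
    int ok = 0;
    int fd = open(handle_fifo, O_WRONLY);   // blocks until the reader opens
    if (fd >= 0) { ok = (write_full(fd, &h, sizeof(h)) == 0); close(fd); }
    if (ok) {
        fd = open(ack_fifo, O_RDONLY);      // blocks until the consumer is done
        ok = (fd >= 0 && read_full(fd, &ack, 1) == 0 && ack == 1);
        if (fd >= 0) close(fd);
    }
    waitpid(pid, NULL, 0);
    unlink(handle_fifo); unlink(ack_fifo);
    return ok ? 0 : 1;
}
```

For example, `demo_fifo_ack("testfifo", "testfifo_ack")` called from `main` should return 0. Both sides open the handle FIFO first and the acknowledgement FIFO second, which keeps the blocking opens from deadlocking. In the real applications, app1 would do its cudaMemcpy only after the acknowledgement byte arrives.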

Just a proof of concept.
