Hi there,
I am trying to share a memory pool between processes, but I always get a double-free error when I try to import it. I have not allocated any memory from the pool, let alone freed any, so I am confused.
I have two programs. The first calls the following function to create and export a pool, and then sends the exported file descriptor to the other program.
void send_pool(int src_device, int dst_device, int sock) {
    // I have 4 GPUs; enable P2P access between GPUs
    cudaSetDevice(dst_device);
    cudaDeviceEnablePeerAccess(src_device, 0);

    // Create a pool on dst_device that can be exported as a POSIX fd
    cudaMemPool_t pool;
    cudaMemPoolProps poolProps = {};
    poolProps.allocType = cudaMemAllocationTypePinned;
    poolProps.handleTypes = cudaMemHandleTypePosixFileDescriptor;
    poolProps.location.type = cudaMemLocationTypeDevice;
    poolProps.location.id = dst_device;
    cudaMemPoolCreate(&pool, &poolProps);

    // Export the pool to a file descriptor
    int fd;
    cudaMemAllocationHandleType handleType = cudaMemHandleTypePosixFileDescriptor;
    cudaMemPoolExportToShareableHandle(&fd, pool, handleType, 0);

    // Send the file descriptor via socket
    send_fd(sock, fd);
}
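For reference, my send_fd/receive_fd helpers are the usual SCM_RIGHTS file-descriptor-passing pattern over a Unix-domain socket. They look roughly like the sketch below (error handling trimmed; the one-byte payload is just a placeholder, since sendmsg must carry at least one data byte):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Pass fd to the peer as SCM_RIGHTS ancillary data. */
static void send_fd(int sock, int fd) {
    char byte = 0; /* at least one data byte must accompany the cmsg */
    struct iovec iov = {.iov_base = &byte, .iov_len = 1};
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS; /* kernel installs a duplicate fd in the receiver */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    sendmsg(sock, &msg, 0);
}

/* Receive a descriptor sent with send_fd and store it in *fd. */
static void receive_fd(int sock, int *fd) {
    char byte;
    struct iovec iov = {.iov_base = &byte, .iov_len = 1};
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);
    recvmsg(sock, &msg, 0);
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
}
```

The two programs are connected through a Unix-domain socket, since SCM_RIGHTS only works over AF_UNIX sockets.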
The other program executes the following function, which waits until it receives the descriptor and then imports it as a pool.
void receive_pool(int dst_device, int sock) {
    cudaSetDevice(dst_device);

    // Receive the file descriptor via socket
    int fd;
    receive_fd(sock, &fd);

    // Import the pool
    cudaMemPool_t importPool;
    cudaMemPoolImportFromShareableHandle(&importPool, &fd, cudaMemHandleTypePosixFileDescriptor, 0);
}
But when I run the two programs, the receiver process always aborts with the following error:
free(): double free detected in tcache 2
I believe the error occurs immediately after I invoke the cudaMemPoolImportFromShareableHandle API. Am I using the APIs in the wrong way?
P.S. My environment has 4 Tesla V100 GPUs, and I am using CUDA 11.8 on Ubuntu 20.04.
Many thanks.