Can cudaIPC only be used between processes?

I followed the CUDA examples to implement cudaIPC. I have two machines, each with 8 GPUs. Say we have a worker process and a server process, and the worker has to send data to the server using cudaIPC. Because of a project requirement, we now start the worker and the server as two threads in one process on each machine. If I keep the old logic, it fails with: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: invalid device ordinal. Can cudaIPC only be used between processes? For my case, how should I modify my implementation?
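
For reference, here is a minimal sketch of one possible in-process alternative. It is not the project's actual code, and it does not use cudaIPC at all. It assumes the worker thread can hand its raw device pointer to the server thread, which is possible because both threads share one address space, and that the server copies the data with cudaMemcpyPeer. The names share_within_process and ok are hypothetical. As far as I understand, handles from cudaIpcGetMemHandle are meant to be opened with cudaIpcOpenMemHandle in a different process, so the sketch skips them.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical helper: logs the failing call and returns false on error.
static bool ok(cudaError_t e, const char* what) {
    if (e != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(e));
        return false;
    }
    return true;
}

// Hypothetical example: a "worker" thread fills a buffer on src_dev, then the
// calling "server" thread copies it to dst_dev using the raw device pointer.
bool share_within_process(int src_dev, int dst_dev, size_t n) {
    int count = 0;
    if (!ok(cudaGetDeviceCount(&count), "cudaGetDeviceCount")) return false;
    // An ordinal outside [0, count) is what produces "invalid device ordinal".
    if (src_dev < 0 || src_dev >= count || dst_dev < 0 || dst_dev >= count) {
        return false;
    }

    const size_t bytes = n * sizeof(int);
    std::vector<int> host_in(n), host_out(n, 0);
    for (size_t i = 0; i < n; ++i) host_in[i] = static_cast<int>(i);

    int* d_src = nullptr;
    bool worker_ok = false;
    std::thread worker([&] {
        // The current device is per-thread state, so the worker sets its own.
        worker_ok = ok(cudaSetDevice(src_dev), "cudaSetDevice(worker)") &&
                    ok(cudaMalloc(&d_src, bytes), "cudaMalloc(worker)") &&
                    ok(cudaMemcpy(d_src, host_in.data(), bytes,
                                  cudaMemcpyHostToDevice), "cudaMemcpy H2D");
    });
    worker.join();  // after join(), d_src and worker_ok are visible here

    int* d_dst = nullptr;
    bool server_ok = worker_ok &&
                     ok(cudaSetDevice(dst_dev), "cudaSetDevice(server)") &&
                     ok(cudaMalloc(&d_dst, bytes), "cudaMalloc(server)") &&
                     // No IPC handle: the pointer is used directly.
                     ok(cudaMemcpyPeer(d_dst, dst_dev, d_src, src_dev, bytes),
                        "cudaMemcpyPeer") &&
                     ok(cudaMemcpy(host_out.data(), d_dst, bytes,
                                   cudaMemcpyDeviceToHost), "cudaMemcpy D2H");

    // Release both buffers; cudaFree(nullptr) is a no-op.
    cudaSetDevice(src_dev);
    cudaFree(d_src);
    cudaSetDevice(dst_dev);
    cudaFree(d_dst);

    return server_ok && host_in == host_out;
}
```

Calling share_within_process(0, 1, 1024) on a machine with at least two visible GPUs would exercise the cross-device path. The bounds check at the top is also a cheap way to confirm whether the device index used by the old logic is actually valid inside the merged process.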