Process freezes at cudaEventSynchronize when CUDA IPC and the MPS service are both used

I am using CUDA IPC in an MPI code to transfer data between GPUs that are peer accessible. It works fine when the MPS service is not active. However, once MPS is enabled, the process that waits on the shared event freezes.

I am using CUDA 7.5, OpenMPI 1.10.4, and one K80 board, so the IPC copy is performed between the two GPUs on that board.

For passing the memory and event handles, I followed the approach shown in simpleIPC.cu from the NVIDIA SDK samples.
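
For context, here is a minimal sketch of that kind of handle exchange. It is an illustration of the pattern, not the exact code from my application: the CHECK_CUDA macro, the one-rank-per-GPU device selection, and the use of MPI_Bcast/MPI_Gather to ship the handles are assumptions made for this sketch (simpleIPC.cu itself does not use MPI). The names my_tmp_buf_r and my_event_r mirror the snippet further down.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper for this sketch: abort the MPI job on any CUDA error.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);               \
        }                                                          \
    } while (0)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Assumption: all ranks run on one node, one rank per GPU.
    int ndev = 0;
    CHECK_CUDA(cudaGetDeviceCount(&ndev));
    CHECK_CUDA(cudaSetDevice(rank % ndev));

    // Every rank creates an event that may be shared across processes.
    // cudaEventInterprocess requires cudaEventDisableTiming.
    cudaEvent_t event;
    CHECK_CUDA(cudaEventCreateWithFlags(
        &event, cudaEventDisableTiming | cudaEventInterprocess));
    cudaIpcEventHandle_t ev_handle;
    CHECK_CUDA(cudaIpcGetEventHandle(&ev_handle, event));

    // Rank 0 owns the receive buffer (one byte per rank) and exports it.
    void *my_tmp_buf_r = NULL;
    cudaIpcMemHandle_t mem_handle;
    if (rank == 0) {
        CHECK_CUDA(cudaMalloc(&my_tmp_buf_r, size));
        CHECK_CUDA(cudaIpcGetMemHandle(&mem_handle, my_tmp_buf_r));
    }

    // Handles are opaque byte blobs, so they are sent as MPI_BYTE.
    MPI_Bcast(&mem_handle, (int)sizeof(mem_handle), MPI_BYTE, 0,
              MPI_COMM_WORLD);
    std::vector<cudaIpcEventHandle_t> all_ev(size);
    MPI_Gather(&ev_handle, (int)sizeof(ev_handle), MPI_BYTE,
               all_ev.data(), (int)sizeof(ev_handle), MPI_BYTE, 0,
               MPI_COMM_WORLD);

    // Rank 0 opens the other ranks' events; the others map rank 0's buffer.
    // A process only opens handles that were created by another process.
    std::vector<cudaEvent_t> my_event_r(size);
    if (rank == 0) {
        for (int r = 1; r < size; ++r)
            CHECK_CUDA(cudaIpcOpenEventHandle(&my_event_r[r], all_ev[r]));
    } else {
        CHECK_CUDA(cudaIpcOpenMemHandle(&my_tmp_buf_r, mem_handle,
                                        cudaIpcMemLazyEnablePeerAccess));
    }

    // ... the copy / cudaEventRecord / cudaEventSynchronize step goes here ...

    // Close the imported handles before the owners release the originals.
    if (rank == 0) {
        for (int r = 1; r < size; ++r)
            CHECK_CUDA(cudaEventDestroy(my_event_r[r]));
    } else {
        CHECK_CUDA(cudaIpcCloseMemHandle(my_tmp_buf_r));
    }
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        CHECK_CUDA(cudaFree(my_tmp_buf_r));
    CHECK_CUDA(cudaEventDestroy(event));
    MPI_Finalize();
    return 0;
}
```

This would be built with nvcc against the MPI headers and libraries and launched with at least two ranks on the node that hosts the K80.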

Here is a part of the code:

    if (rank != 0)
    {
        /* Copy one byte to offset `rank` of my_tmp_buf_r. */
        cudaError(cudaMemcpy((void *) my_tmp_buf_r + rank, (void *) my_sendbuf, 1, cudaMemcpyDeviceToDevice));

        /* Record the shared event on the default stream. */
        cudaError(cudaEventRecord((cudaEvent_t) event, 0));

        MPI_Barrier(MPI_COMM_WORLD);
    }
    else if (IMI_rank == 0)
    {
        MPI_Barrier(MPI_COMM_WORLD);
        int count;
        for (count = 1; count < size; ++count)
        {
            /* Wait on the shared event of each other rank; this is where it freezes. */
            cudaError(cudaEventSynchronize(my_event_r[count]));
        }
    }

Again, note that the code works properly without the MPS service. With MPS it freezes and I have to kill the job.

Here is the MPS log after the job is killed:

[2016-10-12 13:53:44.729 Other 28923] Start
[2016-10-12 13:53:47.091 Other 28923] New client 28922 connected
[2016-10-12 13:53:47.105 Other 28923] New client 28921 connected
[2016-10-12 13:53:55.500 Other 28923] Client 28921 disconnected
[2016-10-12 13:53:55.502 Other 28923] Client 28922 disconnected
[2016-10-12 13:53:55.502 Other 28923] Waiting for current clients to finish
[2016-10-12 13:53:55.502 Other 28923] Exit

Any idea what is causing this?