cudaStreamAddCallback not working with MPS

I am using MPS with a GTX 1070. The code below doesn’t work when MPS is enabled (I get the error “operation not supported” when trying to add a callback to a stream), but it works when MPS is not enabled.

for (int i = 0; i < count; i++) {
    ret = gpuErrCheck(cudaStreamCreateWithFlags(&streams[i], cudaStreamDefault));
    if (ret < 0)
        goto err;

    ret = gpuErrCheck(cudaStreamAddCallback(streams[i], stream_callback,
                                            (void *)(uintptr_t)i, 0));
    if (ret < 0)
        goto err;
}

Any ideas why “cudaStreamAddCallback()” doesn’t work with MPS?

To enable MPS I am using the following script:

#!/bin/bash
# the following must be performed with root privilege
export CUDA_VISIBLE_DEVICES="0"
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
nvidia-cuda-mps-control -d

This is expected behavior on pre-Volta GPUs (the GTX 1070 is a Pascal part) and is noted in the MPS documentation:

https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf

“2.3.2. Application Considerations

Stream callbacks are not supported on pre-Volta MPS clients. Calling any stream
callback APIs will return an error.”

Thanks.

Q1) Just to make sure it is clear, cudaStreamSynchronize() should still work then with MPS, right?

Q2) So if I need a callback, can I get equivalent functionality using pthreads and cudaStreamSynchronize()? I am also wondering what underlying limitation prevents the CUDA library from providing stream callbacks under MPS.

Q1) Yes.

Q2) I haven’t thought through it carefully, and I’m not sure exactly what you have in mind, but the ideas I have revolve around recording an event into the stream at the point where you want the callback to occur, then spinning up another CPU thread that does some form of busy-waiting on cudaEventQuery. Again, I have not thought this through in any detail.
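
For what it's worth, here is a rough sketch of that polling idea, untested on an actual MPS setup. The helper name notify_when_done and the demo function run_demo are made up for illustration; they are not CUDA APIs. To keep it compilable without a GPU, an atomic flag stands in for the event, and the CUDA calls that would replace it are shown only in comments.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper (not part of CUDA): polls `is_done` from a worker
// thread and invokes `on_done` on that thread once the predicate is true.
inline std::thread notify_when_done(
    std::function<bool()> is_done, std::function<void()> on_done,
    std::chrono::microseconds poll_interval = std::chrono::microseconds(50)) {
    return std::thread([is_done = std::move(is_done),
                        on_done = std::move(on_done), poll_interval] {
        while (!is_done()) {
            // Sleep between polls instead of spinning at full speed.
            std::this_thread::sleep_for(poll_interval);
        }
        on_done();
    });
}

// With CUDA, the predicate would wrap cudaEventQuery, roughly like this
// (assumption: event queries behave normally for pre-Volta MPS clients):
//   cudaEvent_t ev;
//   cudaEventCreateWithFlags(&ev, cudaEventDisableTiming);
//   cudaEventRecord(ev, streams[i]);
//   std::thread t = notify_when_done(
//       [ev] { return cudaEventQuery(ev) != cudaErrorNotReady; },
//       [i] { /* body of stream_callback for stream i */ });

// GPU-free demo: an atomic flag plays the role of the recorded event.
inline int run_demo() {
    std::atomic<bool> work_finished{false};
    std::atomic<int> callback_arg{-1};
    std::thread waiter = notify_when_done(
        [&] { return work_finished.load(); },
        [&] { callback_arg.store(7); });
    assert(waiter.joinable());
    work_finished.store(true);  // simulates the stream reaching the event
    waiter.join();              // the "callback" has run once join returns
    return callback_arg.load();
}
```

The variant from Q2 would have the worker thread call cudaStreamSynchronize(streams[i]) and then run the callback body, which avoids polling at the cost of one blocked thread per stream. Either way, the worker has to be joined (or otherwise kept alive) before the event and stream are destroyed.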