The MPS documentation has no stated support for pointer sharing that I can see.
Architecturally, yes, every process takes a separate chunk of the GPU's virtual address space for its own needs. That does not mean each process has the same logical->virtual address mapping. The virtual space is unified/harmonized, but each process maintains its own logical->virtual mapping. This means a pointer in one process has no meaning when dereferenced in another process.
cuLaunchKernel will not return an error, because the launch process has no way of knowing the pointer is invalid. It will attempt to launch the kernel, which will begin executing until it dereferences the bogus pointer; at that point, bad things will happen. I would expect the failure to show up at the next synchronization point, but I'm just speculating and working from your description.
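For illustration, here is a minimal sketch of that behavior (using the runtime API rather than cuLaunchKernel, but the failure mode is the same; the kernel and the bogus pointer value are hypothetical, and error checking is reduced to two printfs):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void deref(int *p) { *p = 1; }  // blindly dereferences its argument

int main() {
    // a plausible-looking but bogus device pointer (hypothetical value)
    int *bogus = reinterpret_cast<int *>(0x7f0000000000ULL);

    deref<<<1, 1>>>(bogus);
    // the launch itself typically reports success...
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    // ...the illegal access surfaces at the next synchronization point
    printf("sync:   %s\n", cudaGetErrorString(cudaDeviceSynchronize()));
    return 0;
}
```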
The document’s statement about out-of-range reads is exactly that: a warning that there is no enforced interprocess memory security provided by the GPU/driver.
As I’ve already mentioned, CUDA IPC is provided to help you work around this.
As a simple test, I took the two-app sample code I provided in the IPC thread that I linked, and put a printf statement in each app to print out the numerical value of the data variable (the pointer that was "shared" via the IPC mechanism). This is a 64-bit linux system and UVA is in effect. The numerical values of the pointers are not the same between the two processes. (You could repeat this experiment in your MPS setup; it should not be difficult.) Passing a numerical pointer value directly from one process to another is going to be problematic.
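If you want to repeat the experiment, a stripped-down sketch might look like the following (the handle.bin file is just a placeholder for whatever mechanism your apps use to pass the handle between processes, and error checking is omitted for brevity):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// run "./ipc_test export" in one process, then "./ipc_test import" in another
int main(int argc, char **argv) {
    const char *fname = "handle.bin";  // placeholder handle-transfer mechanism

    if (argc > 1 && argv[1][0] == 'e') {            // exporting process
        int *data = nullptr;
        cudaMalloc(&data, 1024);
        cudaIpcMemHandle_t h;
        cudaIpcGetMemHandle(&h, data);
        FILE *f = fopen(fname, "wb");
        fwrite(&h, sizeof(h), 1, f);
        fclose(f);
        printf("exporter sees data = %p\n", (void *)data);
        getchar();                                  // keep the allocation alive
        cudaFree(data);
    } else {                                        // importing process
        cudaIpcMemHandle_t h;
        FILE *f = fopen(fname, "rb");
        fread(&h, sizeof(h), 1, f);
        fclose(f);
        int *data = nullptr;
        cudaIpcOpenMemHandle((void **)&data, h, cudaIpcMemLazyEnablePeerAccess);
        printf("importer sees data = %p\n", (void *)data);
        cudaIpcCloseMemHandle(data);
    }
    return 0;
}
```

On a UVA system, the two printed values will generally differ, which is the point of the exercise.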
You may want to read sections 3.2.7 and 3.2.8 of the CUDA programming guide:
“Any device memory pointer or event handle created by a host thread can be directly referenced by any other thread within the same process. It is not valid outside this process however, and therefore cannot be directly referenced by threads belonging to a different process.”
I know of nothing in MPS that abrogates that.
If you think carefully about the implications of unified virtual addressing in a multi-process environment, I think it will become clear that the CUDA driver must maintain its own logical->virtual address mappings, and that those mappings may vary from process to process.