Multiple Independent Host Processes with One GPU Board

[b][font="'Lucida Console"]I have several independent host processes that each have GPU kernels to run on their own data. I am trying to understand how a CUDA GPU board is shared among the host processes. I am guessing that each host process creates its own context and can therefore use the entire global memory on the board.[/font]

[font="'Lucida Console"]I am also trying to understand the blocking and sharing mechanisms. Specifically, if the GPU is busy executing a kernel for one host process and a second host process tries to access the GPU board, does the second process block until the GPU is free?[/font]


[font="'Lucida Console"]I would like to hear from Nvidia engineers who can point to the specific pages in the Nvidia documentation that describe this behavior.[/font]


[font="'Lucida Console"]Thanks.[/font]


[/b]

A quick look at the programming guide (streams, contexts, concurrent execution) should answer all your questions. I do not have the guide with me (I'm on an iPod right now), so I'll just put down what I remember:
Each CPU process gets its own context. Kernels from different contexts cannot execute in parallel for now. I think Nvidia is working to enable this.
A context itself occupies some memory. Different contexts have different virtual address spaces, which means each context has only a limited amount of global memory that it can use. That said, the device does not seem to enforce strict access control: you can even black out your screen by misusing pointers. So perhaps, one way or another, each context does have access to all of the global memory on the device.
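
If you want to check this on your own board rather than rely on my memory, here is a minimal sketch you could compile once and run as two or more simultaneous processes. It prints how much global memory the process sees as free, then times one long-running kernel by wall clock. The `spin` kernel, the `CHECK` macro and the cycle count are made up for illustration; only the `cuda*` runtime calls and `clock64()` are real API. I have not measured this myself, so treat whatever it prints as an observation about your setup, not as documented behavior.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-work kernel: spins for roughly `cycles` GPU clock ticks.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: print the error and exit main() if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main()
{
    // Force creation of this process's context before measuring anything.
    CHECK(cudaFree(0));

    // Free and total global memory as seen from this process's context.
    size_t free_bytes = 0, total_bytes = 0;
    CHECK(cudaMemGetInfo(&free_bytes, &total_bytes));
    printf("free %zu MiB of %zu MiB\n", free_bytes >> 20, total_bytes >> 20);

    // Wall-clock time from launch until the kernel has finished.
    auto t0 = std::chrono::steady_clock::now();
    spin<<<1, 1>>>(1000000000LL);      // assumed cycle count, tune for your GPU
    CHECK(cudaGetLastError());         // catches launch errors
    CHECK(cudaDeviceSynchronize());    // blocks the host until the kernel is done
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("launch to completion: %.1f ms\n", ms);
    return 0;
}
```

Build it with `nvcc` and start two copies at about the same time. If the free figure drops in the second copy, that would fit the idea that each context takes some memory for itself. If the launch-to-completion time grows when two copies overlap, that would fit kernels from different contexts not running in parallel. Keep the cycle count small if the board also drives your display, since a long kernel may trip the display watchdog.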

Anyway, you can check the guide yourself to confirm this. I remember these topics being somewhere in sec 3.2.x.x.