Interactions among blocks

From CUDA Programming Guide 1.1, page 29:

So my question is (assuming a hosted system running multiple virtual machines):

a. Is it possible that each block is started by a totally different virtual machine?

b. If it is, is there a possibility of two blocks communicating with one another? If yes, can these different blocks then exchange information outside the control of the OS?

Different apps don’t run on the GPU at the same time.

Not sure what you mean.

But looking at CUDA Programming Guide 2.0, page 32, it mentions “asynchronous concurrent execution”, whereby control can return to the host before execution on the device has completed… So once control has returned, what happens if another application submits a new job to the device? Can it execute before the previous threads of execution have finished?

If yes, then is it possible that the second job is submitted by a different application than the first one?

If this is wrong, can someone provide some references indicating otherwise? Help is greatly appreciated.

No, it will be queued up and executed after the current job is done. The “asynchronous concurrent execution” part in the Programming Guide is about the host thread continuing to work on the CPU (or go to sleep) rather than actively waiting for the GPU to finish.
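To make the host-side behaviour concrete, here is a minimal sketch of an asynchronous launch. The kernel `busyKernel` and its sizes are made up for illustration, and `cudaDeviceSynchronize` is the current runtime call (toolkits from the Programming Guide 2.0 era used `cudaThreadSynchronize` for the same purpose).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: does some throwaway work so the launch takes a while.
__global__ void busyKernel(int *out, int iterations)
{
    int acc = 0;
    for (int i = 0; i < iterations; ++i)
        acc += i % 7;
    out[threadIdx.x] = acc;
}

int main()
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 32 * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // The launch only queues the work; it does not wait for the kernel.
    busyKernel<<<1, 32>>>(d_out, 1 << 20);

    // The host thread is free to do CPU work here while the GPU is busy.
    printf("host: launch returned, kernel may still be running\n");

    // Block the host thread until all queued device work has finished.
    cudaError_t err = cudaDeviceSynchronize();
    printf("host: device finished (%s)\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

The asynchrony here is purely between the host thread and the device; it says nothing about a second application's work overlapping with this kernel.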

Ah… I see. So how about this: is there any possibility that the second job can see the data generated by the first job if there is no memory cleanup at the end of the first job? Does the nvcc compiler always generate cleanup code to be appended to the main program?

The compiler does not create cleanup code …

May I ask you a question; are you trying to create an exploit or defend against one? (The latter being much more trivial than the former.)

I am trying to understand the GPU from a security standpoint. The GPU is accessible via libraries (running at user-space level), so multiple processes can be accessing the GPU at the same time. If so, is it possible that data generated by one thread is visible to another thread? I am quite sure such a simplistic understanding is totally wrong… please enlighten me :-).

A good starting point would then be to create a program that writes a bit pattern to GPU memory, and then another that reads it back.
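A rough sketch of that experiment, as one program with two modes, could look like the following. The pattern value, buffer size, and the `write` command-line argument are arbitrary choices for illustration. Whether the reader ever sees the pattern depends on the driver, the allocator, and the GPU, so finding zero matches does not prove anything by itself.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    const size_t n = 1 << 20;                  // number of 32-bit words
    const size_t bytes = n * sizeof(unsigned int);
    const unsigned int pattern = 0xDEADBEEFu;  // arbitrary marker value
    const bool writer = (argc > 1 && strcmp(argv[1], "write") == 0);

    unsigned int *d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    if (writer) {
        // Fill device memory with the marker and exit without clearing it.
        std::vector<unsigned int> h_buf(n, pattern);
        if (cudaMemcpy(d_buf, h_buf.data(), bytes,
                       cudaMemcpyHostToDevice) != cudaSuccess) {
            fprintf(stderr, "copy to device failed\n");
            return 1;
        }
        printf("wrote %zu words of pattern 0x%08X\n", n, pattern);
    } else {
        // Read back freshly allocated, uninitialized device memory.
        std::vector<unsigned int> h_buf(n, 0u);
        if (cudaMemcpy(h_buf.data(), d_buf, bytes,
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            fprintf(stderr, "copy from device failed\n");
            return 1;
        }
        size_t matches = 0;
        for (size_t i = 0; i < n; ++i)
            if (h_buf[i] == pattern)
                ++matches;
        printf("found %zu of %zu words matching the pattern\n", matches, n);
    }

    cudaFree(d_buf);
    return 0;
}
```

Run it once with `write`, let it exit, then run it again with no argument (ideally as a different user or process) and see what comes back.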

The NCSA CUDA wrapper scrubs the GPU memory after each job finishes, so it seems you can potentially see data from previous executions:
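This is not the NCSA wrapper's code, just a minimal sketch of what scrubbing could look like inside an application that wants to clear its own buffers before releasing them. The helper name `scrubAndFree` is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: overwrite a device buffer with zeros, then free it.
cudaError_t scrubAndFree(void *d_ptr, size_t bytes)
{
    cudaError_t err = cudaMemset(d_ptr, 0, bytes);
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();   // make sure the memset has completed
    cudaError_t freeErr = cudaFree(d_ptr);
    return (err != cudaSuccess) ? err : freeErr;
}

int main()
{
    const size_t bytes = 1 << 20;
    void *d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // ... the job would use d_buf here ...

    cudaError_t err = scrubAndFree(d_buf, bytes);
    printf("scrub and free: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

This only covers buffers the application still holds a pointer to; it does nothing if the process crashes before cleanup, which is presumably why a wrapper does the scrubbing at the job level.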


This is interesting. OS context-switching code always cleans up all the registers, including the FPU (MMX, SSE, etc.) registers, before passing execution over to another task. So now it has the additional workload of cleaning up all the GPU memory and registers (as the GPU’s memory is not subject to the normal page-table protection mechanism (MMU) of the CPU)? Again… that doesn’t sound very plausible either. Any comments on that?

The GPU context switches on its own.

I see… now I understand better. Thank you all for the answers.