Interactions among blocks

tthtlc · February 5, 2010, 7:01am

From CUDA Programming Guide 1.1, page 29:

The way a block is split into warps is always the same; each warp contains threads of consecutive, increasing thread IDs with the first warp containing thread 0. Section 2.2.1 describes how thread IDs relate to thread indices in the block.
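To make the thread ID to warp mapping concrete, here is a minimal sketch. It assumes the usual linearization from the Programming Guide (x + y*Dx + z*Dx*Dy) and uses the built-in `warpSize`; the helper `linearThreadId`, the kernel `recordWarps`, and the 16x16 block shape are hypothetical names and values chosen only for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Linear thread ID inside a block of size (dimX, dimY, dimZ):
// x + y*dimX + z*dimX*dimY. Callable from host and device.
__host__ __device__ inline unsigned linearThreadId(
    unsigned x, unsigned y, unsigned z, unsigned dimX, unsigned dimY) {
    return x + y * dimX + z * dimX * dimY;
}

// Each thread records which warp of its block it belongs to.
__global__ void recordWarps(unsigned *warpOfThread) {
    unsigned tid = linearThreadId(threadIdx.x, threadIdx.y, threadIdx.z,
                                  blockDim.x, blockDim.y);
    warpOfThread[tid] = tid / warpSize;  // warpSize is a CUDA built-in
}

int main() {
    const dim3 block(16, 16);  // hypothetical block shape: 256 threads
    const unsigned n = block.x * block.y;
    unsigned host[256];
    unsigned *dev = nullptr;

    if (cudaMalloc(&dev, n * sizeof(unsigned)) != cudaSuccess) return 1;
    recordWarps<<<1, block>>>(dev);
    if (cudaMemcpy(host, dev, n * sizeof(unsigned),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(dev);

    // Thread (0, 2) has ID 0 + 2*16 = 32, so with 32-thread warps it is
    // the first thread of the second warp.
    unsigned id = linearThreadId(0, 2, 0, block.x, block.y);
    assert(id == 32);
    printf("thread (0,2): id %u, warp %u\n", id, host[id]);
    return 0;
}
```

The point of the sketch is only that warp membership follows from the linear ID, not from the 2D or 3D index directly.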

The issue order of the warps within a block is undefined, but their execution can be synchronized, as mentioned in Section 2.2.1, to coordinate global or shared memory accesses.

The issue order of the blocks within a grid of thread blocks is undefined and there is no synchronization mechanism between blocks, so threads from two different blocks of the same grid cannot safely communicate with each other through global memory during the execution of the grid.

If a non-atomic instruction executed by a warp writes to the same location in global or shared memory for more than one of the threads of the warp, the number of serialized writes that occur to that location and the order in which they occur is undefined, but one of the writes is guaranteed to succeed. If an atomic instruction (see Section 4.4.6) executed by a warp reads, modifies, and writes to the same location in global memory for more than one of the threads of the warp, each read, modify, write to that location occurs and they are all serialized, but the order in which they occur is undefined.
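The difference between the two cases in the quoted paragraph can be sketched as follows. This is an illustration under stated assumptions, not code from the guide: the kernel `conflictingWrites` and the host helper `runConflictDemo` are hypothetical names, and the launch uses a single 32-thread block so that all conflicting accesses come from one warp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread of the warp touches the same two global locations.
__global__ void conflictingWrites(int *plain, int *counter) {
    // Non-atomic store: one of the writes succeeds, but which one is undefined.
    *plain = static_cast<int>(threadIdx.x);
    // Atomic read-modify-write: every increment is applied, in undefined order.
    atomicAdd(counter, 1);
}

// Runs the kernel once and returns both results through the out parameters.
bool runConflictDemo(int *plainOut, int *counterOut) {
    int *dev = nullptr;  // dev[0] is the plain location, dev[1] the counter
    int host[2] = {-1, -1};
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return false;
    bool ok = cudaMemset(dev, 0, sizeof(host)) == cudaSuccess;
    if (ok) {
        conflictingWrites<<<1, 32>>>(dev, dev + 1);
        ok = cudaMemcpy(host, dev, sizeof(host),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    *plainOut = host[0];
    *counterOut = host[1];
    return ok;
}

int main() {
    int plain = 0, counter = 0;
    if (!runConflictDemo(&plain, &counter)) return 1;
    printf("plain store left %d (some thread index in 0..31)\n", plain);
    printf("atomic counter is %d\n", counter);
    return counter == 32 ? 0 : 1;  // all 32 atomic increments must be counted
}
```

The plain location ends up holding whichever thread index happened to be written last, while the counter always reflects all 32 increments.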

So my questions are (assuming a hosted system running multiple virtual machines):

a. Is it possible that each block is started by a totally different virtual machine?

b. If so, is it possible for two blocks to communicate with one another? If yes, could these blocks then exchange information outside the control of the OS?

tmurray · February 5, 2010, 7:21am

Different apps don’t run on the GPU at the same time.

tthtlc · February 5, 2010, 7:43am

Not sure what you mean.

But the CUDA Programming Guide 2.0, page 32, mentions “asynchronous concurrent execution”, whereby control can return to the host before the device has completed its work… so once it has returned, what happens if another application submits a new job to the device? Can it execute before the previous threads of execution have finished?

If yes, then could the second job have been submitted by a different application than the first one?

If this is wrong, can someone provide some references indicating otherwise? Help is greatly appreciated.

jma · February 5, 2010, 8:03am

No, it will be queued up and executed after the current job is done. The “asynchronous concurrent execution” part in the Programming Guide is about the host thread continuing to work on the CPU (or go to sleep) rather than actively waiting for the GPU to finish.
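As a rough sketch of what the asynchronous part means on the host side: a kernel launch returns immediately, later launches are queued behind earlier ones, and the host only waits when it asks to. The example below assumes a single process using the default stream, so it illustrates the queuing only within one application, not across applications. The kernels `addOne` and `timesTwo` and the helper `runQueued` are hypothetical names. `cudaDeviceSynchronize` is the current runtime call; toolkits from 2010 used `cudaThreadSynchronize` for the same purpose.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

__global__ void timesTwo(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Queues two kernels back to back and returns the first element afterwards.
bool runQueued(int *firstElement) {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;
    bool ok = cudaMemset(dev, 0, n * sizeof(int)) == cudaSuccess;

    // Both launches return to the host right away. On the same stream the
    // second kernel starts only after the first one has finished.
    addOne<<<blocks, threads>>>(dev, n);
    timesTwo<<<blocks, threads>>>(dev, n);

    // The host thread is free to do unrelated CPU work at this point.

    // Block until all queued device work is done.
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;
    ok = ok && cudaMemcpy(firstElement, dev, sizeof(int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    return ok;
}

int main() {
    int first = -1;
    if (!runQueued(&first)) return 1;
    // In-order execution gives (0 + 1) * 2 = 2; the reverse order would give 1.
    printf("first element: %d\n", first);
    return first == 2 ? 0 : 1;
}
```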

tthtlc · February 5, 2010, 8:38am

Ah… I see. So how about this: is there any possibility that the second job can see the data generated by the first job, if there is no memory cleanup at the end of the first job? Does the nvcc compiler always generate cleanup code to be appended to the main program?

jma · February 5, 2010, 8:55am

The compiler does not create cleanup code …

May I ask you a question; are you trying to create an exploit or defend against one? (The latter being much more trivial than the former.)

tthtlc · February 5, 2010, 9:15am

I am trying to understand the GPU from a security standpoint. The GPU is accessible via libraries (running at userspace level), so multiple processes can be accessing the GPU concurrently. If so, is it possible that data generated by one thread is visible to another thread? I am quite sure such a simplistic understanding is totally wrong… please enlighten me :-).

jma · February 5, 2010, 11:42am

A good starting point would then be to create one program that writes a bit pattern to GPU memory, and then another that reads it back.
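One possible shape for such an experiment is sketched below as a single binary with a write mode and a read mode, to be run as two separate processes. Everything specific here is an assumption made for illustration: the 64 MiB probe size, the marker byte `0xA5`, and the helper names `writePattern` and `countPattern`. Whether the read mode finds any marker bytes depends on the driver, the OS, and the hardware, so no particular result should be expected.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

const size_t kBytes = 64u << 20;      // hypothetical probe size: 64 MiB
const unsigned char kPattern = 0xA5;  // hypothetical marker byte

// Write mode: fill a fresh allocation with the marker, then release it
// without clearing it first.
bool writePattern() {
    unsigned char *dev = nullptr;
    if (cudaMalloc(&dev, kBytes) != cudaSuccess) return false;
    bool ok = cudaMemset(dev, kPattern, kBytes) == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;
    cudaFree(dev);
    return ok;
}

// Read mode: allocate without initializing, copy the raw contents back,
// and count how many bytes still carry the marker.
bool countPattern(size_t *hits) {
    unsigned char *dev = nullptr;
    if (cudaMalloc(&dev, kBytes) != cudaSuccess) return false;
    std::vector<unsigned char> host(kBytes);
    bool ok = cudaMemcpy(host.data(), dev, kBytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    if (!ok) return false;
    *hits = 0;
    for (unsigned char b : host) {
        if (b == kPattern) ++*hits;
    }
    return true;
}

int main(int argc, char **argv) {
    if (argc == 2 && strcmp(argv[1], "write") == 0) {
        return writePattern() ? 0 : 1;
    }
    if (argc == 2 && strcmp(argv[1], "read") == 0) {
        size_t hits = 0;
        if (!countPattern(&hits)) return 1;
        printf("%zu of %zu bytes match the marker\n", hits, kBytes);
        return 0;
    }
    fprintf(stderr, "usage: %s write|read\n", argv[0]);
    return 2;
}
```

Run it once with `write`, let that process exit, then run it again with `read`. A count well above what random leftovers would produce would suggest that memory contents survived from the first process.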

seibert · February 5, 2010, 10:19pm

The NCSA CUDA wrapper scrubs the GPU memory after each job finishes, so it seems you can potentially see data from previous executions:

http://www.ncsa.illinois.edu/AboutUs/Directorates/ISL/software.html
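For an application that wants to clean up after itself, a minimal sketch of scrubbing a single allocation before releasing it might look like this. It is not the NCSA wrapper's implementation, which is not shown in this thread; `scrub` and `scrubLeavesZeros` are hypothetical helper names, and the approach only clears the allocation it is given, not registers, shared memory, or other buffers.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Overwrite a device allocation with zeros and wait until the device is done.
cudaError_t scrub(void *dev, size_t bytes) {
    cudaError_t err = cudaMemset(dev, 0, bytes);
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}

// Fills a buffer with a marker, scrubs it, and checks that only zeros remain.
bool scrubLeavesZeros() {
    const size_t bytes = 1u << 20;  // hypothetical buffer size: 1 MiB
    unsigned char *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    std::vector<unsigned char> host(bytes, 0xFF);
    bool ok = cudaMemset(dev, 0xA5, bytes) == cudaSuccess;  // stand-in for job data
    ok = ok && scrub(dev, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(host.data(), dev, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);  // release only after the contents have been cleared
    if (!ok) return false;

    for (unsigned char b : host) {
        if (b != 0) return false;
    }
    return true;
}

int main() {
    bool clean = scrubLeavesZeros();
    printf("buffer %s after scrub\n", clean ? "is all zeros" : "was not verified clean");
    return clean ? 0 : 1;
}
```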

tthtlc · February 5, 2010, 11:41pm

This is interesting. All OS context-switching code cleans up the registers, including the FPU (MMX, SSE, etc.) registers, before handing execution over to another task. So now it has the additional workload of cleaning up all the GPU’s memory and registers (as the GPU’s memory is not subject to the CPU’s normal page table protection mechanism, the MMU)? Again, that doesn’t sound very plausible either… any comment on that?

tmurray · February 5, 2010, 11:55pm

The GPU context switches on its own.

tthtlc · February 6, 2010, 8:24am

I see… now I understand better. Thank you all for the answers.
