Question about CUDA variable type qualifiers

How can I declare that my variable is in device memory? And where do I declare it?

If I have defined a variable inside a __global__ or __device__ function, where will that variable live?
example:
__global__ void test()
{
    int x;
}

Will x be in device memory? If yes, why should I use __device__ at all?

The compiler will put variables declared in global or device functions into registers unless (1) the number of registers gets too big or (2) the variable is an array that you access with an index. Variables that don’t fit into registers for one of these two reasons are “spilled” to what the guide calls “local memory”, which is really just device memory, but organized so that every thread gets its own private piece.
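A small illustration of the two cases (this sketch is mine, not from the guide, and the exact behavior depends on the compiler):

__global__ void spill_demo(int *out)
{
    int x = threadIdx.x * 2;       // plain scalar: normally kept in a register

    int buf[32];                   // array indexed with a runtime value:
    for (int i = 0; i < 32; ++i)   // the compiler typically places it in per-thread
        buf[i] = i + x;            // "local" memory, which is really device memory

    out[threadIdx.x] = buf[x % 32];
}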

Variables declared locally to functions will be put into registers. You can declare a global variable "__device__ int x" outside of the kernel. To initialize such variables from the host, you need to use cudaMemcpyToSymbol.

Am I right that something like this:

__device__ int nStopWorking;

__global__ void init_array(int *g_data, int *factor)
{
    if (!nStopWorking)
    {
        ....
    }
}

int main()
{
    int nTemp = 1;

    cudaMemcpyToSymbol(nStopWorking, &nTemp, sizeof(int));

    init_array<<<…>>>(…);
}

should prevent the kernel from doing anything?

Correct. Although I’m not certain about the syntax of cudaMemcpyToSymbol in this context. I just remember a post a week or two ago where someone said that this works for __device__ variables.
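For what it’s worth, my understanding of the runtime call is that the last two arguments default to an offset of 0 and a host-to-device copy, so they can be omitted:

// offset = 0 and direction = host-to-device are the defaults
cudaMemcpyToSymbol(nStopWorking, &nTemp, sizeof(int), 0, cudaMemcpyHostToDevice);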

Maybe you saw that I posted a question about kernel termination…

What do you think: is it possible, in theory, to terminate a kernel using a flag variable in device memory?

I mean something like this:

  1. Declare int *pStopRunning pointing to device memory and cudaMalloc it.

  2. Run the kernel (the kernel checks *pStopRunning while it runs).

  3. cudaMemcpy a value of 1 into *pStopRunning (the syntax here is only schematic).

As I understand it, after step (3) the contents of *pStopRunning will change and the kernel will stop running.

However, this is what I observe in practice:

  1. If I write “1” to *pStopRunning BEFORE launching the kernel, the kernel does nothing and finishes immediately.

  2. If I write “1” to *pStopRunning AFTER launching the kernel, the kernel won’t stop until its normal end.

I’ve tried a number of variants with the same result: once started, the kernel does not change its behavior. Argh… how do I stop that thing?

cudaMemcpy will wait for all previous kernel calls to complete before writing memory on the GPU, so this is expected behavior. If one of your blocks detects an exit condition, it could set the device variable to prevent future blocks from starting up, thus causing an early exit.
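Something like this, assuming a flag named stopFlag and a made-up exit condition (just a sketch of the “prevent future blocks” idea, not tested code):

__device__ int stopFlag = 0;

__global__ void worker(int *g_data)
{
    // Blocks that start after the flag was raised bail out immediately.
    if (stopFlag)
        return;

    // ... this block's normal work ...

    // A block that detects the exit condition raises the flag so that
    // blocks scheduled later skip their work (every thread writing 1 is harmless).
    if (g_data[blockIdx.x] < 0)      // made-up condition
        stopFlag = 1;
}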

I don’t have any suggestions on stopping runaway kernels from the host, except to say that in my experience on Linux, even infinite-loop kernels are killed via Ctrl-C on a machine with no display.

Unfortunately, WinXP hangs until the kernel terminates…

Thank you very much for your answer.

What about “cheating” and using an async memcpy?
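Something like this is what I have in mind (long_running_kernel, grid and block are made-up names, and I don’t know whether the copy really lands while the kernel is still executing; it probably depends on the card):

int *pStopRunning;                          // device-side flag the kernel polls
cudaMalloc((void **)&pStopRunning, sizeof(int));
cudaMemset(pStopRunning, 0, sizeof(int));

int *hostFlag;                              // pinned host memory so the copy can be asynchronous
cudaMallocHost((void **)&hostFlag, sizeof(int));
*hostFlag = 1;

cudaStream_t kernelStream, copyStream;      // two non-default streams so they don't serialize
cudaStreamCreate(&kernelStream);
cudaStreamCreate(&copyStream);

long_running_kernel<<<grid, block, 0, kernelStream>>>(pStopRunning);

// Try to raise the flag while the kernel is (hopefully) still running.
cudaMemcpyAsync(pStopRunning, hostFlag, sizeof(int),
                cudaMemcpyHostToDevice, copyStream);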

Not sure - I haven’t tried it yet…