global variable in CUDA

Hello All,

I am new to CUDA and to programming as well. Please help me.

I want to define a global variable count and then use it in different CUDA device functions, so I do something like this:

__device__ int count; // Declare global variable here

int main() {
    do {
        core_kernel<<<5, 32>>>(some parameters); // Call the CUDA kernel
        //// What is the value in the count variable here???
    } while (count != 16);
}

__global__ void core_kernel(some parameters) {
    calldevicefunction(); // the __global__ kernel calls the __device__ function
}

__device__ void calldevicefunction() {
    count = count + 1; // use the global variable directly here
}

I assume the device function is executed by several threads. After core_kernel is called, how many times does the function increment the value in count? After a single kernel call, is count incremented just once, or by the total number of threads/blocks?

Also, is it okay to use the count variable this way in calldevicefunction without declaring it or passing it as a function parameter? Do you see any conflict with several threads trying to work on the same variable this way?

Thanks for your help.

I’m not sure where to begin.

Before the kernel call, count will be some arbitrary value: like an uninitialized local variable in C/C++, you've declared it but never assigned anything to it. It will contain whatever junk happens to be at the device global-memory address where count is stored, so initialize it explicitly before relying on it.
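For reference, a minimal sketch of how one might initialize such a variable from the host, assuming the __device__ int count declaration from your post (error checking omitted): __device__ variables cannot be assigned directly from host code, so you copy a value into the symbol instead.

```cuda
#include <cuda_runtime.h>

__device__ int count; // device-side global, lives in GPU global memory

int main() {
    int zero = 0;
    // Copy an initial value into the device symbol before any kernel runs.
    cudaMemcpyToSymbol(count, &zero, sizeof(int));
    return 0;
}
```

The matching cudaMemcpyFromSymbol() is how the host reads the value back after a kernel; the host cannot just read count in a while condition as your main() does.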

After the kernel call (assuming you don't error out), count will be 16, since your code loops until it is 16, but it may be a very, very long wait before it is exactly 16, and count may have been incremented literally trillions of times by then. Each thread attempts to add 1 to count, but count = count + 1 is not a single operation. Within one thread, the sequence of events can be approximately modeled as:

1. read the value of count,
2. increment that value by 1,
3. store the result back to count.

Unfortunately, after any one of those steps count could have been changed by another thread, so a thread may or may not actually add 1 to the value it originally read. Lastly, count is not checked against 16 until after presumably thousands of threads have already added to it.
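To make the lost-update problem concrete, here is a small sketch (not your code; the names and launch configuration are made up for illustration) that increments one counter non-atomically and another with atomicAdd() from the same threads. The atomic total will equal the thread count; the non-atomic one will almost certainly come out far smaller.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int racy_count   = 0; // incremented with plain read/add/store
__device__ int atomic_count = 0; // incremented with atomicAdd

__global__ void increment_both() {
    racy_count = racy_count + 1;   // read, add, store: updates can be lost
    atomicAdd(&atomic_count, 1);   // hardware-serialized read-modify-write
}

int main() {
    increment_both<<<256, 256>>>(); // 65536 threads racing on both counters
    cudaDeviceSynchronize();

    int racy = 0, atomic = 0;
    cudaMemcpyFromSymbol(&racy,   racy_count,   sizeof(int));
    cudaMemcpyFromSymbol(&atomic, atomic_count, sizeof(int));

    // atomic will be 65536; racy is typically much smaller and varies per run.
    printf("racy = %d, atomic = %d\n", racy, atomic);
    return 0;
}
```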

I’m therefore not at all sure what you’re trying to do here, but I suspect you’re heading down the wrong path. Perhaps you could explain.

Note, even if you set count=0 before calling calldevicefunction(), AND you only had one block of, say, 2 threads (which would be useless, but would have a half-decent chance of reaching 16 in 8 loops), count = count + 1 (or even count += 1) is not atomic, so you're going to get strange behavior. See atomicAdd() if you really want to count the number of times an operation has been performed by any thread on the device. However, it appears that you are trying to do in CUDA what you might do in a single-threaded environment like traditional C/C++. May I suggest running through a CUDA tutorial and/or following an online lecture series (Stanford and the University of Illinois both have great ones).
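If counting really is the goal, a hedged sketch of your kernel rewritten around atomicAdd() might look like this (parameters and error checking omitted; the one-block, two-thread launch matches the toy case above):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int count;

__global__ void core_kernel() {
    // Each thread records its operation atomically, so no increments
    // are lost between the read and the store.
    atomicAdd(&count, 1);
}

int main() {
    int h_count = 0;
    cudaMemcpyToSymbol(count, &h_count, sizeof(int)); // count = 0 on device

    core_kernel<<<1, 2>>>(); // one block of 2 threads
    cudaDeviceSynchronize();

    cudaMemcpyFromSymbol(&h_count, count, sizeof(int));
    printf("count = %d\n", h_count); // 2: one atomic increment per thread
    return 0;
}
```

With atomicAdd() the result is exactly one increment per thread per launch, so your do/while loop would at least be checking a deterministic value.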