Current best practices for controlling global memory updates

Hello,

I am relatively new to CUDA and I hope this question isn’t inappropriate.

I am curious if an established best practice exists for limiting the number of times a value contained in global memory gets updated. For example, if a global variable is initially set to zero, one might imagine a kernel that reduces the number of times the global variable is updated to look something like…

if (global_var[i] == 0 && new_value != 0)
{
    global_var[i] = new_value;   // only write when the value is still zero
}

And if this flow control were hit many times, one could use a shared memory variable to help reduce the likelihood of global values being read or written many, many times…

if (i < n)
{
    shared_copy[tx] = global_var[i];   // cache the global value in shared memory
}

… then later use the shared value for flow control…

if (shared_copy[tx] == 0 && new_value != 0)
{
    global_var[i] = new_value;   // write only when the cached copy was zero
}
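
Putting the pieces together, the full kernel I have in mind looks roughly like this (the index math, block size, and the source of new_value are just illustrative):

__global__ void update_if_zero(int *global_var, const int *new_values, int n)
{
    __shared__ int shared_copy[256];        // assumes blockDim.x == 256

    int tx = threadIdx.x;
    int i  = blockIdx.x * blockDim.x + tx;

    if (i < n)
    {
        shared_copy[tx] = global_var[i];    // cache the current global value
    }

    // ... other work that produces new_value ...

    if (i < n)
    {
        int new_value = new_values[i];      // placeholder for however new_value is produced

        if (shared_copy[tx] == 0 && new_value != 0)
        {
            global_var[i] = new_value;      // only write when the cached value was zero
        }
    }
}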

My question is: after testing code similar to this, it is clear that this flow control method doesn’t work - but do other methods exist?

Your question is slightly confusing.

Are you asking whether a bunch of warps addressing the same global variable at the same time will run into a conflict, and how to resolve that?

Or are you asking how to reduce the number of times you change a global variable?

That being said, a kernel should only ever interact with a global variable/vector once: either when loading that data into the shared space, or when moving it from the shared space back into global. (Assuming you have enough shared space to hold the global data.)

If you are talking about addressing the same global variable over the span of multiple warps you should look into atomic operations so that you don’t run into any data race conditions.
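
If I remember the CUDA side correctly, something like atomicCAS() does the compare and the write in one indivisible step, so only the first non-zero writer actually lands. Roughly, with placeholder names:

// write new_value only if global_var[i] is still 0;
// atomicCAS compares and swaps atomically and returns the old value
if (new_value != 0)
{
    atomicCAS(&global_var[i], 0, new_value);
}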

Good luck,

BHa

*As a note, I mainly code for OpenCL but have a general knowledge of CUDA and the similarities between the two. If I said anything incorrect regarding CUDA, please let me know and I’ll update my post accordingly.

BHa,

I appreciate you asking for clarification: I’m trying to reduce the number of times a global variable is changed.

Here is some context: a method must check many different cases across n threads. If any one of the cases is true, then the overall result is true regardless of any false instances.

I understand that one solution could be to simply add either 1 or 0 to the global variable without any flow control (adding zero (false) to 1 (true) still leaves a true value), and that the addition could happen n times. But it seems like a waste to perform needless writes if a true case was found before the end of the n checks - thus the question.
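
In code, that unconditional version would just be something like this for each of the n checks (placeholder names, and presumably an atomic add to be safe):

// every thread adds its 0/1 result; the flag ends up greater than 0 if any case was true
atomicAdd(&global_flag[0], case_is_true ? 1 : 0);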

I have similar versions of this problem logic in other methods doing different work.

Does that make sense?

Okay, I think that makes more sense. In this case, race conditions might not really matter to you.

Overall, I think your approach should be that whenever a thread finds a true case, it changes the global variable to true. Even with race conditions it will remain true.
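
Roughly, with placeholder names and assuming the flag starts at 0, each thread would just do:

if (case_is_true)
{
    global_flag[0] = 1;   // many threads may store here, but they all store the same value, so the race is benign
}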

That being said, conditionals (if/else if/else) inside GPU kernels aren’t great, especially when they cause branching (threads within a warp going to different portions of code), which can slow things down. AFAIK, most kernel compilers compile assuming that the if statement will evaluate to true, so it is generally taught that you should write conditionals that evaluate to true more often than not.

Therefore, you could set up an additional if statement checking whether the global has been altered yet, but be aware that this causes added branching.
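
That guarded version would look roughly like this (placeholder names); the extra read of the flag adds a branch, but it skips the store once the flag has already been set:

if (case_is_true && global_flag[0] == 0)
{
    global_flag[0] = 1;
}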

Another option is to do something similar to a reduction sum: each thread evaluates its case, a reduction sum adds up the resulting 1s and 0s, and afterwards you apply that to the global variable. You get a result that is either 0 or positive, which gives you your answer. This might end up being more computationally involved, though.
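
A rough sketch of what I mean, with placeholder names and assuming the block size is a power of two:

__global__ void any_case_true(const int *cases, int *global_result, int n)
{
    __shared__ int partial[256];                 // assumes blockDim.x == 256

    int tx = threadIdx.x;
    int i  = blockIdx.x * blockDim.x + tx;

    // each thread contributes 1 if its case is true, 0 otherwise
    partial[tx] = (i < n && cases[i] != 0) ? 1 : 0;
    __syncthreads();

    // tree reduction in shared memory
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1)
    {
        if (tx < stride)
        {
            partial[tx] += partial[tx + stride];
        }
        __syncthreads();
    }

    // one global update per block; any positive result means "true"
    if (tx == 0 && partial[0] > 0)
    {
        atomicAdd(global_result, partial[0]);
    }
}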

Thanks!