atomicXor... How does it work?

Hi all,

I have read the description of the atomicXor function, but I'm not quite sure about the parameters the function requires. Could someone explain how it works or point me to an example of it in use, please? That would be great.

All I am trying to do is XOR two unsigned chars together. I do understand that the function works only with integers, but this is not a problem for the application in which I'm using the function.



Well, if you can atomicXor an integer, you can also do a character: shift the character to its byte position inside the containing 4-byte word and XOR the whole word (an operation that is atomic at the 4-byte level is also atomic at the 1-byte level by definition).

The parameters and usage of the atomic functions are described in the SDK documentation. Basically you pass an address and a value, in that order, that’s all :)
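To make the shift idea concrete, here is a rough sketch of how a single byte could be XORed through the 32-bit atomicXor. The names atomicXorByte, xorByteKernel and xorByteDemo are made up for illustration (they are not CUDA built-ins), error checking is left out, and the bit positions assume the little-endian byte order used by NVIDIA GPUs.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Hypothetical helper (not a CUDA built-in): atomically XOR `value` into one byte.
__device__ void atomicXorByte(unsigned char *addr, unsigned char value)
{
    uintptr_t byteAddr = reinterpret_cast<uintptr_t>(addr);
    // Round down to the 4-byte-aligned word that contains the byte.
    unsigned int *word =
        reinterpret_cast<unsigned int *>(byteAddr & ~static_cast<uintptr_t>(3));
    // Little-endian layout: byte k of the word occupies bits 8k..8k+7.
    unsigned int shift = static_cast<unsigned int>(byteAddr & 3) * 8;
    // The other three bytes are XORed with 0, so they keep their values.
    atomicXor(word, static_cast<unsigned int>(value) << shift);
}

__global__ void xorByteKernel(unsigned char *buf, int index, unsigned char value)
{
    atomicXorByte(&buf[index], value);  // every thread hits the same byte
}

// Hypothetical host wrapper: byte 1 of a 4-byte buffer starts as `start`,
// `threads` threads each XOR `value` into it; returns the final byte.
unsigned char xorByteDemo(unsigned char start, unsigned char value, int threads)
{
    unsigned char host[4] = {0xAA, start, 0xAA, 0xAA};
    unsigned char *dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    xorByteKernel<<<1, threads>>>(dev, 1, value);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host[1];
}
```

Since XORing the same value twice cancels out, an odd number of threads should leave start ^ value in the byte and an even number should leave start unchanged.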

I had read the SDK documentation, but I still don't understand the address and value parameters.

For example, if I were trying to do this:

int i, x, y;

i = x ^ y;

How would that work with the atomic function?

The variable i would need to be in device memory (initialized to x), and you would pass &i as the address parameter and y as the value parameter.
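As a minimal sketch of that, assuming i is a __device__ global and using a made-up host helper xorOnDevice (error checking omitted):

```cuda
#include <cuda_runtime.h>

__device__ int i;  // lives in device global memory

__global__ void xorKernel(int y)
{
    // Atomically performs i = i ^ y. The return value (ignored here)
    // is the value i held before the XOR.
    atomicXor(&i, y);
}

// Hypothetical host helper: computes x ^ y on the device and returns it.
int xorOnDevice(int x, int y)
{
    int result = 0;
    cudaMemcpyToSymbol(i, &x, sizeof(int));         // initialize i to x
    xorKernel<<<1, 1>>>(y);                         // a single thread is enough here
    cudaMemcpyFromSymbol(&result, i, sizeof(int));  // read i back
    return result;
}
```

With a single thread like this the atomic buys nothing over a plain i ^= y; it only matters once several threads update i concurrently.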

Are you sure you need the atomic xor? The atomic ops are only needed if you have many threads all trying to xor the value at the same time.

Well, to be honest, I'm not sure I need it now. Can I use the regular bitwise operators as usual if I only want to use the value once, then?

Yep, you can use all the regular operations on integers declared simply as “int i”, etc. These operations are then performed in each thread individually. For that matter, you can do practically anything C syntax allows with a variable declared like that: CUDA is a full-fledged compiler. On the device you are really only limited by resources, and of course you can’t make calls to the standard library (or others). Math library functions are limited to those listed in the CUDA programming manual (which includes almost everything you can think of: sin, cos, exp, etc.).
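For instance, an element-wise XOR of two arrays needs no atomics as long as each thread works on its own element. A rough sketch, where xorArrays and xorArraysOnDevice are names invented for this example, n is assumed to be positive, and error checking is omitted:

```cuda
#include <cuda_runtime.h>

// Each thread XORs its own pair of elements; no two threads touch the same location.
__global__ void xorArrays(int *a, const int *b, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        a[idx] = a[idx] ^ b[idx];  // plain bitwise operator, no atomic needed
    }
}

// Hypothetical host wrapper: performs a[k] ^= b[k] for k in [0, n), with n > 0.
void xorArraysOnDevice(int *a, const int *b, int n)
{
    int *dA = nullptr, *dB = nullptr;
    size_t bytes = n * sizeof(int);
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    xorArrays<<<blocks, threads>>>(dA, dB, n);
    cudaMemcpy(a, dA, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
}
```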

Atomic operations are for when you have multiple threads trying to modify the same integer and you need to read, modify, then write all in one atomic (unbreakable) operation. It’s usually best to avoid such situations if at all possible, since they incur a large performance penalty.
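Here is a sketch of the kind of situation where the atomic version is needed: every thread folds its own element into one shared integer. The names xorAllKernel and xorAll are hypothetical, n is assumed to be positive, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

__global__ void xorAllKernel(const int *data, int n, int *result)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        // All threads update the same address. A plain "*result ^= data[idx]"
        // could lose updates, because the read, XOR and write of different
        // threads may interleave; atomicXor does all three as one step.
        atomicXor(result, data[idx]);
    }
}

// Hypothetical host wrapper: returns the XOR of all n elements (n > 0).
int xorAll(const int *hostData, int n)
{
    int *dData = nullptr, *dResult = nullptr;
    int result = 0;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dResult, sizeof(int));
    cudaMemcpy(dData, hostData, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dResult, 0, sizeof(int));  // XOR identity
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    xorAllKernel<<<blocks, threads>>>(dData, n, dResult);
    cudaMemcpy(&result, dResult, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dResult);
    return result;
}
```

Because all threads serialize on one address, this is about the slowest way to combine values; it is only meant to show where the atomic is required.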

Ahhh, ok thanks very much Mister! That makes sense :thumbup:

But I'm trying to XOR elements in two different arrays in a device function. If I do this in the global kernel function, it seems to work fine. However, if I call a subsequent device function and perform the same XOR, it appears to have no effect. Why is that?

E.g. working:

__device__ int array1[16];
__device__ int array2[16];

__global__ void functionA() {
    for (int i = 0; i <= 15; i++) {
        array1[i] = array1[i] ^ array2[i];
    }
}

E.g. not working:

__device__ int array1[16];
__device__ int array2[16];

__device__ void functionB(int *array1, int *array2) {
    int temp[16];
    for (int l = 0; l <= 15; l++) {
        temp[l] = array1[l];
    }
    for (int k = 0; k <= 15; k++) {
        array1[k] = temp[k] ^ array2[k];
    }
}

__global__ void functionA() {