atomicMin on Char? Is there a way to compare char to in to use atomicMin?

I am throwing this out there for any ideas about how to do the following:

  • I have device data that is a 1D array of char.
  • In my kernel I need to update this data with the minimum value of what is in the array and the value that my kernel computes.
  • According to the documentation, the atomicMin() operation works on ints.
  • Is there a way to implement the atomicMin on my char data without converting the data to ints. I need to avoid using ints as the data to conserve memory.

Any help you can offer would be greatly appreciated as I am stumped right now…

Grateful,
Joshua

a thread-block updates a critical section.
For example, one thread-block compute atomicMin of 256 characters.
you can update these 256 values by atomicMin on shared memory.
After that, write result (cast int to character) to global memory via semaphore.

You can create any atomic function (on data types of at most 32 bits) from the atomicCAS() function:

__device__ char atomicMinChar(char* address, char val)

{

    unsigned int *base_address = (unsigned int *)((size_t)address & ~3);

    unsigned int selectors[] = {0x3214, 0x3240, 0x3410, 0x4210};

    unsigned int sel = selectors[(size_t)address & 3];

    unsigned int old, assumed, min_, new_;

old = *base_address;

    do {

        assumed = old;

        min_ = min(val, (char)__byte_perm(old, 0, ((size_t)address & 3) | 0x4440));

        new_ = __byte_perm(old, min_, sel);

        if (new_ == old)

            break;

        old = atomicCAS(base_address, assumed, new_);

    } while (assumed != old);

    return old;

}

Tera,

Thank you so much for the code snippet. This is a little advanced for me... although I know what you are doing would you be willing to comment the lines for me so I can fully understand? 

Truly Grateful,

Joshua

Sure. I’ve also improved the code a little bit after looking at it again.

I started from the example code from appendix B.11 of the Programming Guide for double precision atomic add, which loops until atomicCAS() is able to set the desired value. Then it just needs a bit of byte juggling, for which I use the [font=“Courier New”]__byte_perm()[/font] intrinsic that is described in appendix C.2.3 of the Programming guide.

[font=“Courier New”]address & ~3[/font] rounds the address downwards to the previous multiple of four to find the word where the char value is located. The cast to [font=“Courier New”]size_t[/font] enables us to use bit operations on the pointer.

[font=“Courier New”]0x3210[/font] is the selector that makes [font=“Courier New”]__byte_perm()[/font] just return its first argument. We substitute a 4 at the place where the new byte value is to be inserted. The place where the new byte should end up is determined by the lowest to bits of the original byte address.

For calculating [font=“Courier New”]min_[/font] (I’ve added an underscore because [font=“Courier New”]min[/font] is already taken by the intrinsic function of the same name) we use another [font=“Courier New”]__byte_perm()[/font] to extract the previous value of the char. When I edited the code, I set the higher nibbles of the selector to 4 so that the higher bytes of the resulting value will explicitly get set to the second argument, i.e. zero, although in Nvidia’s current implementation the cast to [font=“Courier New”]char[/font] is already sufficient to ensure that.

To compute the desired word [font=“Courier New”]new_[/font] (again with an underscore as [font=“Courier New”]new[/font] already is a C++ keyword) we juggle the [font=“Courier New”]min_[/font] value into place using the selector already obtained.

Unlike the [font=“Courier New”]atomicAdd()[/font] example, it is quite likely that the desired new value is the same value already in place, so in the edited code I added an optimization to skip the atomic operation in that case. (One could have also just branched out of the loop if the previous char value already was larger or equal than the provided one, implementing a maximum function by hand. But as the intrinsic [font=“Courier New”]max()[/font] function compiles to a single instruction, I left the code as it is as I find it more descriptive).

[font=“Courier New”]atomicCAS()[/font] will then write the new word to memory, provided no external change to the memory location has occurred in the meantime that might render our calculation invalid. If such an external change is detected, we start all over.

tera,

Thanks so much for your help!! The code you sent is beautiful and works beautifully! Thank you for sharing hard earned knowledge with me.

Grateful,

Joshua