atomicMin on Char? Is there a way to compare char to in to use atomicMin?

tojoshua · May 1, 2011, 11:59pm

I am throwing this out there for any ideas about how to do the following:

I have device data that is a 1D array of char.
In my kernel I need to update this data with the minimum value of what is in the array and the value that my kernel computes.
According to the documentation, the atomicMin() operation works on ints.
Is there a way to implement the atomicMin on my char data without converting the data to ints. I need to avoid using ints as the data to conserve memory.

Any help you can offer would be greatly appreciated as I am stumped right now…

Grateful,
Joshua

LSChien · May 2, 2011, 2:16am

a thread-block updates a critical section.
For example, one thread-block compute atomicMin of 256 characters.
you can update these 256 values by atomicMin on shared memory.
After that, write result (cast int to character) to global memory via semaphore.

tera · May 2, 2011, 12:36pm

You can create any atomic function (on data types of at most 32 bits) from the atomicCAS() function:

__device__ char atomicMinChar(char* address, char val)

{

    unsigned int *base_address = (unsigned int *)((size_t)address & ~3);

    unsigned int selectors[] = {0x3214, 0x3240, 0x3410, 0x4210};

    unsigned int sel = selectors[(size_t)address & 3];

    unsigned int old, assumed, min_, new_;

old = *base_address;

    do {

        assumed = old;

        min_ = min(val, (char)__byte_perm(old, 0, ((size_t)address & 3) | 0x4440));

        new_ = __byte_perm(old, min_, sel);

        if (new_ == old)

            break;

        old = atomicCAS(base_address, assumed, new_);

    } while (assumed != old);

    return old;

}

tojoshua · May 2, 2011, 1:53pm

You can create any atomic function (on data types of at most 32 bits) from the atomicCAS() function:

__device__ char atomicMinChar(char* address, char val)

{

    unsigned int *base_address = (unsigned int *)((size_t)address & ~3);

    unsigned int selectors[] = {0x3214, 0x3240, 0x3410, 0x4210};

    unsigned int sel = selectors[(size_t)address & 3];

    unsigned int old, assumed, min_, new_;

old = *base_address;

    do {

        assumed = old;

        min_ = min(val, (char)__byte_perm(old, 0, ((size_t)address & 3)));

        new_ = __byte_perm(old, min_, sel);

        old = atomicCAS(base_address, assumed, new_);

    } while (assumed != old);

    return old;

}

Tera,

Thank you so much for the code snippet. This is a little advanced for me... although I know what you are doing would you be willing to comment the lines for me so I can fully understand?

Truly Grateful,

Joshua

tera · May 2, 2011, 10:54pm

Sure. I’ve also improved the code a little bit after looking at it again.

I started from the example code from appendix B.11 of the Programming Guide for double precision atomic add, which loops until atomicCAS() is able to set the desired value. Then it just needs a bit of byte juggling, for which I use the [font=“Courier New”]__byte_perm()[/font] intrinsic that is described in appendix C.2.3 of the Programming guide.

[font=“Courier New”]address & ~3[/font] rounds the address downwards to the previous multiple of four to find the word where the char value is located. The cast to [font=“Courier New”]size_t[/font] enables us to use bit operations on the pointer.

[font=“Courier New”]0x3210[/font] is the selector that makes [font=“Courier New”]__byte_perm()[/font] just return its first argument. We substitute a 4 at the place where the new byte value is to be inserted. The place where the new byte should end up is determined by the lowest to bits of the original byte address.

For calculating [font=“Courier New”]min_[/font] (I’ve added an underscore because [font=“Courier New”]min[/font] is already taken by the intrinsic function of the same name) we use another [font=“Courier New”]__byte_perm()[/font] to extract the previous value of the char. When I edited the code, I set the higher nibbles of the selector to 4 so that the higher bytes of the resulting value will explicitly get set to the second argument, i.e. zero, although in Nvidia’s current implementation the cast to [font=“Courier New”]char[/font] is already sufficient to ensure that.

To compute the desired word [font=“Courier New”]new_[/font] (again with an underscore as [font=“Courier New”]new[/font] already is a C++ keyword) we juggle the [font=“Courier New”]min_[/font] value into place using the selector already obtained.

Unlike the [font=“Courier New”]atomicAdd()[/font] example, it is quite likely that the desired new value is the same value already in place, so in the edited code I added an optimization to skip the atomic operation in that case. (One could have also just branched out of the loop if the previous char value already was larger or equal than the provided one, implementing a maximum function by hand. But as the intrinsic [font=“Courier New”]max()[/font] function compiles to a single instruction, I left the code as it is as I find it more descriptive).

[font=“Courier New”]atomicCAS()[/font] will then write the new word to memory, provided no external change to the memory location has occurred in the meantime that might render our calculation invalid. If such an external change is detected, we start all over.

tojoshua · May 11, 2011, 9:12pm

tera,

Thanks so much for your help!! The code you sent is beautiful and works beautifully! Thank you for sharing hard earned knowledge with me.

Grateful,

Joshua

Sure. I’ve also improved the code a little bit after looking at it again.

I started from the example code from appendix B.11 of the Programming Guide for double precision atomic add, which loops until atomicCAS() is able to set the desired value. Then it just needs a bit of byte juggling, for which I use the [font=“Courier New”]__byte_perm()[/font] intrinsic that is described in appendix C.2.3 of the Programming guide.

[font=“Courier New”]address & ~3[/font] rounds the address downwards to the previous multiple of four to find the word where the char value is located. The cast to [font=“Courier New”]size_t[/font] enables us to use bit operations on the pointer.

[font=“Courier New”]0x3210[/font] is the selector that makes [font=“Courier New”]__byte_perm()[/font] just return its first argument. We substitute a 4 at the place where the new byte value is to be inserted. The place where the new byte should end up is determined by the lowest to bits of the original byte address.

For calculating [font=“Courier New”]min_[/font] (I’ve added an underscore because [font=“Courier New”]min[/font] is already taken by the intrinsic function of the same name) we use another [font=“Courier New”]__byte_perm()[/font] to extract the previous value of the char. When I edited the code, I set the higher nibbles of the selector to 4 so that the higher bytes of the resulting value will explicitly get set to the second argument, i.e. zero, although in Nvidia’s current implementation the cast to [font=“Courier New”]char[/font] is already sufficient to ensure that.

To compute the desired word [font=“Courier New”]new_[/font] (again with an underscore as [font=“Courier New”]new[/font] already is a C++ keyword) we juggle the [font=“Courier New”]min_[/font] value into place using the selector already obtained.

Unlike the [font=“Courier New”]atomicAdd()[/font] example, it is quite likely that the desired new value is the same value already in place, so in the edited code I added an optimization to skip the atomic operation in that case. (One could have also just branched out of the loop if the previous char value already was larger or equal than the provided one, implementing a maximum function by hand. But as the intrinsic [font=“Courier New”]max()[/font] function compiles to a single instruction, I left the code as it is as I find it more descriptive).

[font=“Courier New”]atomicCAS()[/font] will then write the new word to memory, provided no external change to the memory location has occurred in the meantime that might render our calculation invalid. If such an external change is detected, we start all over.

Topic		Replies	Views
atomicMin with float CUDA Programming and Performance	8	20602	April 16, 2019
function atomicMin CUDA Setup and Installation	0	776	April 11, 2014
atomic min/max for real data Legacy PGI Compilers	8	20092	February 18, 2010
how to find the min value CUDA Programming and Performance	2	3882	October 25, 2008
Creating custom atomic CUDA functions CUDA Programming and Performance	2	1407	October 17, 2020
Unsupported operation while compiling atomicMin CUDA Programming and Performance	2	690	November 20, 2018
Floating atomicMin() help!!! CUDA Programming and Performance	6	1494	May 13, 2014
Hwo to use atomic_min in warp Isaac Sim cuda , isaac-sim-v4-5-0	4	117	May 15, 2025
How to use atomicCAS() to implement atomicAdd(short)? Trouble adapting programming guide example CUDA Programming and Performance	27	15471	September 6, 2018
Some atomic functions are missing? CUDA Programming and Performance	1	643	December 30, 2019

atomicMin on Char? Is there a way to compare char to in to use atomicMin?

Related topics