atomicAdd support for double: which card?

Hi,

Which card models support, or will support, atomicAdd on double types?

Thanks

None of them, as far as I can tell. Fermi adds 32-bit floating-point support for atomicAdd(), but that’s it.

Does that mean float is the best we can use if we need atomic operations?

If the values we have are doubles with 6 decimal places and we assign them to float variables, how much accuracy will we lose? If that is acceptable, then we can use the converted float variables with the atomic operations.

Yes, that means that you can only use single-precision floats with atomic operations.

However, a single-precision float is accurate to roughly 7 significant decimal digits, so in this case, you shouldn’t lose any precision.

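To make the claim concrete, here is a quick host-side check (added for illustration; the value is arbitrary): the rounding error from a double-to-float conversion shows up around the seventh significant digit.

[codebox]
// Illustrative only: round-trip a double through a float and print the error.
#include <stdio.h>

int main(void)
{
    double d = 123.456789;   // value with 6 decimal places
    float  f = (float)d;     // narrowing conversion
    printf("double: %.9f\nfloat : %.9f\nerror : %.9f\n",
           d, (double)f, d - (double)f);
    return 0;
}
[/codebox]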

Thanks for the reply. The card we have now only supports integers with the atomic operations. Does that mean that when we assign a double value to an integer variable, we will lose all the digits after the decimal point?

By the way, I am very interested in GPU.Net. I will give it a try when the beta is released.

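One possible workaround, sketched here for illustration (it is not from the original thread, and the SCALE macro and accumulateFixedPoint kernel are hypothetical names), is fixed-point accumulation: scale each double by 10^6, accumulate with integer atomicAdd, and divide the total back afterwards. This assumes a device with 64-bit integer atomics (compute capability 1.2 or later), non-negative inputs, and a scaled sum that fits in 64 bits.

[codebox]
// Hypothetical fixed-point workaround: keep six decimal places while using
// only integer atomics. Assumes compute capability >= 1.2 (64-bit atomicAdd),
// non-negative inputs, and a scaled sum that fits in 64 bits.
#define SCALE 1000000.0 // 10^6 preserves six decimal places

__global__ void accumulateFixedPoint(const double *values,
                                     unsigned long long *scaledSum, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(scaledSum, (unsigned long long)(values[i] * SCALE));
}

// Host side, after copying the result back: double sum = scaledSumHost / SCALE;
[/codebox]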

You can write your own atomic function using the other ones:

[codebox]
// Emulates atomicAdd for doubles with a 64-bit compare-and-swap loop.
__device__ inline void myAtomicAdd(double *address, double value) // See CUDA official forum
{
    unsigned long long oldval, newval, readback;

    oldval = __double_as_longlong(*address);
    newval = __double_as_longlong(__longlong_as_double(oldval) + value);
    // Retry until no other thread modified *address between our read and the CAS.
    while ((readback = atomicCAS((unsigned long long *)address, oldval, newval)) != oldval)
    {
        oldval = readback;
        newval = __double_as_longlong(__longlong_as_double(oldval) + value);
    }
}
[/codebox]

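As a minimal usage sketch (the kernel and variable names are illustrative, not part of the original post), each thread could fold its own contribution into a single global total like this:

[codebox]
// Illustrative only: every thread adds its contribution to one global total.
__global__ void accumulate(const double *contributions, double *total, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        myAtomicAdd(total, contributions[i]);
}
[/codebox]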

Hi Magorath,

Thanks for the information.

How is the performance of this custom atomicAdd function? Which CUDA official forum thread (referenced in the code comment) is it from?

Why doesn’t NVIDIA build this function into the current cards?

Poor. A thread might have to make many attempts before the CAS operation succeeds. I have never been fully convinced that it is formally correct, either.

Because it is very complex to do in hardware. Atomic operations are performed in the memory controller and caches. Performing an atomic add on a 64-bit floating-point number requires building what is effectively a full 64-bit FPU into the memory controller. That is a lot of transistors for little real-world benefit.

My gut feeling is that if you think you need 64-bit atomic floating-point operations, you are probably using the wrong algorithmic approach.

Hi avidday, thanks for your reply.

The data we marshalled from our C# application are all of type double, so the simulation results calculated by each GPU thread are also of type double.

We need the atomicAdd function so that we can add the results from each thread into global memory. (The reduction algorithm is not suitable for our model.)

What would be the best solution for this scenario? Do we need to convert all the doubles to floats?

If integer/float atomic operations are our only choice for now, and we do the conversion by assignment from double to float/integer (so we can use the atomic functions), will the truncation cause much of a problem?

I am willing to bet that you can (and probably should) use a parallel reduction or prefix sum for this. You might believe that you need atomic operations, but my experience with what I expect are very comparable simulation applications tells me it is almost never the case.

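For reference, a minimal shared-memory block reduction of the kind being suggested might look like the sketch below (the blockSum kernel name and launch configuration are illustrative, not from the thread). Each block writes one partial sum, which can then be reduced again or summed on the host.

[codebox]
// Per-block sum reduction in shared memory. Assumes blockDim.x is a power of two.
__global__ void blockSum(const double *in, double *partial, int n)
{
    extern __shared__ double sdata[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    // Load one element per thread (0.0 past the end of the array).
    sdata[tid] = (i < n) ? in[i] : 0.0;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1)
    {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Thread 0 writes this block's partial sum.
    if (tid == 0)
        partial[blockIdx.x] = sdata[0];
}

// Example launch: blockSum<<<numBlocks, 256, 256 * sizeof(double)>>>(in, partial, n);
[/codebox]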