How can I calculate double-precision data on an RTX (or GTX) GPU?

The double-precision performance of RTX/GTX GPUs can be very weak (for example on a GTX 960 or RTX 2080 Ti). I want to improve the double-precision performance on my GTX GPU (the only GPU I have). How can I solve this problem?
Or can I compute double-precision data using some single-precision operations (without losing precision)? Thank you!

Double-precision computation requires double-precision arithmetic units in the GPU. There is only a small number of those in a consumer GPU, and you can’t change that number. There are no hidden units you could magically unlock with some trick. Typically the double-precision units of high-end consumer GPUs provide several hundred GFLOPS of throughput.

There are techniques you could try that use pairs of ‘float’ numbers to simulate double precision (after a fashion), providing almost the same precision, but limiting you to the same limited range as ‘float’. See this old answer of mine on Stack Overflow for references: [url]https://stackoverflow.com/a/6770329/780717[/url]

There might be a ready-to-use software library somewhere that implements this double-float floating-point format but I am not aware of one at this time.
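To give a rough idea of the technique, here is a minimal sketch of paired-float ("double-float" or "df64") addition and multiplication. The function names and the specific renormalization steps are my own choices for illustration, not taken from the linked references; this is the simplified ("sloppy") variant, and its accuracy depends on the variant used and on the compiler not reassociating these operations (avoid fast-math style flags):

// Hypothetical sketch: a value is stored as an unevaluated sum of two
// floats (.x = high part, .y = low part), giving roughly 2x24 bits of
// significand but only the exponent range of 'float'.
__device__ float2 df64_add(float2 a, float2 b)
{
    // TwoSum of the high parts: s + e equals a.x + b.x exactly
    float s = a.x + b.x;
    float v = s - a.x;
    float e = (a.x - (s - v)) + (b.x - v);
    // fold in the low parts
    e += a.y + b.y;
    // renormalize so the low part is small relative to the high part
    float hi = s + e;
    return make_float2(hi, e - (hi - s));
}

__device__ float2 df64_mul(float2 a, float2 b)
{
    // exact product of the high parts via FMA: p + e equals a.x * b.x
    float p = a.x * b.x;
    float e = fmaf(a.x, b.x, -p);
    // add the cross terms; a.y * b.y is negligible in this simplified variant
    e = fmaf(a.x, b.y, e);
    e = fmaf(a.y, b.x, e);
    // renormalize
    float hi = p + e;
    return make_float2(hi, e - (hi - p));
}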

I may not have expressed myself clearly above.
I want to compute a function like the one below in a kernel on my GPU (this is just an example), and the RTX GPU's double-precision performance may be very weak. I want to speed up the calculation; can I do something? (Can I compute the double-precision data using some single-precision operations?)
PS: the parameters are double precision.

double test(double a, double b) {
    return a * a + b * b + a * b;
}

No, there isn’t any simple way to do that.

You either accept the double-precision throughput available on your GPU, or you use some kind of library or set of functions to break the double-precision calculations into single-precision operations. There is no standard or simple way to do the latter, and in practice almost nobody does it.

What is actually pretty common, however, is for people to figure out how to make their algorithm work with just single precision.

float test(float a, float b) {
    return a * a + b * b + a * b;
}

You cannot do that faster than the native double-precision functional units in the GPU if you want the exact same results.

You can likely get approximately the same results using computation on paired-float operands; see the second paragraph of my previous post. And that may be faster. Note that your computation will have to be substantially more complex than the one in your example to reap any performance benefit, since the conversions (from/to ‘double’) at the start and end of the computation add overhead of their own.
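As a rough illustration of that conversion overhead, here is a sketch with names of my own choosing (not code from the linked references). Note that both directions are themselves double-precision operations, so they should only bracket a long stretch of paired-float computation:

// Hypothetical split/join between 'double' and a pair of floats
__device__ float2 double_to_float2(double d)
{
    float hi = (float)d;                  // high-order ~24 bits
    float lo = (float)(d - (double)hi);   // remaining low-order bits
    return make_float2(hi, lo);
}

__device__ double float2_to_double(float2 f)
{
    return (double)f.x + (double)f.y;
}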

As Robert Crovella points out, in various use cases there are ways to use single-precision computation instead of double precision, or single-precision computation for the bulk of the processing followed by double-precision refinement/cleanup (so-called “mixed-precision computation”; you might want to Google for it).
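As a hedged illustration of the refinement idea (my own example, not something from this thread): do the bulk of the work in single precision, then apply a cheap double-precision correction step.

// Hypothetical sketch of mixed-precision refinement: approximate 1/sqrt(x)
// in single precision, then apply one Newton-Raphson step in double
// precision. One step roughly doubles the number of correct bits
// (~24 -> ~48); a second step would be needed for full double precision.
__device__ double rsqrt_mixed(double x)
{
    double y = (double)rsqrtf((float)x);   // single-precision approximation
    y = y * (1.5 - 0.5 * x * y * y);       // double-precision refinement
    return y;
}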

thank you!

thank you!

Sorry to hijack the topic, but when I am profiling kernels in NVVP, the shared memory report often suggests using double precision to achieve twice the bandwidth.

I don’t know how much it relates to what the OP discussed, but maybe someone can enlighten me on how exactly it works, or whether it actually happens. Because if I don’t need the precision and float works fine result-wise, switching to double would double the bandwidth (???) at the cost of using twice as much shared memory, potentially limiting the number of blocks that can be active at a time.

This is referring to a Kepler-specific feature. If you set 8-byte bank mode on Kepler, the achievable bandwidth to shared memory doubles. If you search for Kepler shared memory eight-byte mode you’ll find various write-ups.
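The bank size was requested through the runtime API, roughly like this (a minimal sketch; on later architectures the call is accepted but has no effect):

#include <cuda_runtime.h>

int main()
{
    // Request 8-byte shared memory banks (Kepler only), doubling the
    // achievable shared memory bandwidth for 64-bit accesses.
    cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
    return 0;
}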

Yes, I remember this feature. However, the call that sets 8-byte mode has no effect on Maxwell+, if I am not mistaken.
Does that mean that from Maxwell onwards the hardware is already doing something under the hood and providing the benefit?

I recommend reading the white paper in the CUDA documentation called “Floating Point and IEEE 754”, and also Goldberg’s paper, which is reference number 5 in the white paper. Goldberg’s paper lists a few algorithms that can be used to retain accuracy and avoid having values rounded away. The papers also discuss the fused multiply-add instruction, which might be helpful to you.
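For example, compensated (Kahan) summation is one of the accuracy-preserving techniques discussed there. A minimal sketch (my own example, not taken from the papers):

// Hypothetical sketch of compensated (Kahan) summation: a running
// compensation term recovers low-order bits that plain summation would
// round away. Avoid fast-math style flags that reassociate these steps.
__device__ float kahan_sum(const float *x, int n)
{
    float sum = 0.0f;   // running total
    float c   = 0.0f;   // compensation for lost low-order bits
    for (int i = 0; i < n; ++i) {
        float y = x[i] - c;   // apply the correction from the previous step
        float t = sum + y;    // low-order bits of y may be lost here
        c = (t - sum) - y;    // recover what was lost
        sum = t;
    }
    return sum;
}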

Just to answer my own question, the last paragraph of the Pascal Tuning Guide at 1.4.5.2 states:
“To simplify this, Pascal follows Maxwell in returning to fixed four-byte banks. This allows all applications using shared memory to benefit from the higher bandwidth, without specifying any particular preference via the API.”