Not sure where to ask this, but can CUDA only handle 4-byte and 8-byte floating point?

What I mean is that there are functions for 32-bit floating point and functions for 64-bit floating point, but seemingly nothing for long double types at all.

Looking at https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#mathematical-functions-appendix

So this means edge case calculations like cosine(1/(2^26)) will likely return 1.0 and not a correct value.

I have to test this with a few trivial test cases, but it seems there is no CUDA support for 128-bit long doubles?

correct

The nature of floating-point arithmetic means it provides a finite number of discrete points (a subset of ℚ), rather than the continuous real space ℝ provided by mathematics.

As a consequence, for any finite-size floating-point format, there will be an epsilon such that |x| < epsilon implies cos(x) = 1, and that result is correct within the precision constraints imposed by the given format.

I don’t know what your use case entails, but it is likely possible to sidestep whatever issue you are facing by computing via sine, if computing via cosine doesn’t work the way you want due to finite-precision floating-point. An example would be an implementation of small-angle rotation in 2D space.

For potentially similar scenarios involving functions like exp() and log(), the functions expm1() and log1p() were invented some decades ago.

Actually I was thinking of a way to implement an interest rate approximation, but the tolerance will need to be around 10^-12 or so and I am not sure there are enough digits of precision. The Excel folks most likely use Runge-Kutta or maybe just bisection to get to a solution:

https://support.office.com/en-us/article/rate-function-9f665657-4a7e-4bb7-a030-83fc59e748ce

In any case it looks like 53-bits is what I have to work with.

All computations related to interest rates that I am familiar with involve expm1() and/or log1p(); none involve trigonometric functions like cos(). E.g.

compound(r, n) = exp(n * log1p(r))
annuity(r, n) = -expm1(-n * log1p(r)) / r

Some platforms even offer the compound() and annuity() functions as part of their math library, although CUDA does not. As shown above, they are easily and accurately synthesized from standard C/C++ math library functions that CUDA provides.

> none involve trig

That wasn’t the point.

Also, solving for interest rate requires numerical methods.

I agree. As far as I know it does require a solver as there is no closed-form solution. But the function whose zero(s) one needs to find with said solver to determine the interest rate involves terms very similar to the compound() and annuity() functions shown above. Therefore I would think my comments were pertinent.

I am not convinced that any sort of extended precision is required to create a robust solver to compute the rate, as the original post seems to imply. Obviously, if you are targeting relative accuracy around 10**(-12), you will need to use double-precision computation.

I am just kicking the tires. I have never worked with CUDA and I thought it was just for game software and data display. However, the ability to (maybe) solve for a suite of annuity solutions in parallel would make anyone really curious.

The issue of precision is for cases where there may be 52 payments in a year (weekly) or even 26 (bi-weekly) for terms longer than six years. In both cases the fractional periodic interest rate may require quite a few digits to compute.

I didn’t research enough and made the error of thinking that the Quadro line was designed for scientific computation, but it turns out to be more or less the same as the GeForce game cards. Really, I don’t know. I bought both to kick the tires, and a good test problem feels like interest rate approximation across some small set of inputs which, if my reading tells me anything, can be done in blocks of “kernel” threads that read from multi-dimensional arrays. Whatever the case, I won’t be able to call mpfr or gmp libs for any of this.

n.b.: the Excel docs claim: RATE is calculated by iteration and can have zero or more solutions. If the successive results of RATE do not converge to within 0.0000001 after 20 iterations, RATE returns … an error.

https://support.office.com/en-us/article/rate-function-9f665657-4a7e-4bb7-a030-83fc59e748ce

CUDA is a parallel programming environment based on a subset of C++11. As such it is suited for computations of any kind that can benefit from massive parallelism. The typical parallelism in a CUDA-accelerated app is on the order of ten thousand threads, versus a dozen threads or so with CPUs.

My background is skewed heavily toward scientific computation, not financial computation. I don’t know what solvers are typically used in rate computation. You may be able to find open-source implementations that you can study; a package like R might offer a rate computation. On the face of it, the function whose roots one has to find to compute the rate looks reasonably well behaved, so to first order any commonly-used solver would probably work. You could try simple bisection first, and then progress to a hybrid method like Brent-Dekker to see whether that works better.

I am not sure where massive parallelism would come in. Maybe in your use case you need to consider many thousands of scenarios, and compute the interest rate for each?

To get started with CUDA any modern moderately-priced consumer GPU will do, e.g. a GTX 1060. I use my equipment 24/7 and do use “professional” systems built with Xeons and Quadros, but a casual CUDA user does not need to go that (expensive) route.

The speed of any decent modern system allows “simple bisection” to work neatly in maybe a millisecond or so. However, I have used binary128 floating point as well as gmp/mpfr libs to get the job done. This is for a product that should compute a range of solutions and then search the results for an optimal balance between customer costs and the internal processing fees and overhead to service a finance instrument. Not nearly as cool as lab work or FFT work or signal processing or nearly anything else anyone can be doing … however it pays the bills. No pun intended.

There are actually many financial companies, from small specialist shops to giant multinational banks, that use CUDA and GPUs for financial computations, and have been doing so for years (an early example: https://www.hpcwire.com/2009/05/06/french_bank_takes_on_gpu_computing/ ). There are probably various forums serving people in the field of quantitative finance; the only one I am aware of is the Wilmott forum, which offers a sub-forum for numerical methods.

In recent decades people have come up with a number of clever tricks to make do with native precision in just about all application areas, mostly by avoiding situations where subtractive cancellation occurs and by reducing rounding errors. There are the standard expm1() and log1p() functions already mentioned, which were partially motivated by financial computations. There are also all kinds of uses for fma(), the fused multiply-add operation with a single rounding at the end. This operation is supported directly by GPU hardware.

One area of financial computation that is often not well suited for GPUs is high-frequency trading. GPUs are designed as devices offering the highest possible throughput, with lackluster latency as a trade-off. If your time budget is measured in microseconds, just shipping off source data to the GPU may already take too long, and high-clocked CPUs with large caches are the more suitable path.