arbitrary precision arithmetic

Is there any CUDA library for doing arbitrary-precision arithmetic, for example multiplying numbers with 2 million decimal digits?

Thanks much,

You might want to take a look at CAMPARY:
Mioara Joldes, Jean-Michel Muller, Valentina Popescu, Warwick Tucker: “CAMPARY: Cuda Multiple Precision Arithmetic Library and Applications”, 5th International Congress on Mathematical Software (ICMS), July 2016, Berlin, Germany

The best link to the software itself that I could find in a five-second search is, but by all means check the usual open software repositories as well.

An older project is CUMP:
T. Nakayama and D. Takahashi: “Implementation of Multiple-Precision Floating-Point Arithmetic Library for GPU Computing”, Proc. 23rd IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2011), pp. 343–349 (2011).

Even older is gpuprec:
Mian Lu, Bingsheng He, and Qiong Luo: “Supporting extended precision on graphics processors”. DaMoN '10 Proceedings of the Sixth International Workshop on Data Management on New Hardware, June 2010, pp. 19–26.

I have not used any of the above.

Thanks a lot for your time.


Directly from NVIDIA:

2 million decimals is a lot. I am not sure if there is a practical general purpose library that can do it.

Interesting. I wasn’t aware of an effort by NVlabs to produce such a library. Is there a published paper available somewhere? Or at least a GTC presentation slide deck?



This library is only tuned (optimized) for architectures up to the Maxwell microarchitecture.

For best performance on Pascal, force it to multiply using XMAD; for Volta and Turing, force IMAD.

For peak performance on Pascal, stick with CUDA 8.0 - Volta and Turing can use the later CUDA releases.

XMP paper:


Do you know a place where I can find documentation about CAMPARY?
I was not able to find a guide or neat examples.

The problem that I want to solve requires arrays of numbers with precision higher than doubles (multi-precision). I need to continuously work with them on the CPU, transfer them to the GPU, and transfer the results back from the GPU to the CPU,… (always preserving the precision)

The operations that I use are just +, -, *, /, and they are implemented in CAMPARY, but I do not know how to declare arrays of these numbers on the CPU and GPU, or how to transfer such arrays (preserving the precision) from the CPU to the GPU and back. If you can provide a basic example it would be great!!


I was under the impression that CAMPARY is open-source software. If so, “Use the source, Luke!”

Thanks for the suggestion.
The problem is that the source:

does not provide information, just the “.h” files.

My needs could be met with a basic example (CPU-GPU transfer of multi-precision arrays preserving the precision), instead of diving into the code.

Yes, “use the source” means diving into the code. Not every open-source project comes with docs and/or neat examples. CAMPARY is a header-file library so the entire source is in those .h files.

A quick look at the header files indicates that the multi-precision types are simply arrays of doubles (e.g. quadruple precision: four doubles), so multi-precision operands can be copied trivially between CPU and GPU.

This may be of interest.

CGBN seems to be a good solution: GitHub - NVlabs/CGBN: CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups
However, it has not been updated for 2 years and does not support the Turing architecture and later. Do the maintainers intend to add support for the latest architectures?

I used the library for my project on an Ampere-architecture GPU and I didn't have any problems. Just change the -arch flag in your nvcc command. I used -arch=sm_80 for the Ampere architecture. Try updating the Makefiles with this and test on your device.

For example, in the Makefile for samples/sample_01_add I used this:

nvcc $(INC) $(LIB) -I../../include -arch=sm_80 -o add -lgmp
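If the binary needs to run on more than one GPU generation, a fat binary covering several architectures is an alternative to a single -arch flag. This is a hypothetical build-command sketch (the source file name add.cu is an assumption, mirroring the sample's output name):

```shell
# Hypothetical variant of the sample's build rule: embed code for Volta
# (sm_70), Turing (sm_75), and Ampere (sm_80) in one fat binary, so the
# same executable runs on any of these GPUs without recompilation.
nvcc $(INC) $(LIB) -I../../include \
     -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_75,code=sm_75 \
     -gencode arch=compute_80,code=sm_80 \
     -o add add.cu -lgmp
```

The trade-off is a larger binary and longer compile time, in exchange for not having to rebuild per device.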