NVIDIA language and GPU or dev board to do 256-, 512- and 1024-bit integer math calculations

Can I, with any NVIDIA dev board (Xavier) or any NVIDIA GPU (RTX 8000, etc.) and/or an NVIDIA programming language, do math operations on integers of 256 bits or more?

That is, in order to do math with (unsigned) numbers equal to or greater than: 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584

There are arbitrary-precision bignum libraries such as XMP and CGBN specifically for CUDA. Both have rather clunky host-side APIs, but the acceleration is good!

Also, the libraries may require some modifications to work with the latest hardware, as it seems they are no longer updated.

Arbitrary is not OK, it must be precise. Do these libraries make it possible to, say, multiply one 256-bit number by another 256-bit number and get the exact result?

so say


multiplied by


Another question: what is the maximum [number] I can use to simply calculate the ([number] x 3) + 1 function?

Can I use these libraries to get a precise result, or do they not cover all numbers from 0 to the maximum and/or not calculate results precisely on integers of 128 bits or more?

I'm looking for 256- to 512-bit integers.

For the love of god, please google the meaning of arbitrary precision arithmetic.

I did. It is not the same as using arrays to store values as multiple 64-bit integers; that's not what I'm looking for.

Can you reply to my question: how to calculate and multiply many 128- or 256-bit integers, or calculate for

Given that GPU registers comprise 32 bits, any integer computation on wider integers will require those integers to be represented as arrays of 32-bit (uint32_t) chunks. One speaks of arbitrary-precision integer arithmetic when the number of chunks is freely selectable by the programmer.
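As a minimal host-side sketch of that representation (plain C++ rather than CUDA, so it can be tried without a GPU), a 256-bit unsigned integer can be held in eight 32-bit chunks and added chunk by chunk with carry propagation. The names `U256` and `add_u256` are made up for this illustration and do not come from XMP, CGBN, or any other library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type for illustration: a 256-bit unsigned integer stored as
// eight 32-bit chunks, least significant chunk first.
using U256 = std::array<std::uint32_t, 8>;

// Adds a and b chunk by chunk and stores the low 256 bits of the sum in `sum`.
// Returns the carry out of the top chunk: 1 if the true sum needs a 257th bit,
// 0 if `sum` is the exact result.
std::uint32_t add_u256(const U256& a, const U256& b, U256& sum) {
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // A 64-bit temporary holds the 33-bit intermediate result.
        std::uint64_t t = static_cast<std::uint64_t>(a[i]) + b[i] + carry;
        sum[i] = static_cast<std::uint32_t>(t);  // low 32 bits
        carry = t >> 32;                         // 0 or 1
    }
    return static_cast<std::uint32_t>(carry);
}
```

Changing the chunk count from 8 to 16 or 32 would give 512-bit or 1024-bit integers with the same loop.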

Computations on large integers, whether on GPUs or CPUs, are always broken into such “limbs”, usually either of 32 bits or 64 bits depending on the register width of the hardware. A well-known and much-used library of this sort for CPUs is GMP. Integer computations are by their nature exact, unless the result becomes so large it cannot be represented in the number of bits allocated (that is, there is integer overflow).
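To make the overflow remark concrete for the 3n+1 function asked about earlier, here is a small sketch in plain C++, assuming a fixed 256-bit width with 32-bit limbs. `U256` and `triple_plus_one` are hypothetical names chosen for illustration. Every result that fits is exact, and the function reports when it does not fit. At this width the largest input for which 3n+1 still fits is (2^256 - 4) / 3.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type for illustration: 256-bit unsigned integer,
// eight 32-bit limbs, least significant limb first.
using U256 = std::array<std::uint32_t, 8>;

// Replaces n with 3*n + 1. Returns true if the exact result fits in 256 bits.
// Returns false on integer overflow; n then holds only the low 256 bits.
bool triple_plus_one(U256& n) {
    std::uint64_t carry = 1;  // the "+ 1" enters as the initial carry
    for (std::size_t i = 0; i < n.size(); ++i) {
        // 3 * (2^32 - 1) + 2 fits easily in 64 bits, so t never overflows.
        std::uint64_t t = 3ull * n[i] + carry;
        n[i] = static_cast<std::uint32_t>(t);
        carry = t >> 32;  // at most 2
    }
    return carry == 0;
}
```

With more limbs the limit moves up accordingly, which is the sense in which the precision is "arbitrary": the width is chosen by the programmer, and the arithmetic stays exact within it.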

I have not used the libraries mentioned by @cbuchner1, but I would be highly surprised if either cannot handle computation at widths of 256, 512, and 1024 bits, because integers of that size are simple enough to handle that you could implement basic arithmetic yourself with moderate effort.
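For a sense of the effort involved, the following is a rough sketch of schoolbook multiplication of two 256-bit numbers into an exact 512-bit product, written as plain C++ on the host. The type and function names are invented for this example, and this is not how XMP or CGBN implement it; a tuned GPU version would likely look quite different.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration: 32-bit limbs, least significant first.
using U256 = std::array<std::uint32_t, 8>;
using U512 = std::array<std::uint32_t, 16>;

// Schoolbook multiplication. The product of two 256-bit values always fits
// in 512 bits, so the returned result is exact.
U512 mul_u256(const U256& a, const U256& b) {
    U512 p{};  // all limbs start at zero
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            // (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so t cannot overflow.
            std::uint64_t t =
                static_cast<std::uint64_t>(a[i]) * b[j] + p[i + j] + carry;
            p[i + j] = static_cast<std::uint32_t>(t);
            carry = t >> 32;
        }
        // This limb has not been written yet in earlier iterations.
        p[i + b.size()] = static_cast<std::uint32_t>(carry);
    }
    return p;
}
```

For example, squaring 2^256 - 1 (all limbs set to `0xFFFFFFFF`) yields (2^256 - 2) * 2^256 + 1, with every bit of the 512-bit result accounted for.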


Thanks, yep, that sounds good. Yesterday I was looking at GMP.

Any idea whether an NVIDIA GPU will speed up calculations on integers of 256 bits or more when using GMP, or does GMP rely on the CPU?

And NVIDIA GPU vs. other NVIDIA GPUs: is an 8 series significantly faster than, say, a 7 or 6 series?

And a GPU vs. the Xavier dev kit?

(I'm asking because I'm considering purchasing some NVIDIA hardware if it really takes less time to compute math operations such as add, multiply, and divide on large integers of 256 bits or more.)

While I have programmed some wide integer multiplies and adds with CUDA in the past, I have never had a need to use an arbitrary-precision integer library. I am under the impression that long-time forum participant @cbuchner1 has practical experience with that subject matter.

In order to exploit the copious computational resources of the GPU, you will need massive parallelism (think on the order of 10K threads). Whether your use case provides that, I do not know. You may want to do a literature search and do some prototyping / benchmarking using GPU-accelerated instances provided by cloud providers such as AWS before spending significant amounts of money for your own hardware. I am aware that people doing prime number searches have found good ways of exploiting GPUs, also people doing cryptographic computations.

What about Intel AVX-512?
I mean in terms of speed, of course, in order to weigh an NVIDIA GPU or CPU against an AVX-512-capable Intel CPU.

Sorry, can’t help you there. I have a machine with AVX-512 support here but have not looked into targeting it; AVX2 is as far as I have gotten in terms of practical SIMD programming experience. In general I find programming using SIMD intrinsics cumbersome and CUDA’s programming model much more straightforward to use. Again, the literature may already have some performance data for your use case. Google Scholar is a useful tool for searching what is out there in terms of publications.