Factorial for CUDA? existing implementation of factorial

I need to use a ‘factorial’ function in the calculation of Zernike moments in an image processing application. Simple, right? Nope.

I’ve been searching through the docs and forum for references to ‘factorials’ in hopes that either CUDA supports it or someone has it implemented, but so far no luck. Am I missing something?

A float carries only about 7 significant decimal digits, so it runs out after 10! (3,628,800), and a 32-bit int overflows after 12! (479,001,600), so you could probably do this pretty efficiently with a lookup table in constant or shared memory.
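A minimal sketch of the lookup-table idea, written as plain host-side C++ so it can be checked without a GPU. The names `kFactorial` and `factorial_u32` are made up for illustration. In a CUDA build one would presumably declare the array `__constant__` and mark the helper `__device__`; that port is an assumption and is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// 0! through 12!: every factorial that fits in an unsigned 32-bit integer.
// Assumption: in a CUDA build this table would live in __constant__ memory.
static const std::uint32_t kFactorial[13] = {
    1u, 1u, 2u, 6u, 24u, 120u, 720u, 5040u, 40320u, 362880u,
    3628800u, 39916800u, 479001600u
};

// Hypothetical helper: n! by table lookup, valid for 0 <= n <= 12 only.
inline std::uint32_t factorial_u32(unsigned n) {
    assert(n <= 12 && "13! does not fit in 32 bits");
    return kFactorial[n];
}
```

For example, `factorial_u32(5)` returns 120. The table is only 52 bytes, so it fits comfortably in either constant or shared memory.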

If you need larger factorials, you’ll start to lose precision, and you’ll want to rearrange the computation so that it never has to hold factorials that big.
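One way to rearrange the computation, sketched for the Zernike case mentioned in the question. It assumes the standard radial polynomial R_n^m(ρ) = Σ_k c_k ρ^(n-2k) with c_k = (-1)^k (n-k)! / (k! ((n+m)/2 - k)! ((n-m)/2 - k)!). Instead of evaluating each factorial, the first coefficient is built as a running product and each later one is derived from its predecessor via c_(k+1) = -c_k (a-k)(b-k) / ((k+1)(n-k)), where a = (n+m)/2 and b = (n-m)/2. The function name `zernike_radial_coeffs` is hypothetical, and this is host-side C++, not the original poster's code.

```cpp
#include <cassert>
#include <vector>

// Coefficients c_0..c_b of the Zernike radial polynomial
//   R_n^m(rho) = sum_{k=0}^{b} c_k * rho^(n - 2k),  b = (n - m) / 2,
// computed without ever forming a full factorial.
// Requires 0 <= m <= n and (n - m) even.
std::vector<double> zernike_radial_coeffs(int n, int m) {
    assert(0 <= m && m <= n && (n - m) % 2 == 0);
    const int a = (n + m) / 2;
    const int b = (n - m) / 2;

    // c_0 = n! / (a! b!) is the binomial coefficient C(n, b),
    // accumulated as the running product of (a + i) / i for i = 1..b.
    double c = 1.0;
    for (int i = 1; i <= b; ++i) {
        c = c * (a + i) / i;
    }

    std::vector<double> coeffs;
    coeffs.reserve(b + 1);
    coeffs.push_back(c);

    // Each further coefficient follows from the previous one by a ratio
    // of small integers, so no large intermediate values appear.
    for (int k = 0; k < b; ++k) {
        c = -c * (a - k) * (b - k) / ((k + 1.0) * (n - k));
        coeffs.push_back(c);
    }
    return coeffs;
}
```

For n = 4, m = 0 this yields 6, -6, 1, matching R_4^0(ρ) = 6ρ⁴ - 6ρ² + 1. The coefficients themselves still grow with n, so double precision only postpones the precision limit; it does not remove it.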

That was a smart answer!!!


  1. You could implement big numbers in the form of strings, and define a set of functions that implement arithmetic (including multiplication) on such strings, then go ahead with your factorial implementation. (I guess it would be worthless to implement this on CUDA. But who knows, it could turn out the other way too. Just a matter of trying…)

  2. Get or manufacture a calculator (a cellphone’s calculator???) that has a USB port to talk to computers, and communicate your math with it. Calculators can deal with big numbers easily.

I think it would be best to use some kind of approximation, or otherwise a 1-D texture.

The gamma function is part of CUDA’s math library, with a maximum ulp error of 6, and Γ(n + 1) = n!. However, if you only need integer values, the best solution is a lookup.
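A rough illustration of the gamma-function route, using the standard C++ `std::tgamma` and `std::lgamma` on the host. CUDA's math library provides functions of the same names (`tgamma`, `lgamma`, and the float variants `tgammaf`, `lgammaf`), so the same expressions should carry over to device code, though the accuracy there is whatever the CUDA documentation states. The wrapper names below are invented for this sketch.

```cpp
#include <cassert>
#include <cmath>

// n! as a floating-point value via Gamma(n + 1).
// The result is approximate once n! exceeds the precision of a double.
inline double factorial_gamma(int n) {
    assert(n >= 0);
    return std::tgamma(n + 1.0);
}

// ln(n!) via lgamma. Working in logs keeps ratios of large factorials
// in range: ln(p! / q!) = log_factorial(p) - log_factorial(q).
inline double log_factorial(int n) {
    assert(n >= 0);
    return std::lgamma(n + 1.0);
}
```

The log form is the more useful of the two when factorials appear only in ratios, since the difference of two logs can be exponentiated at the end without any intermediate overflow.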

That’s a very good point about truncation and precision. A lookup is what I’ll likely look into in the future. For the time being I’ve split the tasks so that the Zernike polynomials (which require factorials) are precalculated by the CPU over an image segment. The results are then stored on the GPU, so this only needs to be done once. The more intensive and repetitive calculation (the Zernike moments) is done separately using the stored Zernike polynomials (no factorials required).

Unfortunately I’m still faced with precision issues as you pointed out (beyond 12!). Thanks!