Hi people, does a function or formula for computing binomial coefficients exist in CUDA (or in the CUBLAS library)? I need to calculate the binomial coefficient of N on 2. I only have a CPU version:

int coefficienteBinomiale(int n, int k) {
    if (k == 0 || k == n)
        return 1;
    return coefficienteBinomiale(n - 1, k - 1) + coefficienteBinomiale(n - 1, k);
}

I hope the code below will be helpful in your case:

__device__ int fastCoeffBinomialN2(int N, int *cache) // initialize 'cache' to an arbitrary length, filled with zeros
{
    //cache[0] = 0; // this can be done once, outside the function
    //cache[1] = 0;
    //cache[2] = 1; // this is essential - the method will not work if this is not set
    if (cache[N] > 0) return cache[N]; // cache hit: the function has already been called for this N
    return (N > 2) ? (cache[N] = (N - 1) + cache[N - 1]) : cache[N];
}

The function is specialized for calculating the coefficients of N on 2 only. It uses caching to store intermediate results so it executes faster.

The function is device code, so it can be called from whatever number of threads you set up. The formula is simple enough that one thread can handle a single coefficient calculation.

The ‘cache’ is an array of ints initialized with zeros. It can be separate for each thread or global/shared (note the race-condition possibilities in those cases). It stores coefficients from Pascal’s triangle with the second binomial parameter fixed at 2.

The ‘N’ refers to the N in the binomial (N on 2).

One more essential thing is to initialize cache[2] to 1 (I forgot to finish that comment). The initialization can be the first action done within the kernel or, if the cache is a global array, done in host code - set the host array ‘c’ to zeros and ‘c[2]’ to 1, then pass it as a kernel parameter:

and do your calculations using the function ‘fastCoeffBinomialN2’.

Regards,

MK

P.S.

Of course I forgot to add that the code of ‘fastCoeffBinomialN2’ above covers only a SINGLE ITERATION of the method. Sorry for misleading you. The correct code of the function is as follows:

__device__ int fastNewtonBinomialN2(int N, int cache[])
{
    int i = 2;
    while (i <= N) {
        if (cache[N] > 0) return cache[N]; // already computed for this N
        cache[i] = (i - 1) + cache[i - 1]; // C(i,2) = (i-1) + C(i-1,2)
        ++i;
    }
    return cache[N];
}

First of all, my method does not use recursion. Second, it calculates the coefficients only for N on 2, not the whole Pascal’s triangle. Furthermore, once you have called the function for a given N, subsequent calls for it return the result immediately from the cache. It should therefore be faster, but I have not tried it out.

Another thing that comes to my mind: you could fill the cache with the coefficient values up to an arbitrary maximum N. Then the only thing you need to do within the kernel is read the cache for the given N. The values of the binomial for the case you are interested in are known in advance, and thus can be stored as a kernel parameter rather than calculated online. The new ‘fastNewtonBinomialN2’ would look like this:

__device__ int fastNewtonBinomialN2(int N, int cache[])
{
    // the given cache is an array as follows: int cache[] = { 1, 3, 6, 10, 15, ... } and so on
    return (N >= 2) ? cache[N - 2] : 0; // correct me if I'm wrong, please...
}