C code compiled as a kernel returns 1.#QNB

I am working on implementing a couple of signal processing algorithms.

One of the blocks is a matrix inversion, but I haven’t decided which method to implement on the GPU, so for now I am reusing the C code I already had.
While the code works fine for small matrices (5x5) on the GPU, for large matrices (300x300) one of the sub-blocks (a Cholesky decomposition) returns 1.#QNB (a quiet NaN, as printed by the MSVC runtime) from some point on; say, from row 160 onward the output matrix entries are all 1.#QNB.

The block involves three nested loops (eventually they will have to be restructured for performance) in which a summation is computed (code follows).

  • Initially I thought the summation might exceed the precision limit, but that doesn’t seem to be the case.
  • I cannot define “double” variables in my code.
  • For some reason I cannot change the definition of i, j and k from int to uint, even though I use uint in other kernels.
  • I have run the kernel with both clEnqueueTask() and clEnqueueNDRangeKernel(), and both show the same problem.
  • The output should be an upper-triangular matrix, so in the following code j runs from i to size_m, and the 1.#QNB’s appear in the entries above the diagonal. (The entries below the diagonal are not assigned, and they are not used later either.)
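Since the inner sum is accumulated in single precision, one mitigation worth trying (independent of the NaN question) is compensated (Kahan) summation, which recovers the low-order bits lost on each addition. A generic C sketch of the technique, not tied to the original kernel:

```c
#include <stddef.h>

/* Kahan compensated summation: c carries the low-order part
 * that was lost when the previous addend was folded into sum. */
float kahan_sum(const float *x, size_t n) {
    float sum = 0.0f, c = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float y = x[i] - c;   /* corrected addend */
        float t = sum + y;    /* big + small: low bits of y are lost */
        c = (t - sum) - y;    /* recover (negated) what was lost */
        sum = t;
    }
    return sum;
}
```

The same pattern drops into the k-loop of the kernel with only a few extra registers per work-item.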

Here is how the kernel looks (reposted, since the formatting got mangled the first time):

__kernel void Cholesky(__global float *a, uint size_m, __global float *b){
    int i, j, k;
    float sum;
    for (i = 0; i < size_m; i++){
        for (j = i; j < size_m; j++){
            sum = a[i*size_m + j];
            for (k = i-1; k >= 0; k--)
                sum -= b[k*size_m + i] * b[k*size_m + j];
            if (i == j)
                b[i*size_m + j] = sqrt(sum);
            else
                b[i*size_m + j] = sum / b[i*size_m + i];
        }
    }
}

Any idea is greatly appreciated.

I resolved the problem! It was caused by the single precision of the data. (I don’t know whether other people can define “double” variables, but my version of the compiler doesn’t recognize the type.)
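For anyone hitting the same wall: double in OpenCL C is an optional feature, so the device must report the cl_khr_fp64 extension and the kernel must enable it explicitly. A minimal fragment, valid only on devices that actually advertise the extension:

```c
/* Only compiles on devices that advertise cl_khr_fp64. */
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void sum_d(__global const double *x, uint n, __global double *out) {
    double acc = 0.0;
    for (uint i = 0; i < n; i++)
        acc += x[i];
    *out = acc;
}
```

If the device does not list cl_khr_fp64 in CL_DEVICE_EXTENSIONS, the compiler will reject double no matter what, which matches the behavior described above.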

The Cholesky decomposition is a series of multiplications, sums and divisions, which makes it vulnerable to over/underflow.
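This is exactly how the NaNs are born: squaring a large float overflows to +inf, and a subsequent subtraction of infinities yields a quiet NaN (which MSVC’s printf renders in the 1.#QNAN family; 1.#QNB is a truncated form of it). A minimal C illustration, nothing OpenCL-specific:

```c
#include <math.h>

/* Squaring a large float overflows to +inf; subtracting two
 * infinities then produces a quiet NaN. */
float overflow_to_nan(float big) {
    float sq = big * big;   /* +inf once big*big exceeds FLT_MAX */
    return sq - sq;         /* inf - inf = NaN */
}
```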

With some data conditioning they can be avoided.