NVCC loop bug since cuda 5.5

psychocoder · June 11, 2014, 11:48am

Hi,

I found a small example were nvcc create wrong results while unrolling loops.

System:

Ubuntu 12.04.4 LTS
NVIDIA-SMI 331.62 Driver Version: 331.62
cuda 6.0
Card: Tesla k20m

example code:

#include <iostream>
#include <cuda.h>
#include <stdio.h>

template<int N>
struct AnyClass
{

    static const int begin = -1; 
    static const int end = begin + N;

    template<typename T >
    __device__ void operator()(T* ptr)
    {
        /* increment loop begin and end by one*/
        const int offset_k = static_cast<int>(ptr[0]!=0);
        for (int k = begin+offset_k ; k < end +offset_k; ++k) {
            if ( k < begin+offset_k)
                printf("error  k=%i; begin=%i offset_k=%i => but k should be >= %i\n", k,begin,offset_k,begin+offset_k);
            else
                printf("OK  k=%i; begin=%i offset_k=%i \n", k,begin,offset_k);
        }
    }
};

template <int N>
__global__ void run(float* ptr,int v){
   ptr[0]=v;
   AnyClass<N>()(ptr);
}

int main(int argc, char *argv[]){

  float* d = 0;
  unsigned int n = 1;
  cudaMalloc((void**)&d, n*sizeof(float));
  run<2><<<1, 1>>>(d,1);
  cudaThreadSynchronize();
  return 0;
}

The code should run through two cycles, but it somehow unrolls it to four cycles.

Output:

error  k=-2; begin=-1 offset_k=1 => but k should be >= 0
error  k=-1; begin=-1 offset_k=1 => but k should be >= 0
OK  k=0; begin=-1 offset_k=1 
OK  k=1; begin=-1 offset_k=1

It enters two loop cycles.
In this example offset_k is 1 (it is not equal to zero)
begin = -1
k = begin + offset_k = -1 + 1
k should never be less than 0 but the for loop goes from [-2;1] ← that is wrong

Note: NVCC generates right results in cuda 5.0

Thank for any help, I applied for a CUDA developer account to open a bug report.

psychocoder · June 11, 2014, 3:04pm

if I change line 17 to

const int offset_k = -(-static_cast<int>(ptr[0]!=0));

all works fine.

njuffa · June 11, 2014, 3:41pm

I am not in front of a machine with CUDA right now, but from the output shown it looks like that there is a bug. Thanks for filing a bug report, that is very helpful. As a workaround for now, try adding

#pragma unroll 1

in the line just before the loop, that should turn off the multiple “unrollers” used inside the compiler. If that doesn’t help, the compiler team may have an alternate recommendation for a workaround in response to the bug report. Sorry for the inconvenience.

psychocoder · June 11, 2014, 3:52pm

Thanks for the fast reply.

#pragma unroll 1

does not help :-(

For this small code I have found a work around, but this code is a part of a 40k lines project and not all workarounds which helps in this small example helps in the real project.
I will try if my double negate trick helps use in the main project.

njuffa · June 11, 2014, 4:17pm

The fact that the pragma does not help would suggest that the issue is not with loop unrolling, i.e. no loop unrolling takes place. I was already surprised that the compiler would unroll a loop whose body is not straight line code and contains a function with side effects. The problem could easily originate in various other transformations applied by the compiler. The compiler team may be able to suggest a workaround once a root cause has been established.

psychocoder · June 12, 2014, 8:13am

I have tested the double negation workaround
const int offset_k = -(-static_cast<int>(ptr[0]!=0));
and it works for the small example and our real code.
It looks like the bug is triggered by the comparison (runtime_value!=0) and the cast from the result bool to int.

Tanks njuffa for the fast response!

A bug report is opened now.