Use #pragma unroll with macro

I’m trying to use #pragma unroll with a macro:

#define UNROLL_FACTOR 8

and later using

#pragma unroll UNROLL_FACTOR

But somehow that doesn’t compile:

oz@Ozz:~/repos/GPU_Programming$ nvcc -o matmul matmul.cu
matmul.cu(22): error: identifier "UNROLL_FACTOR" is undefined
  #pragma unroll UNROLL_FACTOR
                 ^

matmul.cu(22): warning #20169-D: the optional argument to the unroll pragma must be an integral constant expression, ignoring pragma for this loop
  #pragma unroll UNROLL_FACTOR
          ^

Is there any way to use unroll with a macro?
To avoid an XY problem: I’m trying to specify UNROLL_FACTOR as a compile-time parameter so that I can easily fine-tune my kernel.

I think the problem you are running into is that the C++ standard leaves the details of pragma handling to the implementation, so a portable solution may not be possible. I do not know that for sure, however.

The preprocessor comprises multiple stages, and this is where macro magic happens, but it is not clear which stage handles pragma. For the record, the following compiles fine for me with CUDA 12.3 on Windows using MSVC 2019 as the host compiler, and the disassembly shows the desired amount of unrolling (I tried factors of 1, 2, 4, and 8):

#define UNROLL_FACTOR 4

__global__ void kernel (const float * __restrict__ x, 
                        const float * __restrict__ y, 
                        float * __restrict__ z, 
                        int len)
{
    int stride = gridDim.x * blockDim.x;
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
#pragma unroll UNROLL_FACTOR
    for (int i = tid; i < len; i += stride) {
        z[i] = x[i] + y[i];
    }
}

int main (void)
{
    kernel<<<1,1>>>(0,0,0,0);
    return 0;
}

C++11 provides the _Pragma() operator, which takes a string literal as its argument and, unlike the #pragma directive, can be used inside macro definitions. It can be used instead of #pragma. The following works for me in the compilation context specified above:

_Pragma ("unroll "  UNROLL_FACTOR)

If this does not work in your compilation environment, you may want to experiment with stringifying UNROLL_FACTOR and similar tricks, depending on which preprocessor phase handles the pragma.

[Later:]

Experimenting a bit, it seems that the preprocessor of the host compiler on Linux (i.e., gcc) processes #pragma very early, ahead of any macro expansion, and the same seems to apply to _Pragma(). Bummer.

In theory, including a file that contains nothing but a #pragma directive could also work. At least in Windows host code (I have not tried device code), a stringified #include filename that uses macros for the whole pathname or part of it is accepted.

I am not sure whether it would be a problem for the for loop to be separated from the pragma.

Edit: Counterintuitively, it seems that one can use anything except preprocessor macros: const variables, constexpr variables, and template arguments.

constexpr auto UNROLL_FACTOR{8};

seems to work for me.

Yes, I acknowledge that is not a compiler macro. I was just pointing it out in case it might be an acceptable alternative.

This also seems to work, if you require a macro:

#define UF 8
constexpr auto UNROLL_FACTOR{UF};

Color me surprised. I am curious: What suggested to you that this should work?

While I am not sure what in the C++ spec (or the CUDA compiler, given that #pragma is implementation specific) makes this work, I can confirm that it works fine with CUDA 12.3 on Windows as well. I used the code below, using the following nvcc invocation with various values of FOO:

nvcc -DFOO=4 -o test_unroll.exe test_unroll.cu

constexpr auto UNROLL_FACTOR{FOO};

__global__ void kernel (const float * __restrict__ x, 
                        const float * __restrict__ y, 
                        float * __restrict__ z, 
                        int len)
{
    int stride = gridDim.x * blockDim.x;
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
#pragma unroll UNROLL_FACTOR
    for (int i = tid; i < len; i += stride) {
        z[i] = x[i] + y[i];
    }
}

int main (void)
{
    kernel<<<1,1>>>(0,0,0,0);
    return 0;
}

Turns out one does not even need constexpr. The following, which requires nothing more than C++98 features, also works for me on both Linux and Windows:

const int UNROLL_FACTOR=FOO;
#pragma unroll UNROLL_FACTOR
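
The template-argument variant mentioned earlier in the thread could look like the sketch below. The helper add_strided and the HOST_DEVICE shim are made up for this illustration, and the default value of FOO is only a stand-in for -DFOO=<n> on the command line. The pragma is guarded so that the same file also builds with a plain C++ compiler, where it has no effect.

```cpp
#ifndef FOO
#define FOO 4   // assumption: normally passed as -DFOO=<n> on the nvcc command line
#endif

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE   // lets the sketch also build with a plain C++ compiler
#endif

// add_strided is a hypothetical helper. UNROLL is an ordinary template
// parameter, i.e. an integral constant expression rather than a macro.
template <int UNROLL>
HOST_DEVICE void add_strided(const float* x, const float* y, float* z,
                             int start, int stride, int len)
{
#ifdef __CUDACC__
#pragma unroll (UNROLL)
#endif
    for (int i = start; i < len; i += stride) {
        z[i] = x[i] + y[i];
    }
}

#ifdef __CUDACC__
// Same grid-stride loop as in the kernel above, with the factor chosen at the call site.
__global__ void kernel(const float* __restrict__ x,
                       const float* __restrict__ y,
                       float* __restrict__ z,
                       int len)
{
    int stride = gridDim.x * blockDim.x;
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    add_strided<FOO>(x, y, z, tid, stride, len);
}
#endif
```

One possible advantage of the template form is that several unroll factors can be instantiated in a single build (add_strided<2>, add_strided<8>, and so on) and compared without recompiling.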

Oh, as I understood C++ macros, they simply get replaced with their value at compile time, which is why I thought this would work. Using a constexpr variable that is initialized with the macro fixes it for me.
