Clarification of __restrict__ in CUDA

From a few tests with nvcc (and also clang), I discovered behavior that was, at least to me, strange (the discussion can be found here): __restrict__ seems to be ignored for base pointers inside structs, whereas base pointers annotated with __restrict__ directly as kernel arguments work as expected.
I simplified the example so that the same code (locally __restrict__-annotated pointers, either directly in the function body or inside a nested local scope) produces completely different output. Without the scope, nvcc's optimizer is able to spot the redundant loads (and stores), resulting in 1 load and 1 store, which is what I expected. But with the exact same code inside a local scope, the optimization breaks, resulting in 2 loads and 2 stores. clang fails to optimize this in both cases.
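For reference, a hypothetical reconstruction of the struct case mentioned above (the struct name and layout are my own invention, not from the original discussion): __restrict__ on struct members appears to be ignored, so the compiler cannot assume the two pointers don't alias.

```cuda
// Hypothetical sketch of the struct variant: __restrict__ on members of a
// struct passed by value does not seem to propagate, so the compiler must
// assume b.in and b.out may alias and cannot merge the repeated accesses.
struct Buffers {
    const float * __restrict__ in;
    float * __restrict__ out;
};

__global__ void square_struct(Buffers b, int n) {
    int tid = blockIdx.x;

    float la = b.in[tid];   // first load
    b.out[tid] = la;        // first store
    la = b.in[tid];         // second load is NOT eliminated
    b.out[tid] = la;        // second store is NOT eliminated
}
```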

So here is the cooked down example:

#define RESTRICT __restrict__
//#define RESTRICT

#define SCOPE

__global__ void square(const float * g_input, float * g_output, int n) {
    int tid = blockIdx.x;

    //if (tid < n)
#ifdef SCOPE
    {
        const float * RESTRICT a_in = g_input;
        float * RESTRICT a_out = g_output;
#else
    const float * RESTRICT a_in = g_input;
    float * RESTRICT a_out = g_output;
#endif
        

        float la = a_in[tid];
        a_out[tid] = la;
        la = a_in[tid];
        a_out[tid] = la;

#ifdef SCOPE
    }
#endif
        
}

If SCOPE is defined, this results in 2 loads and 2 stores. Otherwise, it is optimized down to 1 load and 1 store.
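For comparison, here is the variant with __restrict__ placed directly on the kernel parameters, which (as noted at the top) optimizes as expected. This is a sketch of that baseline case rather than code from the original discussion:

```cuda
// Baseline: __restrict__ directly on the kernel arguments. Here both nvcc
// and (in my understanding) clang can assume g_input and g_output do not
// alias, so the repeated load/store pair folds into a single load and store.
__global__ void square_params(const float * __restrict__ g_input,
                              float * __restrict__ g_output, int n) {
    int tid = blockIdx.x;

    float la = g_input[tid];
    g_output[tid] = la;
    la = g_input[tid];      // redundant: can be eliminated
    g_output[tid] = la;     // redundant: can be eliminated
}
```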