So I tried my best to apply __restrict__ to every base pointer of my input and output structs, yet nvcc still generates 2 loads and 2 stores (example here: Compiler Explorer).
If I simply change my kernel to not use structs of arrays (SoAs) and instead pass __restrict__-decorated base pointers directly as kernel arguments, everything works as expected (1 load, 1 store) (example here: Compiler Explorer).
So what do I need to change so that __restrict__ is inherited by the base pointers of my structs? :)
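To make the setup concrete, here is a minimal sketch of the two variants (this is not the exact code behind the Compiler Explorer links; all names are made up):

```cpp
// Hypothetical reconstruction, not the original Compiler Explorer code.

struct Buffers {
    float* __restrict__ in;   // __restrict__ on struct members ...
    float* __restrict__ out;  // ... does not seem to reach the optimizer
};

// Variant 1: pointers wrapped in a struct. nvcc acts as if b.in and b.out
// may alias, so b.in[i] is reloaded after the first store and the first
// store cannot be eliminated: 2 loads, 2 stores.
__global__ void scale2_soa(Buffers b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        b.out[i] = b.in[i];
        b.out[i] += b.in[i];
    }
}

// Variant 2: plain __restrict__ kernel arguments. The no-alias guarantee
// lets the compiler fold this into out[i] = 2 * in[i]: 1 load, 1 store.
__global__ void scale2_plain(float* __restrict__ out,
                             const float* __restrict__ in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i];
        out[i] += in[i];
    }
}
```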
I am not a language lawyer, and restricted pointers are not part of standard C++. They are, however, part of the ISO-C standard. My assumption is that the language extension __restrict__ in CUDA is modeled on ISO-C restrict specifications. It would probably be a good idea to search the CUDA documentation with a fine-tooth comb to check what, if anything, it has to say on the subject.
By my understanding of the semantics defined in the C standard, what you are observing is expected behavior. I would welcome a clarifying in-depth assessment from a C or compiler specialist, regardless of whether it confirms or refutes what I stated above. If you think you are able to make such a determination yourself based on a close reading of the ISO-C standard, you might want to consider filing an enhancement request with NVIDIA.
So I tried locally defined restrict pointers. Interestingly, defining them at the outermost scope of the kernel results in the “correct” optimization, while defining and using them inside a nested scope breaks the optimization, for no reason that is obvious to me. Example here: (I cannot post links directly, so it’s “godbolt .org /z/WdjxWd”; define/undefine SCOPE for the two combinations). Restrict pointers defined at the outermost scope can also be used inside nested scopes, and the optimizer then still works as expected.
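A rough paraphrase of the two combinations (again, not the exact godbolt code, just the structure):

```cpp
// Sketch of the SCOPE experiment; names are made up.

struct Buffers { float* in; float* out; };

__global__ void scale2(Buffers b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
#ifdef SCOPE
    {   // restrict pointers defined and used inside a nested scope:
        // the expected optimization is lost
        float* __restrict__ out = b.out;
        const float* __restrict__ in = b.in;
        if (i < n) { out[i] = in[i]; out[i] += in[i]; }
    }
#else
    // restrict pointers defined at the outermost scope of the kernel:
    // nvcc optimizes as expected (1 load, 1 store)
    float* __restrict__ out = b.out;
    const float* __restrict__ in = b.in;
    if (i < n) { out[i] = in[i]; out[i] += in[i]; }
#endif
}
```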
Why is that? It works for me. Either select the link text and click the “chain link” button at the top of the input window (fourth from the left), or use Stack Overflow-style markup: [description](url).
As I recall, the semantics of type modifiers/qualifiers (const, volatile, restrict) applied to struct members are a bit non-intuitive. I learned that the hard way and have since avoided using them in this capacity.
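To give one example of what I mean (plain standard C++ behavior, nothing CUDA-specific; names are made up):

```cpp
// A const member silently makes whole-struct assignment ill-formed,
// because the copy-assignment operator is implicitly deleted.
struct Record {
    const int id;
    float*    data;
};

void update(Record& dst, const Record& src)
{
    // dst = src;  // would not compile: Record's copy assignment is deleted
    (void)dst; (void)src;
}
```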
Since __restrict__ is a vendor extension to standard C++, you could always file an enhancement request with NVIDIA to have the semantics extended the way you think they should work.
I see. That would seem to be a security feature of these new NVIDIA forums to protect against spammers who create throw-away accounts, similar to the mechanisms used by Stack Overflow, which restricts the embedding of links and images for new users. After you use the site for a while, such restrictions should presumably be lifted, although the rules controlling this are probably not publicly documented.
So clang trunk (on godbolt) does NOT perform the “correct” optimization in either case (outer-scope and nested-scope pointers), whereas nvcc at least does the right thing with outer-scope restrict pointers. Yes, it’s not well defined, etc., but merely putting a scope around a basic block and getting a totally different result is surprising to me… :) I would just like to know where to report this issue/inconsistency.
As I said, restrict is well defined in ISO-C, and ISO-C only. Since it is not part of C++, the semantics of any existing vendor extension for restricted pointers in C++ compilers are likely similar but not necessarily identical to the standard C specification of restrict, and probably also subtly different from each other. I take what I consider the safe route (the smallest common subset), which in my experience means using it only for non-aggregate function arguments, i.e. function arguments that are simple pointers.
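A sketch of what I mean by that safe subset (names are illustrative):

```cpp
// __restrict__ only on plain pointer parameters; no restrict-qualified
// struct members and no restrict-qualified locals.
__global__ void axpy(int n, float a,
                     const float* __restrict__ x,
                     float* __restrict__ y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];  // compiler may assume x and y do not overlap
    }
}
```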
If you want, you can file an enhancement request with NVIDIA to clarify the __restrict__ semantics in the CUDA documentation.
Sorry, that is not how it works. This forum is primarily a platform NVIDIA set up so users can help users. It is not really designed as a venue for sending bug reports and enhancement requests to NVIDIA. Every now and then someone from NVIDIA might poke their head in here, and they might take up an issue and file a bug report or enhancement request internally.
But: (1) one cannot count on that happening, and (2) NVIDIA’s bug database treats all issues as confidential, so only the filer and the relevant NVIDIA personnel can see them.
You would want to file your own request so you can track its status. Use the regular bug reporting form (there should be a sticky/pinned note at the top of this forum about filing bugs) and prefix the synopsis with “RFE:”, which stands for “request for enhancement”, so it can easily be distinguished from reports of functional bugs. I have not reported an issue in a while; maybe the bug reporting form now also provides a mechanism that lets users directly mark an issue as an RFE.
That likely means your report triggered a malfeasance filter in the reporting form. For years now, NVIDIA appears to have set the trigger level on that filter to insanely strict settings. So as not to give any information away, it presents only a generic error message when the filter is triggered. Frustrating, I know.
A common strategy to deal with that is to file a very rudimentary report first. Like a synopsis and a two-sentence description, with a comment that you’ll add more info later. In particular, avoid adding links or anything that looks like HTML to the initial report. Then come back to add additional information bit by bit. Yes, it is a pain in the neck.
Sometimes there are also technical issues of an unspecified nature affecting the bug reporting form.
FWIW, I noticed that your first post in this thread currently shows up for me as “flagged by community”. I doubt that anybody from the community has flagged it, as I see nothing objectionable in it. I suspect that it was auto-flagged by the AI (e.g. “a link in an initial post means suspicious”). I hope that this has not put your account as a whole on the bad side of NVIDIA’s machinery.