So there is lots of NVIDIA sample code from over the years that uses __lanemask_lt(), but that API is “undefined” as of 11.7. I’ve tried compute_80 and sm_86 but neither makes the API available. Has that API been removed in favor of raw PTX?
I can’t find it either. Not in CUDA 9.2, not in CUDA 11.4.
It is mentioned in both the CUDA 11.7 programming guide and this blog.
If this is a concern to you, I suggest filing a bug.
There is a similar function in the namespace cooperative_groups::details, but you are not supposed to depend on anything in such a “details” namespace, so that is as far as I’ll go.
If it were me, and I wanted to construct my own, I would follow that example.
Apologies Robert, but I have no useful C++ knowledge, having progressed to a pretty average C competency.
Is there a file you can refer me to for this “example”?
Thanks.
Generally speaking, in CUDA, when you want to use cooperative groups functionality, you should include the header file cooperative_groups.h. So that header file is going to be in the “usual place” for CUDA header files. On a typical Linux install that would be /usr/local/cuda/include
If you study that include file, or just look around in that directory, you will notice a subdirectory called cooperative_groups (at least it is there on my CUDA 11.4 install). Inside that is a directory called details. In that directory you will find helpers.h. (This appears to be a “helper function”, implemented in the details namespace, for cooperative groups.) If you run:
grep lanemask /usr/local/cuda/include/cooperative_groups/details/helpers.h
You’ll find an example function that could be used as a model to create your own lanemask_lt.
Thanks for that.
Follow up: I had not realised “lanemask_lt” was a special register. As such, it’s described in the PTX ISA:
Yep. I interpreted the request here to be a function callable from CUDA C++.
If you’re doing PTX you can use it directly. But it isn’t formally exposed in CUDA C++ via any builtins or intrinsics that I am aware of.
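For anyone landing here later, the two suggestions above (model your own function on the helper, and read the special register from PTX) can be combined with inline PTX. This is only a minimal sketch, not the code from helpers.h: the name my_lanemask_lt and the small check kernel are made up for illustration, and the expected-value formula assumes a 1D thread block and a warp size of 32.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (the name is our own, not a CUDA builtin): reads the
// PTX special register %lanemask_lt, which has a bit set for every lane
// whose index is lower than the calling thread's lane.
__device__ __forceinline__ unsigned int my_lanemask_lt()
{
    unsigned int mask;
    asm volatile("mov.u32 %0, %%lanemask_lt;" : "=r"(mask));
    return mask;
}

// Compares the register value with the mask computed by hand.
// Assumption: 1D block and 32-thread warps, so lane == threadIdx.x & 31.
__global__ void check_lanemask(int *mismatches)
{
    unsigned int lane     = threadIdx.x & 31u;
    unsigned int expected = (1u << lane) - 1u;  // bits [0, lane) set
    if (my_lanemask_lt() != expected) {
        atomicAdd(mismatches, 1);
    }
}

int main()
{
    int *d_mismatches = nullptr;
    int  h_mismatches = -1;

    if (cudaMalloc(&d_mismatches, sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(d_mismatches, 0, sizeof(int));

    check_lanemask<<<2, 64>>>(d_mismatches);

    if (cudaMemcpy(&h_mismatches, d_mismatches, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d_mismatches);

    printf("mismatches: %d\n", h_mismatches);
    return h_mismatches == 0 ? 0 : 1;
}
```

If you would rather avoid inline PTX altogether, the hand-computed (1u << lane) - 1u expression in the kernel gives the same mask under the same 1D-block assumption.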