CUDA 9 RC Cooperative Groups Compile Error

I am trying the new CUDA 9 release candidate and am getting some new errors that I did not encounter with previous versions. I am trying to do a grid synchronization using cooperative groups on a Pascal TITAN Xp GPU.

I am implementing the grid sync like so:

cooperative_groups::grid_group grid = cooperative_groups::this_grid();
cooperative_groups::sync(grid);

Compiling gives me the error:

ptxas fatal   : Unresolved extern function 'cudaCGGetIntrinsicHandle'

I tried adding the -dc flag to my compile line, which instead gives me the error:

lib/myLib.so: undefined reference to `__cudaRegisterLinkedBinary_51_tmpxft_00006f11_00000000_6_myKernel_cpp1_ii_6f9e48ae'

If I comment out the sync lines and don’t use the -dc flag, everything compiles fine. Has anyone seen this issue, or does anyone know how to go about fixing it?

You must compile with relocatable device code enabled and perform device linking. This is mentioned in the CUDA 9 RC programming guide.

If you compile without relocatable device code at all, you get the ptxas unresolved extern error.

When you compile with -dc, you pick up the necessary compile step but not the necessary device code linking step, which is why you get the undefined reference.

For starters, I suggest doing the entire process with nvcc and specifying -rdc=true.

Once you get that working, if you want to create libs or otherwise split your compile and link steps, then you need to learn how to do proper rdc compile and linking, and there are only about 10^6 questions on the internet that cover that (not to mention the nvcc manual).
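The two build routes described above can be sketched with a small test program. This is only an illustrative sketch, not the original poster's code: the file name grid_sync.cu, the kernel name sumBlockIds, and the launch sizes are all made up for the example, and -arch=sm_61 assumes a Pascal card such as the TITAN Xp. It also assumes the kernel is started with cudaLaunchCooperativeKernel, which grid-wide synchronization requires in addition to the relocatable device code build.

```cuda
// grid_sync.cu -- hypothetical file name, for illustration only.
//
// Route 1, single step (nvcc compiles, device-links and host-links):
//   nvcc -arch=sm_61 -rdc=true grid_sync.cu -o grid_sync
//
// Route 2, split steps (sketch; see the nvcc manual for library builds):
//   nvcc -arch=sm_61 -dc grid_sync.cu -o grid_sync.o
//   nvcc -arch=sm_61 -dlink grid_sync.o -o grid_sync_dlink.o
//   ...then give both object files to the host linker along with the
//   CUDA runtime libraries.
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Every block records its index, the whole grid synchronizes,
// then a single thread sums the recorded values.
__global__ void sumBlockIds(int *perBlock, int *total)
{
    cg::grid_group grid = cg::this_grid();

    if (threadIdx.x == 0)
        perBlock[blockIdx.x] = blockIdx.x;

    grid.sync();  // grid-wide barrier; cg::sync(grid) is the free-function form

    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int sum = 0;
        for (unsigned int b = 0; b < gridDim.x; ++b)
            sum += perBlock[b];
        *total = sum;
    }
}

int main()
{
    int device = 0, supported = 0;
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, device);
    if (!supported) {
        printf("Device %d does not support cooperative launch\n", device);
        return 1;
    }

    // Assumed sizes: small enough that all blocks are resident at once,
    // which a cooperative launch requires.
    const int blocks = 4, threads = 32;

    int *perBlock = nullptr, *total = nullptr;
    cudaMalloc(&perBlock, blocks * sizeof(int));
    cudaMalloc(&total, sizeof(int));

    // Kernel arguments are passed as an array of pointers to each argument.
    void *args[] = { &perBlock, &total };
    cudaError_t err = cudaLaunchCooperativeKernel(
        (void *)sumBlockIds, dim3(blocks), dim3(threads), args, 0, 0);
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int hostTotal = 0;
    cudaMemcpy(&hostTotal, total, sizeof(int), cudaMemcpyDeviceToHost);
    printf("total = %d\n", hostTotal);  // sum of block indices 0 + 1 + 2 + 3

    cudaFree(perBlock);
    cudaFree(total);
    return 0;
}
```

If route 1 builds and runs but route 2 still reports an undefined reference to a __cudaRegisterLinkedBinary symbol, the object produced by the -dlink step is most likely missing from the final host link.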

I’m getting the same error using Nsight Eclipse Edition with a new project.

Is there a setting inside Nsight that I can use to handle the linking?

For anyone that needs it … I found the answer regarding Nsight on Stack Overflow:

https://stackoverflow.com/questions/38260577/generating-relocatable-device-code-using-nvidia-nsight

The defined way to enable this capability for an Nsight EE project is to do so at project creation time. After selecting File…New…CUDA C/C++ Project, you will be presented with the project creation wizard/dialog. Enter a project name and click “Next”. You will then be taken to the “Basic settings” dialog page. Here you will see an option “Device linker mode:” and the choices will be “Whole program compilation” (default) or “Separate compilation”. If you select “Separate compilation”, then your project will be set up for relocatable device code generation. – Robert Crovella Jul 8 '16 at 10:42

After a project is created, you can also make this change by going to Project…Properties…Build…Settings. Here you will see a page similar to the one mentioned above in the “Basic settings” dialog page. You can similarly change “Device linker mode:” on this page from “Whole program compilation” to “Separate compilation” in order to turn on generation of relocatable device code, after the project has already been created. – Robert Crovella Jul 8 '16 at 10:47