Split X64 and cubin into separate files.

If this is a duplicate post, I apologize. I did not get any acknowledgement that my previous post made it into the forum.
I previously asked this question in the “Nsight, Visual Studio Edition” sub-forum
https://devtalk.nvidia.com/default/topic/1046884/nsight-visual-studio-edition/generating-separate-cubin-files/
A moderator told me to try the “CUDA Programming and Performance” sub-forum, as people here might know better.

In short, this is the question:

I have a working CUDA project which builds successfully with Visual Studio 2015.
Now I would like to modify this project so that the build generates two files:

  1. An object file (or maybe DLL) which runs exclusively on the X64 side of the PC.
    It contains all X64 sections, including calls to the CUDA Driver API, but excluding any cubin or PTX code.

  2. A cubin file which I can load with the CUDA Driver API (e.g. cuModuleLoad).
    It contains all cubin instructions, but neither X64 instructions nor PTX code.
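To sanity-check the two artifacts described above, a file's leading bytes are usually enough to tell them apart. The sketch below is a heuristic, not an official tool: a cubin is an ELF file, a Windows .exe/.dll begins with the DOS "MZ" header, a regular (non-/bigobj) x64 COFF .obj begins with the machine type 0x8664 stored little-endian, and PTX is plain text with `.version` and `.target` directives. The names `ImageKind` and `classify_image` are hypothetical; `cuobjdump` remains the authoritative check.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical, heuristic classification of a build artifact by its leading bytes.
enum class ImageKind { Cubin, Ptx, PeImage, CoffX64, Unknown };

ImageKind classify_image(const std::vector<unsigned char>& bytes) {
    auto starts_with = [&](std::initializer_list<unsigned char> magic) {
        if (bytes.size() < magic.size()) return false;
        std::size_t i = 0;
        for (unsigned char b : magic) {
            if (bytes[i++] != b) return false;
        }
        return true;
    };
    // A cubin is an ELF file: 0x7F 'E' 'L' 'F' (Windows host binaries are never ELF).
    if (starts_with({0x7F, 'E', 'L', 'F'})) return ImageKind::Cubin;
    // Windows .exe / .dll files start with the DOS "MZ" header.
    if (starts_with({'M', 'Z'})) return ImageKind::PeImage;
    // A regular x64 COFF .obj starts with machine type 0x8664, little-endian.
    if (starts_with({0x64, 0x86})) return ImageKind::CoffX64;
    // PTX is plain text that declares .version and .target directives.
    const std::string text(bytes.begin(), bytes.end());
    if (text.find(".version") != std::string::npos &&
        text.find(".target") != std::string::npos) {
        return ImageKind::Ptx;
    }
    return ImageKind::Unknown;
}
```

A file intended as item 2 should classify as `Cubin`; if the host object classifies as `CoffX64`, it may still carry embedded device code, which only a content check such as `cuobjdump` reveals.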

I tried lots of things, but with little understanding.
I know that the nvcc -cubin switch generates the cubin code I want, but it disables generation of the X64 instructions I also need, so that doesn’t help.

Can somebody help me please?

It’s certainly not the only way to do it, but study the vectorAdd_nvrtc sample code.

There is no PTX to be found anywhere. There is also no cubin.

Alternatively, you can start with the vectorAddDrv sample code and compile the PTX kernel file to a cubin.

You can then slightly modify the .cpp source file to load a cubin instead of the PTX.
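For many kernel files, the PTX-to-cubin (or .cu-to-cubin) step can live in a separate build step that runs one compiler invocation per kernel file, without touching the kernel code itself. A minimal sketch, assuming file names without spaces; `cubin_command` is a hypothetical helper that only builds the command strings, using `ptxas` for PTX input and `nvcc -cubin` for CUDA source input. If the kernels are C++ functions, declaring them `extern "C"` keeps their names unmangled, so the host can look them up by plain name with `cuModuleGetFunction`.

```cpp
#include <string>

// Hypothetical helper: build the command that turns one device-code file
// into a cubin for one architecture (e.g. "sm_61"). Assumes no spaces in paths.
std::string cubin_command(const std::string& input, const std::string& arch,
                          const std::string& output) {
    const bool is_ptx = input.size() >= 4 &&
                        input.compare(input.size() - 4, 4, ".ptx") == 0;
    if (is_ptx) {
        // PTX -> cubin: ptxas is the PTX assembler.
        return "ptxas -arch=" + arch + " -o " + output + " " + input;
    }
    // CUDA source -> cubin: with -cubin, nvcc emits only device code.
    return "nvcc -cubin -arch=" + arch + " -o " + output + " " + input;
}
```

Each string could be run with `std::system` or written into a batch file that the kernel-only project executes; `sm_61` matches the architecture used elsewhere in this thread.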

Thanks.
That “vectorAdd_nvrtc” sample is really a geek's delight, and compiling at runtime is impressive.

Sadly, that solution feels difficult for me. I didn’t mention it before, but I have hundreds of kernels; modifying all of them does not feel like an option.

The first part, extracting the cubin, does work.
I even found a second way to do that. For the benefit of other readers, here it is:
Accepting that I need to compile twice, I can use this command:
C:\Temp>nvprune -arch sm_61 my_module.cu.obj -o my_module.cubin
I like this approach because it depends less on .vcxproj magic, which is so hard to debug.

Now that half of my problem is dealt with, I only need to solve the other half:
How can I make the build exclude both the cubin and the PTX from the generated object files (or the executable)? (Searching for Windows tools that do exactly that was not successful, and I hope I won’t have to write such a tool myself.)

Chris

The second approach I mentioned seems to me to be the mainstream approach.

  1. Use the vectorAddDrv project (for example)
  2. Note that the (host) source code for that project (e.g. .cpp files) has no kernel information in it.
  3. That project loads a PTX file from disk and converts it at runtime (PTX JIT) into binary code for the GPU.
  4. Instead, you can take the existing ptx file and compile it to a cubin. At this point, you could discard the ptx file.
  5. You would then need to modify the main project (which has neither ptx nor cubin involved) to load a cubin instead of a ptx kernel.
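One difference that step 5 introduces: the driver JIT-compiles PTX for whatever GPU is present, while a cubin is tied to an architecture. Per the CUDA Programming Guide, a cubin built for compute capability X.y runs only on devices of compute capability X.z with z >= y. If you build cubins for several architectures, the host could pick one as in this sketch; `Arch` and `pick_cubin_arch` are hypothetical names, and the device's compute capability would come from `cuDeviceGetAttribute` with `CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR` and `CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR`.

```cpp
#include <optional>
#include <vector>

// Hypothetical compute capability pair, e.g. {6, 1} for sm_61.
struct Arch {
    int major;
    int minor;
};

// Choose which prebuilt cubin architecture to load on a device: same major
// version, and the highest minor version that does not exceed the device's.
std::optional<Arch> pick_cubin_arch(const std::vector<Arch>& built, Arch device) {
    std::optional<Arch> best;
    for (const Arch& a : built) {
        if (a.major != device.major || a.minor > device.minor) continue;
        if (!best || a.minor > best->minor) best = a;
    }
    return best;  // std::nullopt: no compatible cubin was built
}
```

Returning `std::nullopt` gives the host a clean place to report "no cubin for this GPU" instead of failing later inside `cuModuleLoad`.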

I tried that, and it worked.
cuobjdump confirmed:
- “File ‘vectorAdd.obj’ does not contain device code”
- “File ‘vectorAdd_nvrtc.exe’ does not contain device code”
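cuobjdump is the right tool for the check above. As a rough cross-check, a host binary can also be scanned for an embedded ELF header (a cubin) or a PTX `.target` directive. This is a hypothetical heuristic (the function name is illustrative): expect false positives, since those byte patterns can occur by chance, and false negatives, since compressed fatbinaries hide both.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical heuristic: does a host .obj/.exe/.dll appear to embed device code?
// Not a replacement for cuobjdump.
bool looks_like_it_contains_device_code(const std::vector<unsigned char>& bin) {
    auto contains = [&](const std::string& needle) {
        return std::search(bin.begin(), bin.end(), needle.begin(), needle.end()) != bin.end();
    };
    // "\x7F" "ELF" is split so the hex escape does not swallow the 'E'.
    return contains(std::string("\x7F" "ELF")) || contains(".target sm_");
}
```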

I didn’t see that in your previous message, but now I get it.
Thanks for your patience in teaching me.

Chris

I shouldn’t really have mentioned the nvrtc project. It was not really responsive to your question. Sorry for the confusion.

Never mind, that was only a small complication.

Robert’s solution DID work. I carefully looked at the build artifacts the example left behind. As I hoped, neither PTX nor cubin was found.

However, I wasn’t able to reproduce that in my own project.
I put the working sample build and my own build on separate monitors and compared the project properties side by side, line by line. I compared each configuration setting and made mine either identical or equivalent.
Finally, hitting the build button was disappointing: my generated artifacts included the cubin again.

(Could this be a consequence of how the program works rather than how the build is set up? That feels unlikely.)

- I also tried something else: I used the IDA Pro disassembler to physically remove the segments containing the cubin. However, the modified binary did not work.

- For other reasons (likely, but not certainly, my own mistakes), I failed to set up a build with MSBuild.

- I tried “View -> Other Windows -> Command Window”, but that didn’t even let me “cd” to my directory.

- I tried to run the instructions from the configuration properties in a command prompt, only to find that they looked like argument settings, not commands.

What should I try next?

From a project structure standpoint, the vectorAddDrv project has all device code separated into a file of its own. If your project is not similar in that regard (for example, if you have device source or PTX embedded as a string in a .cpp or .cu file, or kernels in .cu files that the main project compiles with nvcc, which embeds them in the object files), that is going to be an issue.

Your device code should be separated into its own files. Those files can be converted to cubin in a completely separate project. Other than the file name to load, the “main” project needs no knowledge or representation of the device code.

So do this.

  1. Separate your device code (kernels) into their own file(s), just as is done in the vectorAddDrv project.

  2. Remove those files from your project directory structure. Put them somewhere else on your disk, hidden away from your project build system. The only thing your code should need is file names, just like the vectorAddDrv project.

  3. Build your project. If things break, track down the errors to find out how the device source code is finding its way into your executable binary. Keep resolving errors until your project builds correctly.

  4. Take your newly built project executable, and drop it into the directory where your device source code files are. If need be, convert your device source code to cubin files.

  5. Run your project.

A project executable built in that fashion could not possibly have any device code characteristics (other than file name) discoverable from the executable.
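A practical detail for steps 4 and 5: a relative file name such as `kernels.cubin` passed to `cuModuleLoad` is typically resolved against the process's current working directory, not the executable's folder. When you run from Visual Studio, the default debugging working directory is `$(ProjectDir)`, so a cubin placed next to the executable may not be found. A minimal sketch that builds the path from the executable's location; `cubin_next_to_exe` is a hypothetical helper, and on Windows the executable path could come from `GetModuleFileNameW`.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: locate a cubin stored in the same directory as the
// executable, independent of the current working directory.
std::filesystem::path cubin_next_to_exe(const std::filesystem::path& exe_path,
                                        const std::string& cubin_name) {
    return exe_path.parent_path() / cubin_name;
}
```

The resulting path (converted with `.string()`) is what the host would hand to `cuModuleLoad`; the executable still knows nothing about the device code except that file name.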

Thank you for making this now crystal clear to me.