I’m trying to adapt existing CUDA kernel code in PyTorch to add a remote backend using a disaggregated operating system. In order to dispatch the jobs, I need to decipher the mangling scheme used by nvcc so that I can map calls of the form Kernel<><<<>>> to their cubin symbol names. My current approach uses the C++ operator typeid(Kernel<>) and tries to form a regular expression that maps from this information to the symbol in the generated cubin, as listed by cuobjdump -symbols. Any help on how to achieve this, any alternative approaches, or a systematic way of generating the regular expression would be greatly appreciated!
Thanks in advance,
Have you looked at the demangler NVIDIA ships with CUDA?
Thanks for the response. I have had a look at cu++filt, but it provides the inverse of the operation I want. I was hoping to find a direct solution, as opposed to demangling the entire symbol table and adding the level of indirection that would create.
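For reference, the indirection via the demangled symbol table can be kept to a one-time cost per cubin. The following is only a minimal sketch under assumptions: the row layout of the cuobjdump -symbols output (symbol type in the first column, mangled name in the last) and the one-line-per-name output of cu++filt are assumed rather than guaranteed, and all function names are hypothetical.

```python
import subprocess


def dump_symbols(binary_path):
    """Return the text printed by `cuobjdump -symbols` (tool must be on PATH)."""
    result = subprocess.run(
        ["cuobjdump", "-symbols", str(binary_path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_function_symbols(cuobjdump_output):
    """Collect mangled names of function symbols.

    Assumption: each symbol row looks like
    'STT_FUNC  STB_GLOBAL  STO_ENTRY  <mangled name>'.
    """
    names = []
    for line in cuobjdump_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "STT_FUNC":
            names.append(fields[-1])
    return names


def cuda_demangle(mangled_names):
    """Demangle with cu++filt; assumed to print one demangled name per line."""
    if not mangled_names:
        return []
    result = subprocess.run(
        ["cu++filt", *mangled_names],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def build_symbol_table(cuobjdump_output, demangle=cuda_demangle):
    """Map demangled signature -> mangled cubin symbol."""
    mangled = parse_function_symbols(cuobjdump_output)
    return dict(zip(demangle(mangled), mangled))
```

The table is built once, for example with build_symbol_table(dump_symbols(path)), after which each lookup is a plain dictionary access keyed by the demangled signature.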
I was responding to this:
which led me to believe you are looking for a demangler.
The name mangling is performed by the compiler. If you generate appropriate function stub(s), compile the code, then extract the symbol(s) from the object file, that should do what you want, correct?
Or am I still misunderstanding the question?
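The stub approach could be sketched roughly as follows. This is an illustration under assumptions, not a tested workflow: the header name kernels.cuh, the kernel name Kernel, its parameter list, and the sm_70 architecture are all hypothetical placeholders, and the helper names are made up for this example.

```python
import subprocess
from pathlib import Path


def make_stub_source(header, kernel, template_args, param_types):
    """Return CUDA source that explicitly instantiates one kernel template."""
    targs = ", ".join(template_args)
    params = ", ".join(param_types)
    return (
        f'#include "{header}"\n'
        f"template __global__ void {kernel}<{targs}>({params});\n"
    )


def nvcc_cubin_command(source_path, cubin_path, arch="sm_70"):
    """Build the nvcc command line that compiles the stub to a cubin."""
    return ["nvcc", "-cubin", f"-arch={arch}", str(source_path), "-o", str(cubin_path)]


def compile_stub(source, workdir, arch="sm_70"):
    """Write the stub to disk and compile it; requires nvcc on PATH."""
    workdir = Path(workdir)
    src = workdir / "stub.cu"
    cubin = workdir / "stub.cubin"
    src.write_text(source)
    subprocess.run(nvcc_cubin_command(src, cubin, arch), check=True)
    return cubin
```

The symbol can then be read from the resulting cubin, for example with cuobjdump -symbols.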
I do not know where in the compiler the name mangling occurs. A part of the CUDA compiler, in particular the nvvm component, is based on open source code (LLVM), so you might want to look at that to see whether you can find the code that performs the name mangling.
Apologies for my lack of clarity.
I suppose the solution you suggest would do what I want; however, it would be a relatively onerous process. I was looking for a way to convert, at runtime, from the CUDA kernel signature directly to its corresponding symbol. That could be done with your approach, but it would essentially require JIT-compiling the stub every time I want to perform a symbol lookup.
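One way to soften the per-lookup cost would be to memoize the result, so that each distinct signature triggers the expensive step only once per process. A minimal sketch, assuming some resolver function already exists (resolve_uncached below is a hypothetical stand-in for whatever compiles the stub and extracts the symbol):

```python
import functools


def make_cached_resolver(resolve_uncached):
    """Wrap an expensive signature -> symbol resolver with an in-memory cache."""

    @functools.lru_cache(maxsize=None)
    def resolve(signature):
        # Runs resolve_uncached only on the first lookup of each signature.
        return resolve_uncached(signature)

    return resolve
```

This does not remove the first-use cost, it only avoids repeating it; signatures must be hashable (plain strings work).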
Thanks for the information regarding the nvvm component. I’ll take a look into that to see if I can replicate its mangling scheme!
Note that the name mangling might not happen in nvvm; that is conjecture on my part.
As far as I understand the internal compiler flow controlled by nvcc, the source code is passed through EDG frontend components to nvvm, which produces PTX; then ptxas compiles the PTX to SASS (machine code). The mangled names are visible in the PTX intermediate representation.
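Since the mangled names appear in the PTX, they can be pulled out of the text that nvcc -ptx emits without going all the way to a cubin. A minimal sketch, assuming kernels are declared with the .entry directive followed by the symbol name; the function name and the regular expression are illustrative and only cover the common identifier form:

```python
import re

# Matches '.entry <name>' regardless of a preceding linkage directive
# such as .visible or .weak.
ENTRY_RE = re.compile(r"\.entry\s+([A-Za-z_$][\w$]*)")


def ptx_entry_names(ptx_text):
    """Return the kernel entry-point symbol names found in PTX source text."""
    return ENTRY_RE.findall(ptx_text)
```

Feeding it the contents of a .ptx file yields the list of mangled kernel names in declaration order.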
The name mangling may have already happened by the time nvvm is invoked. I have no knowledge of whether that is the case or not.