Does `cuModuleLoadDataEx` always compile PTX as relocatable code?

Hi all,
I compile PTX with cuModuleLoadDataEx. Does it always generate relocatable device code (i.e. as if -rdc=true had been specified)?

It depends on the PTX. -rdc=true affects the creation of PTX. Furthermore, if any actual device code linking is required, that requires a particular sequence in the driver API. It’s not as simple as the non-rdc case.
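As a rough sketch of the difference between the two cases (not a definitive implementation): the first function loads one self-contained PTX image directly, and the second links two PTX images before loading the result. The helper names `load_single` and `load_linked`, the `CHECK` macro, and the labels "kernel.ptx" and "func.ptx" are made up for illustration. The sketch assumes `cuInit(0)` has already been called and a context is current on the calling thread.

```cuda
#include <cuda.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Abort with a message if a driver API call fails (hypothetical helper).
#define CHECK(call)                                                      \
  do {                                                                   \
    CUresult rc_ = (call);                                               \
    if (rc_ != CUDA_SUCCESS) {                                           \
      const char *msg_ = nullptr;                                        \
      cuGetErrorString(rc_, &msg_);                                      \
      std::fprintf(stderr, "%s failed: %s\n", #call, msg_ ? msg_ : "?"); \
      std::exit(1);                                                      \
    }                                                                    \
  } while (0)

// Non-rdc case: one self-contained PTX image, JIT-compiled and loaded in one call.
CUmodule load_single(const char *ptx) {
  CUmodule mod;
  CHECK(cuModuleLoadDataEx(&mod, ptx, 0, nullptr, nullptr));
  return mod;
}

// Device-linking case: two PTX images that reference each other are linked first.
CUmodule load_linked(const char *kernel_ptx, const char *func_ptx) {
  CUlinkState link;
  CHECK(cuLinkCreate(0, nullptr, nullptr, &link));
  CHECK(cuLinkAddData(link, CU_JIT_INPUT_PTX, (void *)kernel_ptx,
                      std::strlen(kernel_ptx) + 1, "kernel.ptx",
                      0, nullptr, nullptr));
  CHECK(cuLinkAddData(link, CU_JIT_INPUT_PTX, (void *)func_ptx,
                      std::strlen(func_ptx) + 1, "func.ptx",
                      0, nullptr, nullptr));

  void *cubin = nullptr;
  size_t cubin_size = 0;
  CHECK(cuLinkComplete(link, &cubin, &cubin_size));

  // The linked image is owned by the link state, so load it before destroying.
  CUmodule mod;
  CHECK(cuModuleLoadData(&mod, cubin));
  CHECK(cuLinkDestroy(link));
  return mod;
}
```

When everything lives in a single PTX file with no unresolved external references, the first path should be sufficient; the second is only needed when separate images have to be linked together.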

Thanks, this part of the compilation is very hard to guess without your help.

I am also wondering whether compiling PTX with cuModuleLoadDataEx ever inlines a function. If it does not, can we say that the result is always equivalent to rdc=true? Is my thinking right?

Here is a complete example.

According to my testing, if you have not created the PTX using -rdc=true, then the function may already have been inlined by the compiler front end by the time the PTX is generated. It’s easy to write a simple test case to demonstrate this for yourself.
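A minimal test case along these lines might look like the sketch below. The file name `inline_test.cu` and the function names are hypothetical; the idea is just to generate PTX twice, once with and once without -rdc=true, and compare the two outputs.

```cuda
// inline_test.cu (hypothetical file name)
//
// Generate PTX both ways and compare the results:
//   nvcc -ptx inline_test.cu -o default.ptx
//   nvcc -rdc=true -ptx inline_test.cu -o rdc.ptx
//
// Then search each file for ".func" (a separate device function was emitted)
// and "call" (the kernel actually calls it instead of using an inlined copy).

// A small device function that the front end is free to inline.
__device__ int add_one(int x) { return x + 1; }

// A kernel in the same compilation unit that uses the device function.
__global__ void kernel(int *out) {
  out[threadIdx.x] = add_one(threadIdx.x);
}
```

What you see may vary with the toolkit version and optimization settings, so treat the comparison as an experiment, not a guarantee. Marking the function `__noinline__` is one way to check how a non-inlined function shows up in the PTX.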

If the PTX contains evident non-inlined functions, then I suspect the remainder of the compile sequence (the part handled by ptxas, or equivalently by cuModuleLoadDataEx) would inline a function if it judged that to be best, for example if you had not specified relocatable device code and device linking at that stage.

No, the general idea of -rdc=true is the opposite of inlining. -rdc=true tells the toolchain that you may, for example, need to call a __device__ function (or PTX function) from another compilation unit. In that case the function cannot be inlined. However, for the case where the function is used in the same compilation unit, I would still expect non-inlining when -rdc=true is specified, but that particular aspect may be an implementation detail.

You can write simple test cases to probe this yourself.

In my case, the CUDA kernel and the CUDA function are generated as PTX by different compilers, so I cannot rely on the CUDA front end.

The function and the kernel are in the same PTX file. How can I force ptxas to inline at this level?

I cannot see an inline keyword in the PTX model. So I am guessing that if there is a device function in the PTX and ptxas does not inline it, then rdc is effectively always true.

My question is: how can I get something like __forceinline__ with ptxas, so that only the kernel remains and rdc is effectively false?

Thank you