I’m trying to put together a program that relies on GPU computing, but it should be extensible with external DLLs / PTX (cubin) files. Compiling everything into a single executable / kernel is a nightmare and makes development extremely slow (slow compile times, I can’t compile specific modules, etc.)…
So all I want to do is define some behavior in a C++ file, compile it to a PTX / cubin file, and load that code into my main kernel. (If possible, I don’t want to generate PTX code myself; I’m not familiar with code generation…)
So the basic structure looks like this:

file1.ptx
    someFunction
file2.ptx
    someKernel
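Just to make the setup concrete, here is roughly what the two source files look like before compiling them to PTX. This is only a sketch; the names (someFunction, setFnPtr, someKernel) and the nvcc flags are illustrative, not exactly what I use:

// file1.cu -- compiled separately, e.g.: nvcc -arch=sm_20 -ptx file1.cu -o file1.ptx
typedef float (*SomeFn)(float);

// the "plugin" behavior I want to load at runtime
__device__ float someFunction(float x) { return x + 1.0f; }

// tiny kernel that writes the address of someFunction into device memory
extern "C" __global__ void setFnPtr(unsigned long long* out) {
    *out = (unsigned long long)(SomeFn)someFunction;
}

// file2.cu -- compiled separately, e.g.: nvcc -arch=sm_20 -ptx file2.cu -o file2.ptx
typedef float (*SomeFn)(float);

extern "C" __global__ void someKernel(unsigned long long fptr, float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    SomeFn fn = (SomeFn)fptr;              // address obtained from the other module
    if (i < n) data[i] = fn(data[i]);      // this cross-module call is what I'm after
}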
And someKernel calls someFunction from file1.ptx. I have tried extracting the function pointer to someFunction (it is 128, actually) and passing it to someKernel using the driver API, but that doesn’t work.
So does that mean my function pointers are only valid for the duration of the kernel execution?
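The host side of the attempt looks roughly like this (driver API, error checking omitted; these are the standard entry points, the exact calls I use may differ slightly):

#include <cuda.h>
#include <stdio.h>

int main() {
    CUdevice dev; CUcontext ctx;
    CUmodule mod1, mod2;
    CUfunction setFnPtr, someKernel;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    cuModuleLoad(&mod1, "file1.ptx");     // someFunction + setFnPtr
    cuModuleLoad(&mod2, "file2.ptx");     // someKernel

    // run the setter kernel from module 1 to get the address of someFunction
    CUdeviceptr dPtr;
    cuMemAlloc(&dPtr, sizeof(unsigned long long));
    cuModuleGetFunction(&setFnPtr, mod1, "setFnPtr");
    void* setArgs[] = { &dPtr };
    cuLaunchKernel(setFnPtr, 1, 1, 1, 1, 1, 1, 0, 0, setArgs, 0);

    unsigned long long fptr = 0;
    cuMemcpyDtoH(&fptr, dPtr, sizeof(fptr));
    printf("someFunction address: %llu\n", fptr);   // this is where I see 128

    // pass the raw address to someKernel in the other module -- this is the step that fails
    CUdeviceptr dData = 0; int n = 0;               // data setup omitted
    cuModuleGetFunction(&someKernel, mod2, "someKernel");
    void* kArgs[] = { &fptr, &dData, &n };
    cuLaunchKernel(someKernel, 1, 1, 1, 1, 1, 1, 0, 0, kArgs, 0);
    cuCtxSynchronize();
    return 0;
}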
If yes, is there some easy way to merge these two PTX files using the driver API functions, so I could use someFunction in someKernel? (And it shouldn’t take long, since instant, or nearly instant, code loading is important… nvcc works great, but with highly complex code the compilation is very slow.)
If not, what happens if I manually merge the two PTX files, copying the someKernel code into the other PTX file, and get the function pointers from that module? I’m afraid that would screw up my registers, right? (I’m using a lot of recursive calls, so I have a lot of stack available to the functions; can the tools manage that automatically?)
Just for clarification: is your someFunction in file1.ptx a regular C++ function, or is it a CUDA kernel? I.e., are you trying to call a regular, host-based function in a PTX file from a kernel in a separate PTX file?
It is a device function, a simple function that does something with my data. It is like extending a closed-source application with your own C++ DLLs, but with the application being a complex CUDA kernel and the DLL a device function.
Or something like what OptiX does: you are able to extend the OptiX kernel with your own shaders, geometry objects, etc. But if possible, I want to avoid taking their path (recreating the whole kernel on the fly). I have been able to do something like this with simple function pointers, and it works great, but external linkage is not supported in kernels (though I can understand why), and it seems that function pointers are only valid for the actual kernel.
I haven’t been able to have one kernel set a function pointer in memory and then call through it from a different kernel. Of course, I’m talking about PTX files and the driver API; it works with the runtime API, but then all of the kernels need to be in the same .cu file… (I have tried using multiple .cu files and passing pointers between them, but it didn’t work.)
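For reference, the single-.cu-file version that does work for me looks roughly like this (runtime API, sm_20; the names are illustrative):

typedef float (*OpFn)(float);

__device__ float scaleOp(float x) { return 2.0f * x; }

__device__ OpFn g_op = 0;                       // device-side function pointer slot

__global__ void initOp() { g_op = scaleOp; }    // "init" kernel stores the address

__global__ void applyOp(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && g_op) data[i] = g_op(data[i]); // call through the pointer
}

// host side: initOp<<<1,1>>>();  applyOp<<<blocks,threads>>>(d_data, n);

As long as both kernels come from the same .cu file (same module), the pointer stored in g_op stays valid across launches. The moment the kernels live in different modules, this breaks down, which is exactly my problem.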
Device functions don’t currently get external symbols in the host object files emitted by the CUDA toolchain, so there is currently no way to do what you are asking for.
PTX 2.x supports indirect calls via pointer, so it might at least be theoretically possible, but it would require extremely careful design. You would probably have to use inline PTX in your CUDA code to implement the function call, because I very much doubt the compiler could be persuaded to generate indirect calls via anonymous pointers. Even then, I would be highly skeptical it would work without completely turning off inlining in the compiler, which could otherwise have pretty major performance implications.
Please post your working proof of concept when you get it going so we can have a look at it…
Thank you for the tips! I don’t think using inline PTX would be that complicated, but turning off inlining is pretty harsh… Can I do that using pragmas in only some parts of the code?
To the best of my knowledge, inlining is all or nothing, and only controllable via the -Xopencc="-INLINE:=off" option to nvopencc. So you either have inlined calls, or you don’t. But it might still be possible to have inline function expansion and use inline PTX for the calls, where you will (somehow) provide the anonymous function pointer after the PTX has been generated. This is all really at the outer edge of what is documented and of how things really work, so best of luck with it.
So here is my reply about what I found out (sorry for the late response, I had a lot of other tasks recently).
At the moment I’m compiling the whole code into multiple PTX files, merging them together with the main kernel (in the external PTX files I use a small kernel to pass the function pointers back at program init), and passing the result to the JIT. It works, though it’s not the best solution… It handles simple cases, but in more complex ones I do need to modify some parts of the code (renaming constants and so on…). This way I don’t need to write my own PTX generation, and I can rely mostly on nvcc.
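In case anyone wants to try the same, the loading step is basically just handing the merged PTX text to the driver JIT. Something like this (the merge / renaming step itself is plain string processing and is omitted; the helper name is mine):

#include <cuda.h>

// 'mergedPtx' is the concatenated / patched PTX text (null-terminated)
CUmodule loadMergedPtx(const char* mergedPtx) {
    CUmodule mod;
    char errorLog[8192];
    CUjit_option opts[] = { CU_JIT_ERROR_LOG_BUFFER, CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES };
    void* vals[]        = { errorLog, (void*)(size_t)sizeof(errorLog) };

    // JIT-compile the merged PTX into a module; errors from the patched
    // PTX (e.g. clashing constant names) show up in errorLog
    if (cuModuleLoadDataEx(&mod, mergedPtx, 2, opts, vals) != CUDA_SUCCESS) {
        // inspect errorLog here to see what the JIT complained about
        return 0;
    }
    return mod;
}

The JIT step is fast enough for my purposes, which is what I was after with the "nearly instant code loading" requirement.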