A couple of times I’ve asked questions here and I got some pretty nice answers, so let me ask just one more :)
I have one really huge CUDA kernel. Depending on the input, it may or may not need some of its code.
Since the fight with register spills is still ongoing (believe me, I’ve tried every single piece of advice I got to reduce them), I thought there may be a different way.
What I want is to detect which parts of the code are needed and create a kernel “on the fly” with just those parts (or replace the ones that are not used with empty ones). Of course, none of them will be inlined, but this is fine.
If all of my users had nvcc installed, this would not be a problem (I would #ifdef out the functions that are not needed, etc.). However, the CUDA SDK is quite big, and the last thing I want is to ship it with my app.
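To make the #ifdef idea concrete, here is a rough sketch of what I have in mind. All names in it (`Feature`, `USE_FEATURE_A`, `big_kernel`, `build_defines`, and so on) are made up for illustration and are not my real code. The kernel source keeps an empty stub for every optional part, and the host decides from the input which `-D` defines to hand to whatever compiles the source:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits: each one guards an optional part of the kernel.
enum Feature : std::uint32_t {
    FEATURE_A = 1u << 0,
    FEATURE_B = 1u << 1,
    FEATURE_C = 1u << 2,
};

// Illustrative kernel source (not the real kernel). Each optional device
// function collapses to an empty stub unless its macro is defined.
const char* kernel_source() {
    return R"CUDA(
#ifdef USE_FEATURE_A
__device__ void feature_a(float* data, int i) { data[i] *= 2.0f; }
#else
__device__ void feature_a(float*, int) {}
#endif

#ifdef USE_FEATURE_B
__device__ void feature_b(float* data, int i) { data[i] += 1.0f; }
#else
__device__ void feature_b(float*, int) {}
#endif

extern "C" __global__ void big_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        feature_a(data, i);
        feature_b(data, i);
    }
}
)CUDA";
}

// Maps one feature bit to the macro that enables it in the kernel source.
struct FeatureMacro {
    std::uint32_t bit;
    const char* macro;
};

// Turns the set of needed features into "-DNAME" compiler options.
std::vector<std::string> build_defines(std::uint32_t needed) {
    static const FeatureMacro table[] = {
        {FEATURE_A, "USE_FEATURE_A"},
        {FEATURE_B, "USE_FEATURE_B"},
        {FEATURE_C, "USE_FEATURE_C"},
    };
    std::vector<std::string> defines;
    for (const FeatureMacro& entry : table) {
        if (needed & entry.bit) {
            defines.push_back(std::string("-D") + entry.macro);
        }
    }
    return defines;
}
```

For example, `build_defines(FEATURE_A | FEATURE_C)` gives `-DUSE_FEATURE_A` and `-DUSE_FEATURE_C`. The part I am missing is the step that turns this source plus these options into something loadable on the user's machine without nvcc.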
If you have seen some kind of reference/manual/topic about this, or if you know a way it can be done (with a relatively small memory-space footprint for the users of the app), please share. For example, is it possible to combine multiple PTX files into a fatbin on the fly?
P.S. I don't have any C++ code in the kernel, if this matters …