I’d like to compile many PTX programs at runtime on a multi-core system. I know there’s the driver API for runtime linking, but that requires a CUdevice, and from what I remember the process is heavily serialized. Is there a facility to create fatbins per thread in parallel?
It seems like this should be possible, since nvcc doesn’t require a device to be present and can generate fatbins, so it ought to be able to compile per thread.
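As a sketch of what I mean (assuming the CUDA toolchain is on PATH; the directory and file names here are hypothetical, and sm_80 is just an example target), the PTX-to-SASS step parallelizes trivially at the process level, one ptxas per core, with no GPU needed:

```shell
# One ptxas process per PTX module: PTX -> SASS (cubin), no device required.
# -P "$(nproc)" runs as many compilations concurrently as there are cores.
ls build/*.ptx | xargs -P "$(nproc)" -I {} \
    sh -c 'ptxas -arch=sm_80 -O3 "$1" -o "${1%.ptx}.cubin"' _ {}
```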
I guess I’m not sure why the PTX-to-optimized-SASS conversion has to be such a bottleneck. Any thoughts on options I missed?
The driver API generally JIT-compiles code for the currently selected device, so the process is not completely disconnected from the GPU. I acknowledge it doesn’t seem like that should require much interaction or serialization, and I personally don’t know the extent of the serialization, but I have seen reports of it. Beyond that I wouldn’t be able to explain the dependencies, the serialization, or the rationale for those observations in any detail.
Yes, you can create fatbins with nvcc. You can also have nvcc emit cubin output, which is directly consumable by the driver API without JIT compilation.
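For illustration, a sketch of the two output formats (assuming nvcc is on PATH; kernel.cu and the sm_80 target are placeholders for your own sources and architectures):

```shell
# Fatbin: can bundle SASS for one or more architectures plus PTX,
# leaving the driver a JIT fallback for future devices.
nvcc --fatbin -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_80,code=compute_80 kernel.cu -o kernel.fatbin

# Cubin: SASS for a single architecture; loadable via cuModuleLoad
# with no JIT step at load time.
nvcc --cubin -arch=sm_80 kernel.cu -o kernel.cubin
```

The cubin route is what sidesteps the JIT serialization concern: the driver just loads the precompiled SASS.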
You can request changes or enhancements to CUDA behavior by filing a bug. In this case you might be asked for a demonstrator along with your observations; a short text description like the one in your post may not gain traction without one.
Thanks Robert! Glad to know I more or less had the full picture and wasn’t missing something.
I’ll write up some tests and file a request.