Concurrent NVRTC PTX compiling and runtime linking

ratzes · June 19, 2020, 1:11am

How should I go about compiling and linking multiple CUDA kernels concurrently at runtime with NVRTC and cuLinkAddData?

I’m sharing a single context between threads, and it seems to be serializing everything: each NVRTC compile-and-link takes 70 ms on its own, but 100 of them run concurrently take 7.5 s on a 24-core system.
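As a rough sanity check on those numbers, here is a small sketch comparing the two extremes. The function names are hypothetical, and the "ideal" figure assumes one job per core with no contention at all, which a real system would not reach.

```cpp
// Wall time if every compile+link job runs strictly one after another.
constexpr double serialized_ms(int jobs, double per_job_ms) {
    return jobs * per_job_ms;
}

// Idealized wall time if jobs spread evenly over `workers` cores
// with zero contention (an optimistic bound, not a prediction).
constexpr double ideal_parallel_ms(int jobs, double per_job_ms, int workers) {
    const int waves = (jobs + workers - 1) / workers;  // ceil(jobs / workers)
    return waves * per_job_ms;
}

// Figures from the post: 100 jobs, 70 ms each, 24 cores.
static_assert(serialized_ms(100, 70.0) == 7000.0, "fully serialized: 7.0 s");
static_assert(ideal_parallel_ms(100, 70.0, 24) == 350.0, "ideal parallel: 0.35 s");
```

The measured 7.5 s is close to the fully serialized 7.0 s and far from the idealized 0.35 s, which is why it looks like the work is being serialized rather than run in parallel.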

Making a context for each thread seems to be a bad idea even with 10 threads.

I’m doing all this through an FFI, so I might be doing something wrong, but wanted to ask what I should do operationally first.
