Compiling through nvcc versus JIT driver compilation

luisgo667 · April 21, 2021, 11:01pm

Dear All,

I am getting (far) lower runtimes with JIT driver compilation than with nvcc. Is that because the driver is newer than the version required by my CUDA toolkit? If the driver matched the toolkit version, would I get the same runtimes?

Thanks,

Luís Gonçalves

Robert_Crovella · April 21, 2021, 11:13pm

Generally speaking, I would expect the ptxas function in the driver that shipped with a particular toolkit to roughly match the ptxas tool that is shipped with that toolkit. So JIT vs. offline compilation shouldn’t matter much in that scenario. I’m sure there are other factors that could be involved in your observation.

njuffa · April 21, 2021, 11:22pm

Based on historical experience, the ptxas in the online compiler and the ptxas in the offline compiler are pretty much never in perfect sync. But they should be close, and so should the generated code, as pointed out by Robert Crovella.

It is possible, but fairly unlikely, that ptxas differences between online and offline compilation lead to noticeable performance differences. It is more likely that the root cause of performance differences comes down to user configuration between the two compilation modes. Check the compiler switch settings, e.g. use of --use_fast_math.
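One way to act on the advice above is to write down the option list used for each build and diff the two. The following is only a minimal sketch: the helper `only_in` and the option lists in the usage note are hypothetical, and it compares option strings literally, so two spellings of the same switch would be reported as a difference.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the options that appear in `a` but not in `b`.
// Options are compared as plain strings, in sorted order.
std::vector<std::string> only_in(const std::vector<std::string>& a,
                                 const std::vector<std::string>& b) {
    std::set<std::string> sa(a.begin(), a.end());
    std::set<std::string> sb(b.begin(), b.end());
    std::vector<std::string> out;
    std::set_difference(sa.begin(), sa.end(), sb.begin(), sb.end(),
                        std::back_inserter(out));
    return out;
}
```

For example, with an assumed offline option list `{"-O3", "--use_fast_math"}` and an assumed option list `{"-O3"}` for the build that produces the PTX handed to the driver, `only_in(offline, online)` returns `{"--use_fast_math"}`, which flags a switch present in only one of the two builds.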

luisgo667 · April 21, 2021, 11:50pm

I am not using --use_fast_math. With --use_fast_math I get worse times because other optimizations are forced off.
Driver Version: 460.67
Reduction in runtime: 1 → 0.79 (normalized, nvcc → JIT)
Ubuntu 20.04, NVIDIA On-Demand mode. The display is driven by the Intel CPU's integrated GPU.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

njuffa · April 21, 2021, 11:57pm

I am not sure what you mean. What other optimizations conflict with --use_fast_math in your use case?

In any event, I mentioned --use_fast_math as one example of a compiler switch that is often relevant to performance. Have you checked for any defines that may differ between your online and offline builds? Are there any differences in the kernel launch parameters? Any differences in metrics collected via the CUDA profiler? Have you double checked the robustness of the performance measurement framework?

Without knowing the code, the compiler switches used, and the target GPU, I can only speculate wildly. I assume you use a controlled experiment, where all hardware and software stays exactly the same, and only the manner of compilation (online vs offline) changes.
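As a rough illustration of what a more robust measurement could look like, here is a minimal host-side sketch that discards a few warm-up runs and reports the median of several timed runs. The function names `median` and `median_runtime_ms` are hypothetical, and the sketch assumes the callable passed in blocks until the work has actually finished.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Median of a non-empty list of samples (the vector is taken by value and sorted).
double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : 0.5 * (samples[mid - 1] + samples[mid]);
}

// Runs `work` `warmup` times untimed, then `repeats` times timed, and returns
// the median wall-clock time in milliseconds. Assumption: `work` only returns
// once the work is complete, otherwise only the launch overhead is measured.
double median_runtime_ms(const std::function<void()>& work,
                         int warmup, int repeats) {
    assert(warmup >= 0 && repeats > 0);
    for (int i = 0; i < warmup; ++i) work();  // absorb one-time costs
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(repeats));
    for (int i = 0; i < repeats; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return median(samples);
}
```

For a GPU kernel, `work` would typically launch the kernel and then call `cudaDeviceSynchronize()`, since kernel launches are asynchronous. The warm-up runs keep one-time costs, such as context creation and, in the JIT case, compilation at module load, out of the timed samples, so both build modes are compared on steady-state kernel time only.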

luisgo667 · April 22, 2021, 12:04am

GTX 1650, 1024 Cores

I only take a high-precision timestamp at the beginning and at the end to determine the runtime.
