PTX coding runtime gains over CUDA coding

luisgo · April 17, 2016, 10:54pm

Dear All

What runtime speedup can be expected from coding directly in PTX rather than in CUDA C++ on complex kernels?

Thanks

Luis Gonçalves

njuffa · April 18, 2016, 1:23am

Impossible to say for the general case. You may encounter either speedup or slowdown compared to plain C++ code. Inline assembly on any platform likely has higher creation and maintenance cost compared to C++, so I usually advise against dropping down to inline assembly unless faced with a situation where the desired functionality is difficult to express efficiently at the C/C++ level due to limitations in the abstract execution model of the HLL.

For your particular use case, have you already taken into account all relevant profiler feedback and exploited all the techniques suggested by the Best Practices Guide to improve the C++ code?

luisgo · April 18, 2016, 12:42pm

I just want to know about practical cases. The theory I already know.

njuffa · April 18, 2016, 2:21pm

My comments above were based purely on practical experience; no theory of any kind was considered. If you want to rephrase the question as “What is the maximum speedup you have observed on kernels of any kind from using inline PTX instead of C++?”, then the answer would be “about 30%”.

A PTX assembly language coder of average skill is unlikely to best the compiler on any non-trivial kernel, unless it involves functionality that is poorly expressible in C++. The CUDA compiler uses a derivative of LLVM in the frontend, and from looking at generated code it is clear that LLVM incorporates some very sophisticated optimizations, high-level as well as low-level.
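To illustrate what "functionality that is poorly expressible in C++" can look like, here is a minimal sketch of inline PTX that uses the hardware carry flag, which C++ has no direct way to name. The helper names (`add64_ptx`, `add_kernel`, `ptx_matches_cpp`) are made up for this example. It only demonstrates the mechanism: for a plain 64-bit add the compiler most likely emits equivalent code on its own, so no speedup should be expected from this particular snippet.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper: 64-bit add built from 32-bit halves.
// add.cc.u32 sets the carry flag; addc.u32 consumes it.
__device__ __forceinline__ void add64_ptx(uint32_t alo, uint32_t ahi,
                                          uint32_t blo, uint32_t bhi,
                                          uint32_t &rlo, uint32_t &rhi)
{
    asm("add.cc.u32  %0, %2, %4;\n\t"
        "addc.u32    %1, %3, %5;"
        : "=r"(rlo), "=r"(rhi)
        : "r"(alo), "r"(ahi), "r"(blo), "r"(bhi));
}

// Computes the same sum twice: once via inline PTX, once in plain C++.
__global__ void add_kernel(uint64_t a, uint64_t b, uint64_t *out)
{
    uint32_t lo, hi;
    add64_ptx((uint32_t)a, (uint32_t)(a >> 32),
              (uint32_t)b, (uint32_t)(b >> 32), lo, hi);
    out[0] = ((uint64_t)hi << 32) | lo;  // inline PTX result
    out[1] = a + b;                      // plain C++ result
}

// Host-side check: returns true if both results agree.
bool ptx_matches_cpp(uint64_t a, uint64_t b)
{
    uint64_t *d_out = nullptr;
    uint64_t h_out[2] = {0, 1};  // deliberately different until overwritten
    if (cudaMalloc(&d_out, sizeof h_out) != cudaSuccess) return false;
    add_kernel<<<1, 1>>>(a, b, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof h_out,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess && h_out[0] == h_out[1];
}

int main()
{
    // The first operand pair forces a carry out of the low 32 bits.
    bool ok = ptx_matches_cpp(0xFFFFFFFFull, 1ull) &&
              ptx_matches_cpp(0x123456789ABCDEF0ull, 0x0FEDCBA987654321ull);
    printf("%s\n", ok ? "match" : "mismatch");
    return ok ? 0 : 1;
}
```

The same carry-chain pattern extends to wider integers by placing `addc.cc.u32` between the first and last instruction, which is the kind of case where inline PTX may be easier to write than an equivalent C++ formulation. Whether it is actually faster has to be measured per kernel.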

luisgo · April 18, 2016, 2:27pm

Thanks. That is the kind of answer I was looking for.
