Writing a function in PTX? Need to hand-code a function in PTX

suhailrehman · September 6, 2008, 9:14pm

Hi Guys,

I’m trying to hand-code a function in PTX for use in my CUDA project. I need a function that adds two arbitrary-precision numbers with carry, using the addc.cc assembler operation (page 41, PTX ISA Documentation 1.2), since as far as I can see this operation is not exposed at any level higher than PTX.

From my limited understanding of PTX so far, one of the best ways to do this is to write a skeleton device function implementing a basic add operation, generate the corresponding PTX file with nvcc, hand-edit that file to change the relevant add operations to addc.cc, and then proceed with the compilation.

Surely there must be an easier way to do this? Is there no way to access add with carry in higher-level CUDA?

alex_dubinsky · September 10, 2008, 12:41am

We’ve been groaning about inline assembly since forever.

Is adding two numbers all your kernel does? If so, it’s not too hard to write a PTX kernel and link it in. There’s an automatic run-time linking facility (i.e., device code repository) that I explained here: http://forums.nvidia.com/index.php?act=ST&f=71&t=44562 and is mentioned in the docs. It was a year ago but hopefully still applies. If this is part of a larger kernel, then it’d probably be too much to rewrite and maintain it as assembly. You’ll just have to juggle uint64s I’d guess.

P.S. If you write your own PTX kernel, start from scratch. Look at nvcc -ptx output to get a feel for how it’s done, but then do it yourself. If you start with compiler-generated code you’ll have a mess to work with.

suhailrehman · September 10, 2008, 4:50pm

Sadly, no. I need to write a device function that will be called several times from a big kernel to add two arbitrary-sized numbers together (e.g., two 512-bit numbers). I wish to implement this by breaking the addition down into 32-bit pieces and using add with carry.
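
For reference, the 32-bit limb-by-limb scheme described above can be sketched in plain C++ without any carry instruction, by detecting wraparound with a comparison. This is only an illustrative sketch: the name add_limbs and the least-significant-limb-first layout are assumptions, not anything from the CUDA toolkit. The same integer code should also be usable as a device function if marked __device__, though that has not been checked against any particular toolkit version.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of CUDA): adds two n-limb unsigned numbers
// stored least-significant limb first. Returns the final carry-out (0 or 1).
inline std::uint32_t add_limbs(const std::uint32_t* a, const std::uint32_t* b,
                               std::uint32_t* sum, std::size_t n) {
    std::uint32_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t partial = a[i] + b[i];    // wraps modulo 2^32
        std::uint32_t c1 = partial < a[i];      // 1 if a[i] + b[i] overflowed
        std::uint32_t total = partial + carry;  // add the incoming carry
        std::uint32_t c2 = total < partial;     // 1 if adding the carry overflowed
        sum[i] = total;
        carry = c1 | c2;                        // at most one of the two can be set
    }
    return carry;
}
```

A 512-bit addition would call this with n = 16. The two comparisons per limb are the work that a hardware add-with-carry would otherwise do for free.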

Any other ideas? I’m pretty sure using PTX in the first place is going to be a pain. I can no longer work with the comfortable make clean;make;execute routine.

alex_dubinsky · September 10, 2008, 5:37pm

Oh come on. You can put make clean, make, execute and any other build steps into a shell script and even save yourself typing.

Anyway, emulating the carry bit using 64-bit integers should work out ok. The compiler will probably use the add-with-carry instruction for the 64-bit adds anyway, and you’re looking at only maybe a 2x slowdown. That shouldn’t be bad, since it’s a fast operation to begin with.
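
A minimal sketch of that 64-bit emulation, in plain C++: each pair of 32-bit limbs is added in a 64-bit accumulator, the low half is stored, and the high half becomes the carry into the next limb. The name add_limbs_wide and the least-significant-limb-first layout are assumptions made for illustration. Whether the compiler actually turns this into add-with-carry instructions depends on the toolchain and is not guaranteed here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of CUDA): adds two n-limb unsigned numbers
// stored least-significant limb first, carrying through a 64-bit accumulator.
// Returns the final carry-out (0 or 1).
inline std::uint32_t add_limbs_wide(const std::uint32_t* a, const std::uint32_t* b,
                                    std::uint32_t* sum, std::size_t n) {
    std::uint64_t acc = 0;  // holds the carry from the previous limb (0 or 1)
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<std::uint64_t>(a[i]) + b[i];  // cannot overflow 64 bits
        sum[i] = static_cast<std::uint32_t>(acc);        // low 32 bits are the result limb
        acc >>= 32;                                      // high bits are the carry-out
    }
    return static_cast<std::uint32_t>(acc);
}
```

For two 512-bit numbers, n would be 16, and a nonzero return value means the sum did not fit in 512 bits.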
