Call an inline PTX function?

Hi,

I am trying to find out how to call an inline PTX function. For example, given code like this:

.func (.reg .s32 %res) inc_ptr ( .reg .s32 %ptr, .reg .s32 %inc )
{
    add.s32 %res, %ptr, %inc;
    ret;
}

call (%d), inc_ptr, (%s, %d);

How would you use it with inline PTX, i.e. asm()?

I asked myself a similar question: how do you use PTX efficiently? I found these resources helpful:

http://blog.langly.org/2009/02/12/cuda-hacking-ptx-code/ <-- This one shows how to write and launch kernels in PTX.

http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/Using_Inline_PTX_Assembly_In_CUDA.pdf <-- This one details how to write inline PTX.

http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ptx_isa_3.0.pdf <-- Section 5.1.6 explains how to call kernels PTX to PTX, and chapter 7 covers function declarations.

Although I don't have a good answer (and would be interested in one) to the question of how to call a PTX function from CUDA, I hope these references help you solve your problem. NVIDIA's "Using Inline PTX Assembly in CUDA" should address your problem directly, with the others being more the kind of background reading.

If there is a better way to compile and call a function written completely in PTX, I'd also be interested in reading about it.

Thanks for your reply. I have looked through the resources before, and was at the time unable to find any direct information about inline assembly ptx functions. I was sort of hoping someone could give me an example of how to do it :). I was thinking it should be something along the lines of:

// outside function scope
asm volatile(".func (.reg .s32 %res) inc_ptr ( .reg .s32 %ptr, .reg .s32 %inc ) \
{ \
    add.s32 %res, %ptr, %inc; \
    ret; \
}" ::);

// inside function scope
__global__ void foo()
{
    int v = 0;
    asm("call (%0), inc_ptr, (%1, %2);" : "=r"(v) : "r"(1), "r"(2));
}

But the asm outside function scope fails to compile. Anyway, I hope someone can help me out.

As MKasper also mentioned, I guess any way to call the ptx function from CUDA-C would of course do (e.g. without inlining).

If your PTX function is simple, you can do something like:

// outside function scope
#define addptx(r,a,b) asm("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b))

// inside function scope
__global__ void foo()
{
    int v = 0, a = 1, b = 2;
    addptx(v, a, b);
}

Cyril

Thanks for your answer, Cyril, but I am mainly looking for a way to call a PTX function (.func) from CUDA-C.
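
For reference, a middle ground between the macro and a hand-written .func is to wrap the inline PTX in a __device__ function and call that from CUDA-C. This is only a sketch and not a true call into a hand-written .func: the name inc_ptr is borrowed from the first post, the host helper run_inc_ptr is hypothetical, and __noinline__ is only a request to the compiler, so whether a separate .func actually shows up in the generated PTX is an assumption to verify.

```cuda
#include <cuda_runtime.h>

// Device wrapper around the same add.s32 instruction, issued through inline PTX.
// __noinline__ asks the compiler to keep this as a separate function; nvcc
// usually emits such a function as a .func in the PTX, but that is not guaranteed.
__device__ __noinline__ int inc_ptr(int ptr, int inc)
{
    int res;
    asm("add.s32 %0, %1, %2;" : "=r"(res) : "r"(ptr), "r"(inc));
    return res;
}

// Kernel that calls the wrapper like any other device function.
__global__ void foo(int *out, int a, int b)
{
    *out = inc_ptr(a, b);
}

// Hypothetical host helper: runs foo on a single thread and returns the result.
// Returns -1 if a CUDA call fails (so -1 is ambiguous as a real result).
int run_inc_ptr(int a, int b)
{
    int *d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    foo<<<1, 1>>>(d_out, a, b);
    // cudaMemcpy waits for the kernel and also reports earlier launch errors.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}
```

Calling run_inc_ptr(1, 2) from host code should give 3 on a machine with a working CUDA device. Compiling with nvcc -ptx and reading the output shows whether the compiler kept inc_ptr as a separate .func or inlined it.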