Hi,
I am trying to find out how to call a PTX function using inline PTX. For example, if you had code like this:
.func (.reg .s32 %res) inc_ptr ( .reg .s32 %ptr, .reg .s32 %inc )
{
add.s32 %res, %ptr, %inc;
ret;
}
call (%d), inc_ptr, (%s, %d);
How would you use it with inline PTX, i.e. asm()?
I asked myself similar questions: how do you use PTX efficiently? I found these resources helpful:
http://blog.langly.org/2009/02/12/cuda-hacking-ptx-code/ <- this one explains how to write and launch kernels in PTX
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/Using_Inline_PTX_Assembly_In_CUDA.pdf <- this one details how to write inline PTX
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ptx_isa_3.0.pdf <- section 5.1.6 explains how to call functions from PTX to PTX, and chapter 7 covers function declarations
Although I don’t have a good answer to the question of how to call a PTX function from CUDA (and would be interested in one), I hope these references help you solve your problem. NVIDIA’s “Using Inline PTX Assembly” document should address your problem most directly; the others are more background reading.
If there is a better way to compile and call a function written entirely in PTX, I’d also be interested in reading about it.
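On compiling and calling a function written entirely in PTX: one option, sketched here and not tested against every toolkit, is to JIT-load a PTX module at run time with the CUDA driver API (cuModuleLoadData, cuModuleGetFunction, cuLaunchKernel). A .func cannot be launched from the host, so the sketch assumes a hypothetical wrapper kernel, inc_ptr_kernel, that calls the inc_ptr .func from the question with the PTX call instruction. The .version 6.0 / .target sm_50 header is an assumption; adjust it to your toolkit and GPU. The host helper run_inc_ptr is also hypothetical.

```cuda
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a driver API call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        CUresult err_ = (call);                                      \
        if (err_ != CUDA_SUCCESS) {                                  \
            const char* msg_ = nullptr;                              \
            cuGetErrorString(err_, &msg_);                           \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         msg_ ? msg_ : "unknown error");             \
            std::exit(1);                                            \
        }                                                            \
    } while (0)

// Hand-written PTX module. Header values are assumptions; adjust as needed.
static const char* kPtx = R"(
.version 6.0
.target sm_50
.address_size 64

// The .func from the question, with .reg parameters.
.func (.reg .s32 %res) inc_ptr (.reg .s32 %ptr, .reg .s32 %inc)
{
    add.s32 %res, %ptr, %inc;
    ret;
}

// Hypothetical wrapper kernel: *out_ptr = inc_ptr(in_a, in_b).
.visible .entry inc_ptr_kernel (.param .u64 out_ptr, .param .s32 in_a, .param .s32 in_b)
{
    .reg .s32 %s, %d;
    .reg .u64 %p;
    ld.param.u64 %p, [out_ptr];
    cvta.to.global.u64 %p, %p;
    ld.param.s32 %s, [in_a];
    ld.param.s32 %d, [in_b];
    call (%d), inc_ptr, (%s, %d);
    st.global.s32 [%p], %d;
    ret;
}
)";

// Loads the module, runs the wrapper kernel on one thread, returns the result.
int run_inc_ptr(int a, int b)
{
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    CUmodule mod;
    CHECK(cuModuleLoadData(&mod, kPtx));  // driver JIT-compiles the PTX text
    CUfunction fn;
    CHECK(cuModuleGetFunction(&fn, mod, "inc_ptr_kernel"));

    CUdeviceptr d_out;
    CHECK(cuMemAlloc(&d_out, sizeof(int)));
    void* args[] = {&d_out, &a, &b};
    CHECK(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, args, nullptr));
    int result = 0;
    CHECK(cuMemcpyDtoH(&result, d_out, sizeof(int)));  // waits for the kernel

    CHECK(cuMemFree(d_out));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return result;
}

int main()
{
    std::printf("inc_ptr(40, 2) = %d\n", run_inc_ptr(40, 2));
    return 0;
}
```

This needs to be linked against the driver library, e.g. with something like nvcc load_ptx.cu -lcuda. It still does not call the .func from CUDA C, but it shows the .func and call syntax from the question running as-is.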
Thanks for your reply. I had already looked through those resources and was unable to find any direct information about defining functions in inline PTX assembly. I was hoping someone could give me an example of how to do it :). I was thinking it would be something along these lines:
// outside function scope
asm volatile(".func (.reg .s32 %res) inc_ptr ( .reg .s32 %ptr, .reg .s32 %inc )\
{\
add.s32 %res, %ptr, %inc;\
ret;\
}" ::);
//inside function scope
__global__ void foo()
{
int v = 0;
asm("call (%0), inc_ptr, (%1, %2);" : "=r"(v) : "r"(1), "r"(2));
}
But the asm outside function scope fails to compile. Anyway, I hope someone can help me out.
As MKasper also mentioned, any way to call the PTX function from CUDA C would of course do (e.g. without inlining).
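Following up on the “without inlining” idea: one approach, sketched here under the assumption of a reasonably recent nvcc, is to put the PTX body inside a __device__ function marked __noinline__ and call it like any C function. nvcc should then emit it as its own .func with a call instruction in the generated PTX, which you can inspect with nvcc -ptx. Note that the .func nvcc emits will have a C++-mangled name and .param-space parameters rather than the .reg signature from the original post, so this does not let you hand-write the .func itself. The host helper call_inc_ptr is hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// The PTX body from the question, wrapped in a device function.
// __noinline__ asks nvcc to keep it as a separate function, so the generated
// PTX should contain a .func plus a call to it (check with: nvcc -ptx file.cu).
__device__ __noinline__ int inc_ptr(int ptr, int inc)
{
    int res;
    asm("add.s32 %0, %1, %2;" : "=r"(res) : "r"(ptr), "r"(inc));
    return res;
}

__global__ void foo(int* out, int a, int b)
{
    *out = inc_ptr(a, b);  // an ordinary CUDA C call
}

// Hypothetical host helper: runs foo on one thread and returns the result.
int call_inc_ptr(int a, int b)
{
    int* d_out = nullptr;
    int h_out = 0;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        std::exit(1);
    }
    foo<<<1, 1>>>(d_out, a, b);
    // cudaMemcpy waits for the kernel to finish before copying.
    if (cudaGetLastError() != cudaSuccess ||
        cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        std::fprintf(stderr, "kernel launch or copy failed\n");
        std::exit(1);
    }
    cudaFree(d_out);
    return h_out;
}

int main()
{
    std::printf("inc_ptr(1, 2) = %d\n", call_inc_ptr(1, 2));
    return 0;
}
```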
If your PTX function is simple, you can do something like this:
// outside function scope
#define addptx(r,a,b) asm("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b))
//inside function scope
__global__ void foo()
{
    int v = 0, a=1, b=2;
    addptx(v,a,b);
}
Cyril
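The macro above covers single-instruction functions. If the PTX function needs temporaries, the “Using Inline PTX Assembly in CUDA” document shows that wrapping the asm string in { } opens a PTX scope, so .reg declarations inside it stay local to that asm statement. A sketch along those lines, using a hypothetical function inc_and_double and an inline __device__ function in place of a macro:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical multi-instruction PTX "function": returns (ptr + inc) * 2.
// The braces open a PTX scope, so the temporary register t is local to this
// asm statement and the function can be inlined many times without clashes.
__device__ __forceinline__ int inc_and_double(int ptr, int inc)
{
    int res;
    asm("{\n\t"
        ".reg .s32 t;\n\t"
        "add.s32 t, %1, %2;\n\t"
        "mul.lo.s32 %0, t, 2;\n\t"
        "}"
        : "=r"(res)
        : "r"(ptr), "r"(inc));
    return res;
}

__global__ void foo(int* out, int a, int b)
{
    *out = inc_and_double(a, b);
}

// Hypothetical host helper: runs foo on one thread and returns the result.
int call_inc_and_double(int a, int b)
{
    int* d_out = nullptr;
    int h_out = 0;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        std::exit(1);
    }
    foo<<<1, 1>>>(d_out, a, b);
    // cudaMemcpy waits for the kernel to finish before copying.
    if (cudaGetLastError() != cudaSuccess ||
        cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        std::fprintf(stderr, "kernel launch or copy failed\n");
        std::exit(1);
    }
    cudaFree(d_out);
    return h_out;
}

int main()
{
    std::printf("inc_and_double(1, 2) = %d\n", call_inc_and_double(1, 2));
    return 0;
}
```

An inline function is used instead of a macro here because it type-checks its arguments and behaves like a normal call at the call site; the generated code should be the same once inlined.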
Thanks for your answer, Cyril, but I am mainly looking for a way to call a PTX function (.func) from CUDA C.