Indirect Jump

bakks · September 30, 2011, 5:41am

I’m trying to do an indirect jump in a kernel. I tried this with old hardware (The Official NVIDIA Forums | NVIDIA) and found that it was impossible, but it is my understanding that Fermi can handle indirect jumps, and reading through the PTX 2.1 documentation seems to confirm this.

However, I can’t get it to work. With the code below I get no warnings (other than x being unused), but the compiler segfaults. This sort of thing is pretty easy to do in gcc. Any idea how to make it work in nvcc? Is this a compiler bug?

__global__ void testkernel(int jump)
{
    int x = 0;
    void *jumptable[3] = {&&label0, &&label1, &&label2};

    goto *jumptable[jump];

label0:
    x = 4;
    return;

label1:
    x = 10;
    return;

label2:
    x = 20;
    return;
}

njuffa · September 30, 2011, 6:22am

The compiler should not segfault, no matter what files you feed it. If this happens with the CUDA 4.0 toolchain, can you please file a bug against the compiler, attaching your repro case? Please also state the exact command line used to invoke nvcc, and the platform (Win32, Win64, Linux32, Linux64) used, as this will help with repro on our side. Thank you for your help, and sorry for the inconvenience.

bakks · September 30, 2011, 10:20pm

Where can I submit a bug report? Any idea if the code above should work or if there is another way to call indirect jumps?

njuffa · September 30, 2011, 10:39pm

You can submit bugs from a link off the registered developer website. Sorry, I do not know whether the code should work; the syntax of the jump table initialization is not familiar to me. Do you know whether this is ANSI C/C++ or possibly a gcc extension?

bakks · September 30, 2011, 11:32pm

This is a gcc extension: http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

I have a switch statement I’m attempting to optimize with a jump table. I’ve looked at the PTX generated from my kernel and from what I can tell (based on the PTX documentation for the bra instruction) it is not automatically optimizing the switch with indirect jumps, so I’m attempting to handle it explicitly.

From the PTX 2.1 manual:

I’m attempting to access this instruction interface through CUDA, and my code above is the best guess I have for doing so. Any other ideas would be appreciated.

Gregory_Diamos · October 1, 2011, 3:01am

I haven’t tried indirect branches at the source code level, but you can implement a jump table using function calls. Here’s an example:

typedef Value (*GetOperandValuePointer)(const ir::Operand&, CoreSimBlock*, unsigned);

static __device__ GetOperandValuePointer getOperandFunctionTable[] = {
    getRegisterOperand,
    getImmediateOperand,
    getPredicateOperand,
    getIndirectOperand
};

static __device__ CoreSimThread::Value getOperand(const ir::Operand& operand, CoreSimBlock* parentBlock, unsigned threadId)
{
    GetOperandValuePointer function = getOperandFunctionTable[operand.mode];
    return function(operand, parentBlock, threadId);
}

I’ve tried this with NVCC 4.0 and it compiles and executes without any issues.
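The snippet above depends on types from its own project (ir::Operand, CoreSimBlock), so it will not compile in isolation. A minimal self-contained sketch of the same idea could look like the following. All names in it (Handler, handle0 to handle2, dispatch, dispatchKernel) are hypothetical and chosen only for illustration. Unlike the static __device__ table above, this sketch builds the table inside the calling function, so the addresses are taken on whichever side (host or device) the function is compiled for. The macro fallback at the top is an assumption added so that the dispatch logic can also be checked with a plain C++ compiler.

```cpp
// Sketch only: all names here are hypothetical, not from the thread.
#include <cassert>

#ifndef __CUDACC__
// Fallback so the dispatch logic also builds with a plain C++ compiler.
#define __host__
#define __device__
#endif

// Every handler takes an int and returns an int.
typedef int (*Handler)(int);

__host__ __device__ int handle0(int v) { return v + 4; }
__host__ __device__ int handle1(int v) { return v + 10; }
__host__ __device__ int handle2(int v) { return v + 20; }

// Select a handler by index and call it through a function pointer.
__host__ __device__ int dispatch(int which, int v)
{
    assert(which >= 0 && which < 3);  // guard against out-of-range indices

    // The table is built in the calling code, so the addresses match the
    // side (host or device) this function is being compiled for.
    const Handler table[3] = { handle0, handle1, handle2 };
    return table[which](v);
}

#ifdef __CUDACC__
// One thread per element; selector[i] is expected to be 0, 1, or 2.
__global__ void dispatchKernel(const int* selector, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = dispatch(selector[i], i);
}
#endif
```

When built with nvcc, dispatchKernel would be launched in the usual way with device pointers for selector and out. Each call goes through a pointer, so the compiler generally cannot inline the handlers.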

bakks · October 3, 2011, 5:58am

That works, thanks for your help. Unfortunately it turns out that using a table of function pointers is slower for me than using a simple switch statement. From what I can tell looking at the PTX for the device functions, some arguments are written to local memory, and it seems that this overhead eclipses the advantage of the jump table. Do you have any ideas about this? Obviously this isn’t an issue with inline functions, but using a jump table seems to make inlining impossible. It would be nice if you could declare register variables that were shared between the kernel and device functions, but as far as I know this isn’t possible either. It looks like a switch statement is still my best option.
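For comparison, a switch over directly called handlers might be sketched as below. This is not the code from the post: caseAdd4, caseAdd10, caseAdd20, and switchDispatch are hypothetical names, and the macro fallback at the top is an assumption added so the logic can also be checked with a plain C++ compiler. Because each call target is known at compile time, the compiler is free to inline the handlers, which should avoid the argument-passing overhead described above. Whether it actually does so for a given kernel is best confirmed by inspecting the generated PTX.

```cpp
// Sketch only: all names here are hypothetical, not from the thread.
#ifndef __CUDACC__
// Fallback so the dispatch logic also builds with a plain C++ compiler.
#define __host__
#define __device__
#define __forceinline__ inline
#endif

// Small handlers marked for inlining; direct calls make this possible.
__host__ __device__ __forceinline__ int caseAdd4(int v)  { return v + 4; }
__host__ __device__ __forceinline__ int caseAdd10(int v) { return v + 10; }
__host__ __device__ __forceinline__ int caseAdd20(int v) { return v + 20; }

// Dispatch with a plain switch instead of a table of function pointers.
__host__ __device__ int switchDispatch(int which, int v)
{
    switch (which) {
    case 0:  return caseAdd4(v);
    case 1:  return caseAdd10(v);
    case 2:  return caseAdd20(v);
    default: return v;  // unknown selector: pass the value through
    }
}
```

The trade-off is that the set of handlers must be fixed at compile time, whereas a function pointer table can be filled in or changed at run time.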
