Inline ASM / laneid

I’m running into issues when I try to use the following inline PTX assembly in a closest-hit program:

int laneId; asm("mov.s32 %0, %laneid;" : "=r"(laneId) );

What I’m trying to do is use the thread’s lane index as a “swizzle” factor to reduce atomicAdd contention. The overall situation looks something like this:

rtDeclareVariable(int, nvals, , );

rtBuffer<float, 1> vals; //Accumulates data over every ray. (size==nvals)

RT_PROGRAM void closestHit() {
   int swizzle;

   asm("mov.s32 %0, %laneid;" : "=r"(swizzle) ); //Fails with segfault if I do this.
   //swizzle = launch_index.x; //No segfault if I do something like this instead.

   for(int idx=swizzle; idx<(nvals+swizzle); ++idx) {

      int wrappedIdx = idx % nvals;
      float someVal = 2.0f*wrappedIdx;

      atomicAdd(&vals[wrappedIdx], someVal);

   //... (recursively spawn child rays that use this same closest-hit program)

The program compiles fine, but execution fails with a segfault whenever I try to use the laneid as the swizzle factor. Other swizzle factors that I tried worked fine, so I know this isn’t a simple array indexing issue.
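As a sanity check on the indexing claim, here is a minimal host-side sketch in plain C++ (not the OptiX program itself). It replays the loop bounds from closestHit() for a given swizzle and checks that every wrapped index stays in range and is visited exactly once. The helper names `rotatedIndices` and `coversEachIndexOnce` are hypothetical, and the sketch assumes a non-negative swizzle, which holds for a lane index in 0..31.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: replays the loop bounds from closestHit() on the host
// and returns the wrapped indices in the order they would be visited.
// Assumes nvals > 0 and swizzle >= 0 (a lane index is always in 0..31).
std::vector<int> rotatedIndices(int swizzle, int nvals)
{
    assert(nvals > 0 && swizzle >= 0);
    std::vector<int> order;
    for (int idx = swizzle; idx < nvals + swizzle; ++idx) {
        order.push_back(idx % nvals);
    }
    return order;
}

// Returns true if every index in [0, nvals) appears exactly once in `order`.
bool coversEachIndexOnce(const std::vector<int>& order, int nvals)
{
    if (static_cast<int>(order.size()) != nvals) return false;
    std::vector<int> hits(nvals, 0);
    for (int i : order) {
        if (i < 0 || i >= nvals) return false;  // out-of-range access
        ++hits[i];
    }
    for (int h : hits) {
        if (h != 1) return false;               // skipped or repeated index
    }
    return true;
}
```

If this holds for every swizzle in 0..31, the rotation itself cannot produce an out-of-bounds index into vals, which is consistent with the other swizzle factors working.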

So here’s my question: is retrieving the laneid via inline asm not supported in OptiX programs, and that’s why I’m getting this error? Or is this something I should create a bug report for? I’m using OptiX 3.9 / CUDA 7.5 on Ubuntu 14.04 Linux (64-bit).

Thank you for your help.

Hi Stephen,

We’ve been discussing this through other channels, but here is a quick note for anybody else who finds this thread:

OptiX 3.9.0 doesn’t officially support reading special registers such as %laneid. Code like this will work in some programs but not in others, and more often than not it fails. The built-in CUDA variable threadIdx is supported, although it may not map 1-1 to rays, as stated in the programming guide.
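For anyone who wants to try the threadIdx route, here is a minimal sketch of deriving a lane-like swizzle factor without inline PTX. It is written as plain C++ so the arithmetic can be checked on the host. The helper name `swizzleFromThreadX` is hypothetical, and the sketch assumes a warp size of 32. Because threadIdx may not map 1-1 to rays, treat the result only as a value that spreads atomicAdd start positions, not as a guaranteed hardware lane ID.

```cpp
// Assumption: warps are 32 threads wide, as on current NVIDIA GPUs.
const unsigned kAssumedWarpSize = 32u;

// Hypothetical helper (not part of OptiX or CUDA): maps the x component of a
// thread index to a value in [0, 32). For a one-dimensional block this matches
// the lane index; for other block shapes it is only a lane-like spread.
inline unsigned swizzleFromThreadX(unsigned threadX)
{
    return threadX % kAssumedWarpSize;
}
```

In a device program the call would look like `int swizzle = swizzleFromThreadX(threadIdx.x);`, with the helper marked for device compilation. Whether this reduces contention as well as %laneid would is untested here.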

We’re trying to figure out the right recommendation for doing high-performance buffer reductions going forward.