[OptiX 7] About rays switching lanes/threads

From the OptiX 7 guide:

and also from chapter 6.1 “Program Input”.

I have a hard time understanding what a ray switching lanes would involve with regard to memory.
In my example, each ray has some sort of identity that I store in global memory and access with a threadIndex:

const int ix = optixGetLaunchIndex().x;
const int iy = optixGetLaunchIndex().y;
const cuuint32_t threadIndex = ix + iy * optixLaunchParams.simConstants.size.x;
rayData = optixLaunchParams.perThreadData.currentRayData[threadIndex]; //global memory, each thread/launch index has their own

In my ray generation program I sometimes fetch this data and use it as per-ray data (PRD). For the post-processing/shading step I have global counting structures, which I increment with atomic instructions. I also write results such as new starting positions back to global memory.
My main goal is to load data from location A and save it back to location A.
Now I wonder how the memory is actually affected when a lane/thread switch happens. Which of the data can I assume is safe? What could be a problematic scenario?

The restrictions you cited are simply due to the single-ray programming model in which the OptiX program domains need to be expressed.
How that is parallelized internally is abstracted away and has changed between OptiX versions depending on the scheduler implementation.
Therefore CUDA device code for these OptiX program domains may not use programming mechanisms that would require explicit knowledge of which compute unit is doing which work at which time. This information is not available to the developer.

None of these restrictions apply to native CUDA kernels, which you could run on the same data between OptiX launches, for example. That is rather straightforward with OptiX 7 because all data is defined via CUDA device pointers anyway.

If you use the optixGetLaunchIndex() function to index your buffers, everything is fine. That index is independent of how OptiX schedules the work.

optixGetLaunchIndex() returns a uint3 vector.
I would write the code like this:

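// Note: make_uint2(uint3) is not a CUDA built-in; this assumes a helper overload
// such as the one in the OptiX SDK's vec_math.h.
const uint2 launchIndex = make_uint2(optixGetLaunchIndex());
const unsigned int bufferIndex = launchIndex.x + launchIndex.y * optixLaunchParams.simConstants.size.x;
rayData = optixLaunchParams.perThreadData.currentRayData[bufferIndex];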
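
To illustrate why indexing by launch index is safe regardless of scheduling, here is a minimal host-side C++ sketch. It is not OptiX code. RayData, processLaunchIndex and simulateLaunch are hypothetical names, and a shuffled sequential loop merely stands in for a scheduler whose assignment of work to lanes is unknown. On the GPU the invocations run concurrently, so the shared counter would need a real atomic such as atomicAdd, as in the question.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for the per-ray record kept in global memory.
struct RayData {
    float origin;
    std::uint32_t bounces;
};

// Hypothetical stand-in for one ray generation invocation: load the record
// from its slot, update it, bump a shared counter, store it back to the same slot.
inline void processLaunchIndex(std::uint32_t ix, std::uint32_t iy, std::uint32_t width,
                               std::vector<RayData>& rays,
                               std::atomic<std::uint32_t>& hitCounter)
{
    // The slot depends only on the launch index, never on which lane runs the work.
    const std::uint32_t bufferIndex = ix + iy * width;
    assert(bufferIndex < rays.size());

    RayData prd = rays[bufferIndex];   // load from location A
    prd.origin += 1.0f;
    prd.bounces += 1;
    if ((ix + iy) % 2 == 0) {
        hitCounter.fetch_add(1, std::memory_order_relaxed); // shared counter, atomic
    }
    rays[bufferIndex] = prd;           // store back to location A
}

// Runs every launch index exactly once, in an order determined by `seed`.
inline std::vector<RayData> simulateLaunch(std::uint32_t width, std::uint32_t height,
                                           unsigned seed, std::uint32_t& hitsOut)
{
    std::vector<RayData> rays(width * height);
    for (std::uint32_t i = 0; i < rays.size(); ++i) {
        rays[i] = RayData{static_cast<float>(i), 0u};
    }

    // Shuffle the processing order to mimic an unknown work-to-lane mapping.
    std::vector<std::uint32_t> order(rays.size());
    std::iota(order.begin(), order.end(), 0u);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    std::atomic<std::uint32_t> hits{0};
    for (std::uint32_t linear : order) {
        processLaunchIndex(linear % width, linear / width, width, rays, hits);
    }
    hitsOut = hits.load();
    return rays;
}
```

Calling simulateLaunch twice with different seeds yields identical buffers and counter values, because each launch index only ever touches its own slot. A problematic scenario would be one where the slot is derived from something other than the launch index, for example a hardware thread or lane ID, since that mapping is not exposed to OptiX programs.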