OptiX GPU ray hits not matching OptiX CPU ray hits

I am using the primeSimple example to simulate intersecting a single triangle with a single ray.

Here is my triangle:

int nVertex = 3;
float* vertex_data = new float[nVertex * 3];
int iVertex = 0;
vertex_data[iVertex++] = 0.0f; vertex_data[iVertex++] = 0.0f; vertex_data[iVertex++] = 0.0f;
vertex_data[iVertex++] = 1.0f; vertex_data[iVertex++] = 0.0f; vertex_data[iVertex++] = 0.0f;
vertex_data[iVertex++] = 0.0f; vertex_data[iVertex++] = 1.0f; vertex_data[iVertex++] = 0.0f;

int nTriangles = 1;
int* vertex_indices = new int[nTriangles * 3];
int iTriangleVertex = 0;
vertex_indices[iTriangleVertex++] = 0; vertex_indices[iTriangleVertex++] = 1; vertex_indices[iTriangleVertex++] = 2;

Here is my CPU code for the ray:

Ray* rays = raysBuffer.ptr();
float minValueOrMask = 0.0f;
Ray r = {make_float3(0.2f, 0.1f, 3.0f), minValueOrMask, make_float3(0.0f, 0.0f, -1.0f), 1e34f};
rays[0] = r;

Here is my GPU kernel code for the ray:

__global__ void createRaysKernel
(
  float4* rays,
  int useMasking
)
{
  int idGlobalThreadx = threadIdx.x + blockIdx.x * blockDim.x;
  int idGlobalThready = threadIdx.y + blockIdx.y * blockDim.y;

  float minValueOrMask = 0.0f;
  if (useMasking)
  {
    minValueOrMask = __int_as_float(1);
  }

  if (idGlobalThreadx > 0 || idGlobalThready > 0)
    return;

  int iRay = 0;

  // origin, tmin
  rays[2 * iRay + 0] = make_float4(0.2f, 0.1f, 3.0f, minValueOrMask);

  // dir, tmax  
  rays[2 * iRay + 1] = make_float4(0.0f, 0.0f, -1.0f, 1e34f);  
}
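The kernel writes each ray as two float4 values, while the CPU path fills a Ray struct. Both should describe the same 32 bytes. The following is a minimal host-side sketch of that assumption, not code from the sample: Float3 is a hypothetical stand-in for CUDA's float3, the Ray field order is inferred from the initializer in the CPU code above, and intBitsAsFloat is a hypothetical host equivalent of __int_as_float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for CUDA's float3, so the sketch builds without CUDA headers.
struct Float3 { float x, y, z; };

// Assumed layout, inferred from the initializer order {origin, tmin, dir, tmax}.
struct Ray
{
  Float3 origin;
  float  tmin;   // holds the mask bits instead when masking is enabled
  Float3 dir;
  float  tmax;
};

// The kernel addresses ray i as rays[2*i] (origin, tmin) and rays[2*i + 1] (dir, tmax),
// so the struct must pack into exactly two float4 values with no padding.
static_assert(sizeof(Ray) == 8 * sizeof(float), "Ray must occupy two float4 slots");
static_assert(offsetof(Ray, tmin) == 3 * sizeof(float), "tmin must be the 4th float");
static_assert(offsetof(Ray, dir)  == 4 * sizeof(float), "dir must start the 2nd float4");
static_assert(offsetof(Ray, tmax) == 7 * sizeof(float), "tmax must be the 8th float");

// Host-side equivalent of __int_as_float: copies the bit pattern, does not convert the value.
inline float intBitsAsFloat(std::int32_t bits)
{
  static_assert(sizeof(float) == sizeof(std::int32_t), "float must be 32 bits");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```

Note that intBitsAsFloat(1) is not 1.0f: it is the float whose raw bits equal the integer 1, which is what the kernel stores in the tmin slot when useMasking is set.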

The hit results when using the CPU (correct):

i = 0;
hits[i].t = 3
hits[i].triId = 0
hits[i].u = 7.0000e-01
hits[i].v = 2.0000e-01

The hit results when using the GPU (not correct):

i = 0;
hits[i].t = 0
hits[i].triId = 0
hits[i].u = 7.0000e-01
hits[i].v = 2.0000e-01

The GPU value hits[i].t = 0 is not correct. The ray starts at z = 3 and travels along -z toward the triangle in the z = 0 plane, so t should be 3.0, the same as on the CPU.
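As a sanity check that does not depend on OptiX, the expected hit can be computed by hand. The sketch below uses the standard Möller-Trumbore ray/triangle test. It is not the algorithm OptiX Prime uses internally, and Vec3, RefHit and intersect are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3  cross(Vec3 a, Vec3 b)
{
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Reference result: hit flag, ray parameter t, and barycentric weights of v0, v1, v2.
struct RefHit { bool hit; float t; float w0, w1, w2; };

// Moller-Trumbore intersection of the ray o + t*d with the triangle (v0, v1, v2).
inline RefHit intersect(Vec3 o, Vec3 d, Vec3 v0, Vec3 v1, Vec3 v2)
{
  const RefHit miss = {false, 0.0f, 0.0f, 0.0f, 0.0f};
  const Vec3 e1 = sub(v1, v0);
  const Vec3 e2 = sub(v2, v0);
  const Vec3 p  = cross(d, e2);
  const float det = dot(e1, p);
  if (std::fabs(det) < 1e-8f) return miss;         // ray parallel to the triangle plane
  const float inv = 1.0f / det;
  const Vec3 s = sub(o, v0);
  const float b1 = dot(s, p) * inv;                // weight of v1
  if (b1 < 0.0f || b1 > 1.0f) return miss;
  const Vec3 q = cross(s, e1);
  const float b2 = dot(d, q) * inv;                // weight of v2
  if (b2 < 0.0f || b1 + b2 > 1.0f) return miss;
  const float t = dot(e2, q) * inv;                // distance along the ray
  if (t < 0.0f) return miss;                       // triangle is behind the origin
  return {true, t, 1.0f - b1 - b2, b1, b2};
}
```

Calling intersect with origin (0.2, 0.1, 3), direction (0, 0, -1) and the triangle above gives t = 3 and weights of approximately 0.7, 0.2 and 0.1 for v0, v1 and v2. The reported u = 0.7 and v = 0.2 match the weights of v0 and v1, so only the GPU t value disagrees with the reference.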

Am I doing something incorrect?
ExampleForOptixDevelopers.tar.gz (2.77 KB)

I attached the full example. As noted above, it is nothing more than primeSimple with a single triangle and a single ray, plus a few minor modifications. I would be grateful if someone could run the example and either verify the results or tell me what is incorrect. As it stands, I cannot use the GPU ray tracer because the hit “t” value appears to be incorrect.

Update: I have found the issue and, as suspected, it was a mistake on my side. The CPU and GPU results now agree on this simple example.