Paged array on device

Here is the situation. We have a paged array structure on the host side, and we want to use a similar structure on the device side. We have a class that keeps track of all page info on the device side. An object of that class is created on the device, and its pointer is passed into OptiX through a Variable object.
On the device side:

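rtDeclareVariable(void *, pagedVertexArrayPtr, , );

RT_PROGRAM void mesh_intersect(int primIdx)
{
    // Wrap the raw device pointer that the host passed in.
    PagedArray pagedVertexArray(pagedVertexArrayPtr);
    // p0_Index, p1_Index and p2_Index are the triangle's vertex indices (lookup omitted).
    float3 p0 = pagedVertexArray.getPoint(p0_Index);
    float3 p1 = pagedVertexArray.getPoint(p1_Index);
    float3 p2 = pagedVertexArray.getPoint(p2_Index);
    // ... rest of the intersection test omitted ...
}
// We use PagedArray in the bounding box program as well.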

On the host side:

void *devPtr = getPagedArrayDevicePtr();
geometryNode["pagedVertexArrayPtr"]->setUserData(sizeof(void *), &devPtr);

Now the problem: the content of the paged array is corrupted. The device class has been verified in a CUDA kernel and works correctly there. Are we doing something that is not legitimate in OptiX?

Thank you.

Just a quick check: are you actually storing the device pointer with

void *devPtr = getPagedArrayDevicePtr();
geometryNode["pagedVertexArrayPtr"]->setUserData(sizeof(void *), &devPtr);


I ask because you might be passing the device a pointer to a pointer, i.e. a pointer to the host-side pointer.

devPtr is definitely a device-side pointer.
setUserData(RTsize, void *) is a C++ wrapper around rtVariableSetUserData(RTvariable, RTsize, void *ptr).
The ptr parameter should be a host pointer to the value you want to pass, which here is a device pointer. Is there anything wrong with that?

The function expects a pointer, and you’re passing &devPtr, which is actually a pointer to a pointer (i.e. you’re adding another level of indirection). That might cause undefined behavior and explain your problem. Just guessing, though.

The ptr parameter of rtVariableSetUserData() is a host pointer to a location in host memory. In this case, that location holds a device pointer, which in turn points into device memory. When rtVariableSetUserData() is called, it goes to the host address and copies the number of bytes given by the second parameter, starting from there. So the device pointer is copied to the device side and stored in the variable “pagedVertexArrayPtr,” and the OptiX program then uses it. This is how I expect rtVariableSetUserData() to work. Let me know if that is not how it works.
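
The copy semantics described in the post above can be illustrated with a small host-only sketch. Note that `FakeVariable`, its `setUserData`, and `storedValueFor` are hypothetical stand-ins written for this example, not the OptiX API; they only mimic "copy `size` bytes starting at `ptr`", which is the behavior the poster assumes. The "device address" is just an integer reinterpreted as a pointer and is never dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an OptiX variable: it keeps a byte-wise copy of
// whatever the caller points at (the behavior assumed in the post above).
struct FakeVariable {
    std::vector<unsigned char> bytes;

    void setUserData(std::size_t size, const void *ptr) {
        bytes.resize(size);
        std::memcpy(bytes.data(), ptr, size);  // copies the pointed-to value
    }

    // Reinterpret the stored bytes as a pointer, as the device program would.
    void *asPointer() const {
        assert(bytes.size() == sizeof(void *));
        void *p = nullptr;
        std::memcpy(&p, bytes.data(), sizeof p);
        return p;
    }
};

// Returns the pointer value that ends up stored when &devPtr is passed.
inline std::uintptr_t storedValueFor(std::uintptr_t deviceAddress) {
    void *devPtr = reinterpret_cast<void *>(deviceAddress);  // pretend device address
    FakeVariable var;
    var.setUserData(sizeof(void *), &devPtr);  // host pointer TO the device pointer
    return reinterpret_cast<std::uintptr_t>(var.asPointer());
}
```

Under that assumption, passing `&devPtr` with `sizeof(void *)` stores the device address itself, not the address of the host variable, so no extra level of indirection reaches the device.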

Okay, I see what you’re trying to do, and it seems correct (at least in the way you’re storing the data). One possible problem is that your intersection and bounding box programs are effectively kernels.

Every ray that might intersect your geometry node calls that code, each one creating its own PagedArray object, each one (possibly) setting parameters, and each one reading concurrently from device memory (even from the same location) with no synchronization whatsoever. There might be a race condition involved here, although I don’t know how your class is written.

PagedArray does two things. It keeps an array of pointers to the pages (only one copy, though), and it translates a global index into a page number and an index within that page. It doesn’t modify data on the device side. There shouldn’t be a race condition in this case, should there?
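
As a rough illustration of the read-only lookup described above, here is a minimal host-side sketch. The class name `PagedArrayView`, the fixed page size, and the `demoLookup` helper are assumptions made for this example; the actual PagedArray class is not shown in the thread and may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical read-only view: a table of page pointers plus a fixed page size.
template <typename T>
class PagedArrayView {
public:
    PagedArrayView(const T *const *pages, std::size_t pageSize)
        : pages_(pages), pageSize_(pageSize) {
        assert(pageSize > 0);
    }

    // Translate a global index into (page number, index within that page).
    const T &get(std::size_t globalIndex) const {
        const std::size_t page = globalIndex / pageSize_;
        const std::size_t offset = globalIndex % pageSize_;
        return pages_[page][offset];
    }

private:
    const T *const *pages_;
    std::size_t pageSize_;
};

// Two pages of four ints each; element i holds the value i.
inline int demoLookup(std::size_t globalIndex) {
    static const int page0[4] = {0, 1, 2, 3};
    static const int page1[4] = {4, 5, 6, 7};
    static const int *const pages[2] = {page0, page1};
    PagedArrayView<int> view(pages, 4);
    return view.get(globalIndex);
}
```

Because `get` only reads the page table and the page contents, concurrent lookups from many threads do not write shared state in this sketch.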

One interesting thing caught my attention: when I rtPrintf “pagedVertexArrayPtr,” the value is not the same as the one I got on the host side. The value on the device side is 0x240200, but the one on the host side is 0x400240200. It seems that the device pointer has been truncated during the data transfer in rtVariableSetUserData(). Could that be the cause?

We develop on 64-bit Windows. Our GPU is a GTX 670.

Does what we are doing hurt OptiX performance, or does it simply not work?
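
The two values quoted above differ only in their upper 32 bits, which can be checked with a little arithmetic. The helper names below (`low32`, `high32`, `matchesAfterTruncation`, `checkReportedValues`) are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low 32 bits of a 64-bit value, as a 32-bit print path would.
inline std::uint32_t low32(std::uint64_t value) {
    return static_cast<std::uint32_t>(value & 0xFFFFFFFFull);
}

// Extract the upper 32 bits of a 64-bit value.
inline std::uint32_t high32(std::uint64_t value) {
    return static_cast<std::uint32_t>(value >> 32);
}

// True if `printedValue` is exactly `hostValue` with its upper half dropped.
inline bool matchesAfterTruncation(std::uint64_t hostValue,
                                   std::uint64_t printedValue) {
    return printedValue == low32(hostValue);
}

// Check the two values reported in the post above.
inline void checkReportedValues() {
    assert(matchesAfterTruncation(0x400240200ull, 0x240200ull));
    assert(high32(0x400240200ull) == 0x4u);
}
```

This only shows that the printed value is consistent with a 32-bit truncation somewhere. It does not say whether the truncation happened in the transfer or in the printing.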

Thank you.

A couple of things:

  1. Are you passing rtVariableSetUserData a host pointer (to a void* containing the device address you intend to use) or a device pointer (a void* pointing to a memory area allocated with cudaMalloc)? The first case is okay; the second won’t work.

  2. Always use %p to output pointers in printf-like functions. If you use %x, it will trim your value to 32 bits.

For now these are the ideas I have. If you still see the error, just let me know.
Keep in mind that writing to a global value from multiple threads will end in tears. Reading a single global value from multiple threads should work, but it is strongly advised against.

  1. I use a host pointer that contains device address.
  2. I use %p to rtPrintf() the device address.

I understand that writing to a global value from different threads might cause problems, but I don’t understand why reading would be a problem. Thank you!

It’s not a problem, but it is highly discouraged since it’s bad practice and wastes performance. You should take advantage of other CUDA techniques. As I said, though, it shouldn’t be a problem if you’re only reading.

I tried writing a pointer to a CUDA memory location and referencing it the same way you described, and found no problems (I did this inside the intersection program). Perhaps there’s something else going on inside your class? Did you verify the pointer both before and after passing it into the class? What were the results?

Is this also being read and not written to in the bounding box program?

I’m not saying this is not an OptiX bug, I’m just trying to verify all scenarios that come to my mind before considering that possibility.

Try casting the values to size_t and then checking whether you still see truncation. Also, you can use CUDA’s printf for code compiled for sm_20+ in addition to rtPrintf.
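
One way to follow the suggestion above on the host side is to cast the pointer to an integer of known width before formatting it. This sketch uses `std::uintptr_t` and `unsigned long long` with `%llx` instead of `size_t`, and `formatAddress` is a hypothetical helper. Whether rtPrintf accepts the same format specifiers is not established in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a pointer as hex through an explicit 64-bit integer cast, so the
// width passed to the formatter does not depend on how %p or %x behave.
inline std::string formatAddress(const void *p) {
    const unsigned long long value =
        static_cast<unsigned long long>(reinterpret_cast<std::uintptr_t>(p));
    char buffer[2 + 16 + 1];  // "0x" + up to 16 hex digits + terminator
    const int written = std::snprintf(buffer, sizeof buffer, "0x%llx", value);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buffer);
    (void)written;  // silence unused-variable warnings when NDEBUG is defined
    return std::string(buffer);
}
```

Comparing the string produced on the host with what the device prints can help separate a transfer problem from a formatting problem.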

Thank you for all the responses.
It turned out to be an error on our side: we specified the wrong element type for the class template.
As for the truncation, rtPrintf() didn’t print the value correctly, but the pointer itself was correct. Only the upper 32 bits were zeroed out in the printed output.
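
To show how a wrong template element type can scramble otherwise valid data, here is a small host-side sketch. The thread does not say which types were involved, so `Vec3`, `Vec4`, `readElement`, and `xOfSecondElement` are purely hypothetical; the point is only that the element type sets the stride used to locate element i.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Vec3 { float x, y, z; };     // hypothetical tightly packed vertex
struct Vec4 { float x, y, z, w; };  // hypothetical "wrong" element type

// Read element `index` from raw storage, using sizeof(T) as the stride.
template <typename T>
T readElement(const unsigned char *raw, std::size_t index) {
    T out;
    std::memcpy(&out, raw + index * sizeof(T), sizeof(T));
    return out;
}

// Returns the x component of element 1 as seen through element type T.
template <typename T>
float xOfSecondElement() {
    static_assert(sizeof(Vec3) == 3 * sizeof(float),
                  "Vec3 is assumed to be tightly packed");
    // Four vertices stored as packed Vec3: floats 0, 1, ..., 11.
    float packed[12];
    for (int i = 0; i < 12; ++i) packed[i] = static_cast<float>(i);
    unsigned char raw[sizeof packed];
    std::memcpy(raw, packed, sizeof packed);
    assert(2 * sizeof(T) <= sizeof raw);  // element 1 must lie inside the buffer
    return readElement<T>(raw, 1).x;
}
```

With the correct type, `xOfSecondElement<Vec3>()` picks up float number 3. With `Vec4`, the stride is one float too long and the same call picks up float number 4, so every element after the first is misaligned even though the pointer and the data are both fine.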