[Optix 6.5] Use of thrust on optix buffer

esteban.egea · January 23, 2020, 3:23pm

Hi all,
System info: Ubuntu 18.04, g++ 7.4.0, Optix 6.6.0, driver 440.44, cuda 10.0, 2xRTX 2080 Ti

I use thrust to process the results after a launch. I basically store the hits on a
RT_BUFFER_INPUT_OUTPUT with RT_BUFFER_GPU_LOCAL with a custom struct.
After the launch, I would like to manipulate the contents of the buffer with thrust, like, for instance

HitInfo* raw_ptr =static_cast<HitInfo*>( hitBuffer->getDevicePointer(0));
thrust::device_ptr<HitInfo> dev_ptr=thrust::device_pointer_cast(raw_ptr);
thrust::partition(dev_ptr, dev_ptr+bsize, hit_on_curved());

Now, code like the one above only works if I force to use only one GPU. If I use multiple GPUs thrust raises a thrust::system::system_error
what(): partition failed to synchronize: an illegal memory access was encountered.

On the contrary, if I copy the results from the optix buffers to another thrust::device_vector with cudaMemcpy, all the thrust algorithms run without problems on it.

So, the questions are, why is it happening? is it possible to use thrust on a optix buffer in general in a multi-GPU environment?
I guess that the answer is possibly in sect. 9.2.3 Zero-copy pointers of the documentation, but I do not get it. As far as I understand, since I use RT_BUFFER_GPU_LOCAL, it does not apply, does it?

Thanks a lot,

Kind regards

dhart · January 24, 2020, 8:30pm

Hi,

First just a super basic sanity check – I assume you’re indexing your multiple devices correctly since you said the copying strategy worked, but just to double check – your code snippet uses hard-coded device index 0, but you are indexing multiple devices when using thrust, right?

One of the multi-gpu engineers here mentioned seeing OptiX sometimes failing to reset the CUDA devices it uses post-launch. Putting a call to cudaSetDevices() right before invoking thrust might help.

If cudaSetDevices() doesn’t help, I guess just triple-check whether the device pointers didn’t get mixed up somehow. One potentially confusing pothole here is on unix you may be getting a unified memory space that could cause cudaMemcpy() to transfer memory from one GPU to another without you realizing it, that might inadvertently mask the problem you’re having.

Perhaps the most straightforward solution to this would be to cudaMalloc your own buffers at the start, and use rtBufferSetDevicePointer() instead of rtBufferGetDevicePointer().

–
David.

esteban.egea · February 4, 2020, 5:54pm

Hi,
I can confirm that using cudaSetDevice(i) before calling thrust works perfectly.

I have not tried the other solution.

Thanks again for your help.

Kind regards,
Esteban