Failed on optix::Buffer::setDevicePointer()

I am trying to get CUDA OptiX interop working.
I plan on allocating memory on CUDA and make OptiX Buffer use it.
My code is as follows:

Buffer vertex_buffer = m_context->createBufferForCUDA(RT_BUFFER_INPUT, RT_FORMAT_FLOAT3, nverts);
float3 *d_ptr;
cudaMalloc((void **)&d_ptr, sizeof(float3)*nverts);
cudaMemcpy( d_ptr, verts, sizeof(float3)*nverts, cudaMemcpyHostToDevice);
vertex_buffer->setDevicePointer(0, (CUdeviceptr)d_ptr);

I got an uncaught exception at setDevicePointer(). The error message is as follows:

First-chance exception at 0x000007fefd92cacd in sample.exe: Microsoft C++ exception: cudaError at memory location 0x001fdb20..

I have two cards on my machine. One is low-end NVS 300 and the other is GTX 670.
Since the major compute capability of those two cards doesn’t match, only GTX 670 should be used by OptiX. Even if I force device to be used by Buffer::setDevices(), it still doesn’t work.
Could the device memory address converting between Driver and Runtime API and issue?
I tried (CUdeviceptr)(uintptr_t) to cast on d_ptr, but still with no luck.
Any one have any idea on what might go wrong?
Thank you.

Please check if all cuda* calls returned successfully.
When explicitly selecting the device with rtContextSetDevices() resp. its C++ wrapper (once directly after creating the context) did you match the optix_device_number parameter in the rtBufferSetDevicePointer() call?

Does it work if you only use the GTX 670?
Assuming this is under Windows, you could test this by either disabling the NVS 300 inside the Windows Device Manager temporarily or by limiting the CUDA devices in the NVIDIA Control Panel maybe.

Related topic, this time using the OptiX buffer in CUDA:
[url]rtBufferGetDevicePointer is reporting null pointer - OptiX - NVIDIA Developer Forums

Thank you Detlef for responding.
I stripped out the error checking code while posting. All cuda* calls returned cudaSuccess. Even rtBufferSetDevicePointer() return RT_SUCCESS. It failed until OptiX launch call. The exception APIObj::checkError() caught was as follows:

Invalid context (Details: Function "_rtContextLaunch2D" caught exception: Cannot map CUDA interop buffers, [14614586])

The device number is matched. To confirm that, I would like to know if CUDA Device Number match OptiX Device Ordinal? I tried disabling NVS 300 in Windows Device Manager but still with no luck.
The driver is up-to-date. Is there any other part I might be missing?
Thanks.

Ok. I think I found the cause.
I noticed that implicit casting didn’t behave what I expected. The d_ptr returned by cudaMalloc was 0x0000000400720000, but (CUdeviceptr)d_ptr interpreted it as 0x0000000000720000. After applying reinterpret_cast on d_ptr, it worked.
Here is the working code:

Buffer vertex_buffer = m_context->createBufferForCUDA(RT_BUFFER_INPUT, RT_FORMAT_FLOAT3, nverts);
float3 *d_ptr;
cudaMalloc((void **)&d_ptr, sizeof(float3)*nverts);
cudaMemcpy( d_ptr, verts, sizeof(float3)*nverts, cudaMemcpyHostToDevice);
vertex_buffer->setDevicePointer(0, reinterpret_cast<CUdeviceptr>(d_ptr));

In the OptiX 3.0.0 Programming Guide, it said: “CUDA must be initialized using the CUDA runtime API; OptiX does not currently support interop with the CUDA driver API.” Why OptiX needs to interop with the Runtime API but still takes Driver API pointers as set/getDevicePointer input?

Now I have another problem when launching OptiX.
I got the following exception:

Invalid context (Details: Function "_rtContextLaunch2D" caught exception: Cannot map CUDA interop buffers, [14614586])

I initiate CUDA as follows:

cudaSetDeviceFlags( cudaDeviceMapHost | cudaDeviceLmemResizeToMax );
cudaFree(0);

Is there any version/hardware restriction or known bug related to OptiX CUDA interop?
Like what type of CUDA buffer it should be created to interact with OptiX…etc.
Thank you!

Maybe try if float, float2, or float4 instead of float3 work better. float3 is slower to load than float4 anyways.
I’m not sure about the OptiX to CUDA buffer interop, but for CUDA to OpenGL or D3D texture interop, 3-component texture formats aren’t supported. (Appendix A in the OptiX Programming Guide.)

I am wondering why it could be an issue. The buffer format is defined as RT_BUFFER_INPUT and it is copied from Host to Device through CUDA Runtime API. Shouldn’t the issue only exist when OptiX needs to copy to the Host memory?
By the way, before the “Invalid context” exception was caught, I also got the following two uncaught ones:

First-chance exception at 0x000007fefd63cacd in sample.exe: Microsoft C++ exception: optix::BasicException at memory location 0x0056c710..
First-chance exception at 0x000007fefd63cacd in sample.exe: Microsoft C++ exception: optix::Exception at memory location 0x0056d760..
Invalid context (Details: Function "_rtContextLaunch2D" caught exception: Cannot map CUDA interop buffers, [14614586])

Any clue?