Interop with Unity/D3D

I spend the last few days on this, slowly reading up on d3d and cuda interop.

The goal is to write directly to a GPU side texture from Optix. The underlying raycasting etc. works. So far I just copied it around manually and was hoping to gain some more performance this way.

After launching Optix, this is what I have:

if (resource == nullptr)
	{
		// state.vis_overlay_params.overlayUnityPtr is returned from Texture.GetNativeTexturePtr https://docs.unity3d.com/ScriptReference/Texture.GetNativeTexturePtr.html
		CUDA_CHECK(cudaGraphicsD3D11RegisterResource(&resource, 
			reinterpret_cast<ID3D11Resource*>(state.vis_overlay_params.overlayUnityPtr), 
			cudaGraphicsRegisterFlagsNone));
		
		//Get dest texture descriptor
		destTex = reinterpret_cast<ID3D11Texture2D*>(state.vis_overlay_params.overlayUnityPtr);

		destTex->GetDesc(&destDesc);
		// prints 1920, as expected
		state.UnityLogCallback(3, "Launch", std::to_string(destDesc.Width).c_str());
	}
	//Map resource
	CUDA_CHECK(cudaGraphicsMapResources(1, &resource, state.stream));
	CUDA_CHECK(cudaGraphicsSubResourceGetMappedArray(&array_ptr, resource, 0, 0));

	// bufferPtr is returned from sutil::CUDAOutputBuffer.map()
	// So this should copy my results to the d3d texture?
	CUDA_CHECK(cudaMemcpy(array_ptr, 
		bufferPtr,
		state.vis_overlay_params.overlayWidth * state.vis_overlay_params.overlayHeight * 4 * sizeof(float),
		cudaMemcpyDeviceToDevice));

	CUDA_CHECK(cudaGraphicsUnmapResources(1, &resource));
        output_buffer.unmap();

Interestingly I get nothing. No error, but also the texture isn’t changed. Any idea what I might be missing?

I’m not sure about the D3D part of the code because I’m using mostly OpenGL and Vulkan, but I’m using CUDA interop with Pixel Buffer Objects and texture arrays in my OptiX 7 applications as well, though mostly with the CUDA driver API, but the CUDA calls are similar.

I think you’re just using the wrong CUDA memcpy command to copy into the mapped texture array.
CUDA array targets are not stored linearly in the target memory so you cannot copy into them with a linear cudaMemcpy.

Please have a look into all CUDA interop calls inside the INTEROP_MODE_TEX clauses in this example code.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/src/Device.cpp

Specifically the part which copies from a CUDA buffer into the CUDA mapped texture array here:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/src/Device.cpp#L1598

The CUDA OpenGL interop distinguishes between images and buffers. Looks like CUDA D3D interop only has one method to register the resource.
Means your code should work when using the correct cuMemcpy2D or cuMemcpy3D resp. cudaMemcpy2DToArray or cudaMemcpy3D calls to copy from linear device memory into a CUDA texture array.

Awesome, thanks a lot. I did see the MemcpyXD functions and previously tried that, but it’s possible that something else wasn’t ok yet.

The SDK examples and Optix_Apps are my normal goto. I put that back in and go from there. Good to know that I am on the right track in general. I think what threw me off was that there was no error or other negative feedback.

Much appreciated!

Sadly, that didn’t help.

This is what I have right now. After launch and a synchCheck for good measure.
This crashes with an invalid memory access at cudaMemcpy2DFromArray()

Definitions:
cudaGraphicsResource_t resource = nullptr;
unsigned* overlayUnityPtr = 0;
ID3D11Texture2D* destTex;
D3D11_TEXTURE2D_DESC destDesc;
cudaArray_t array_ptr = nullptr;
bufferPtr = output_buffer.map();

if (resource == nullptr)
	{
		auto tex = reinterpret_cast<ID3D11Resource*>(state.vis_overlay_params.overlayUnityPtr);
		// state.vis_overlay_params.overlayUnityPtr is returned from Texture.GetNativeTexturePtr https://docs.unity3d.com/ScriptReference/Texture.GetNativeTexturePtr.html
		CUDA_CHECK(cudaGraphicsD3D11RegisterResource(&resource, 
			tex, 
			cudaGraphicsRegisterFlagsNone));		
		//Get dest texture descriptor
		destTex = reinterpret_cast<ID3D11Texture2D*>(state.vis_overlay_params.overlayUnityPtr);
		
		destTex->GetDesc(&destDesc);
		// prints 1920, as expected
		state.UnityLogCallback(3, "Width", std::to_string(destDesc.Width).c_str());
	}
	//Map resource
	CUDA_CHECK(cudaGraphicsMapResources(1, &resource, state.stream));
	CUDA_CHECK(cudaGraphicsSubResourceGetMappedArray(&array_ptr, resource, 0, 0));
	
	// Here is crashes with an invalid memory access, tried kind as DeviceToDevice, same result
	
	CUDA_CHECK(cudaMemcpy2DFromArray(array_ptr, 
		state.vis_overlay_params.overlayWidth * 4,
		reinterpret_cast<cudaArray_const_t>(bufferPtr),
		0,0,
		state.vis_overlay_params.overlayWidth * 4,
		state.vis_overlay_params.overlayHeight * 4,
		cudaMemcpyDefault));
	CUDA_CHECK(cudaGraphicsUnmapResources(1, &resource));

I also tried this, does not crash, but also doesn’t do anything besides that. The texture appears to be unchanged.

CUDA_MEMCPY3D params = {};
	params.srcMemoryType = CU_MEMORYTYPE_DEVICE;
	params.srcDevice = reinterpret_cast<CUdeviceptr>(bufferPtr);
	params.srcPitch = state.vis_overlay_params.overlayWidth * 4;
	params.srcHeight = state.vis_overlay_params.overlayHeight;

	params.dstMemoryType = CU_MEMORYTYPE_ARRAY;
	params.dstArray = reinterpret_cast<CUarray>(&array_ptr);	
	params.WidthInBytes = state.vis_overlay_params.overlayWidth * 4;
	params.Height = state.vis_overlay_params.overlayHeight;
	params.Depth = 1;
	
	CU_CHECK(cuMemcpy3D(&params));
	
	CUDA_CHECK(cudaGraphicsUnmapResources(1, &resource));

I am sure it’s nothing complicated, but I just don’t see it.

I’m not surprised if the first version crashes.
You said you wanted to copy from the linear OptiX CUDA output buffer (bufferPtr) into the D3D texture (array_ptr).
That should have used the cudaMemcpy2DToArray function.
bufferPtr cannot be cast to cudaArray_const_t. That’s the linear source buffer.
The default is to copy from host. You’d want to copy from device to device.
Your CUDAOutputBuffer should better be CUDAOutputBufferType::CUDA_DEVICE for that. There shouldn’t be a need to use that helper class for this case then. It’s just a simple CUDeviceptr you need to allocate and resize matching the state.vis_overlay_params width, height, and format somehow…

Now the cuMemcpy3d() call seems to use the correct direction and types.
If the output buffer width, height and format and all arguments to the cuMemcpy3d() are as expected, and if every CUDA call succeeded the CUDA_CHECK() tests (e.g. the texture format can be used in CUDA interop, etc.), then I cannot say why you do not see an updated D3D texture inside Unity.

Maybe implement a standalone D3D11 example without Unity to see if this generally works.
If not, provide that as reproducer project to be able to analyze it.
If yes, it’s something within Unity which behaves differently and I cannot help further.

Thank you for the input.

This has me stumped though. This seems to do the opposite? The d3d texture is 2d and my optix buffer is a linear array. So I want to copy from array to a 2d texture, where this is the other way around.

The name array_ptr isn’t great, but I am in a time boxed spike so I wanted to clean that up later.

array_ptr is mapped from the resource that was registered from the d3d texture.
The optix buffer is just a unsigned int array.

CUDA_CHECK(cudaMemcpy2DToArray(array_ptr, 
		0,0,
		&bufferPtr,
		state.vis_overlay_params.overlayWidth * 4,
		state.vis_overlay_params.overlayWidth,
		state.vis_overlay_params.overlayHeight,
		cudaMemcpyDeviceToDevice));

Still crashes.

I use the CUDA terminology: It uses the name “array” (cudaArray_t) for memory areas in texture objects.
That is not in a linear format but in some internal hardware specific cache efficient layout to speed up texture accesses.
Your OptiX buffer is of course also an array or vector in the C++ sense which means tightly packed linear data.

Copies to and from CUDA texture arrays from and to linear buffers will need to apply the correct layout changes.
That’s why there are specific copy commands which must know what the source and destination data layouts are.

The d3d texture is 2d and my optix buffer is a linear array.

Translated to CUDA terminology your D3D texture is accessed in CUDA via the cudaArray_t you got via cudaGraphicsSubResourceGetMappedArray and your buffer is just linear memory on the device.

So I want to copy from array to a 2d texture, where this is the other way around.

Then you need to copy from the linear buffer memory (the OptiX output buffer) to the CUDA mapped cudaArray_t of your D3D texture.

Now it’s crashing because you’re using &bufferPtr. That’s a pointer to the bufferPtr but bufferPtr itself is already the source data pointer which should be used inside cudaMemcpy2DToArray directly.
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g9509226164aaa58baf0c5b8ed165df58

Also aren’t all these crashes get caught in your CUDA_CHECK() macro containing additional information about the error reason (here most likely invalid access or misaligned access)?

Thank you! I think that did it.
Unity logging is a bit hit and miss. In this case I can work out where it happened, but a lot of times I don’t get any useful information in the crash logs. It is good training for my frustration tolerance.

C++ itself still isn’t natural to me, so I really appreciate spotting this one as well. Looks like there is still some problem with the actual texture being transferred, but that should be easy to fix. And then I can finally try to do the same thing with the mesh data, something I originally tried to do some 2 years ago :)

Also really appreciate the background info. Navigating Optix, Cuda, D3D and Unity at the same time can be a lot.

Thanks!