Unity3D RenderTexture/Texture2D To OptixImage2D

Hello guys, I'm trying to read Unity's textures from their native pointers inside OptiX. The goal is to take Unity's textures and convert them to OptixImage2D so that I can run OptiX processes that require an OptixImage2D.
I have already tried for hours without any success. This is what I'm attempting now, still with no luck:

Unity:

//Test with RenderTexture (result: crash/code interrupt)

RenderTexture testRT = new RenderTexture(512, 512, 0, DefaultFormat.HDR);
testRT.enableRandomWrite = true;
testRT.Create();

SetUnityTexture(testRT.GetNativeTexturePtr());

//Test with Texture2D (result: crash/code interrupt)

Texture2D test2D = new Texture2D(512, 512, DefaultFormat.HDR, TextureCreationFlags.None);

SetUnityTexture(test2D.GetNativeTexturePtr());

Optix dll made for unity:

struct
{
	ID3D11Texture2D         *pTexture;
	ID3D11ShaderResourceView *pSRView;
	cudaGraphicsResource    *cudaResource;
	void                    *cudaLinearMemory;
	size_t                  pitch;
	int                     width;
	int                     height;
} g_texture_2d;

void SetUnityTexture(void* ptr)
{
    ID3D11Texture2D* d3dtex = (ID3D11Texture2D*)ptr;
    cudaGraphicsD3D11RegisterResource(&g_texture_2d.cudaResource, d3dtex, cudaGraphicsRegisterFlagsNone);
}

This code is not meant for casting to OptixImage2D; it's just a quick attempt to read the texture in the same form it comes from Unity, but after execution the process crashes without any error message.

Can you guys suggest something? What am I doing wrong?

This is more of a Unity and CUDA-D3D11 interop question. I don’t use either, so let’s focus on the requirements.

1.) You should check every CUDA API call for errors (see the macro sketch right after this list).
2.) cudaGraphicsD3D11RegisterResource() does not work on all formats, specifically not on any 3-component format. See the comments in cuda_d3d11_interop.h. Check if the register resource call succeeds.
3.) A registered resource cannot be changed in size. You would need to unregister and re-register the resource in all places where the application changes the texture size, if that can happen.
4.) The OptixImage2D requires a linear buffer. You register a texture resource, which is not linear memory. You would need to map it to get a CUDA array from which you can copy the data into a linear buffer. (At least that's how it works with OpenGL.)
5.) The OptiX 7 denoiser requires half3/half4 or float3/float4 input formats. It doesn't implement uchar3/uchar4 formats.
If the Unity texture is of a different format, you would need to convert the data to a suitable format first.
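For example, a minimal error-check macro could look like this (my sketch, not code from this thread; adapt the error handling to your plugin's logging):

#include <cuda_runtime.h>
#include <cstdio>

// Report any CUDA runtime API error with file and line information.
#define CUDA_CHECK(call)                                                      \
    do {                                                                      \
        const cudaError_t err = (call);                                       \
        if (err != cudaSuccess) {                                             \
            fprintf(stderr, "CUDA error %s (%d) at %s:%d\n",                  \
                    cudaGetErrorString(err), (int)err, __FILE__, __LINE__);   \
        }                                                                     \
    } while (0)

// Usage:
// CUDA_CHECK(cudaGraphicsD3D11RegisterResource(&g_texture_2d.cudaResource, d3dtex, cudaGraphicsRegisterFlagsNone));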

There is a Vulkan example inside the nvpro-samples which uses the OptiX denoiser.
Maybe that helps to explain the necessary steps some more: https://github.com/nvpro-samples/vk_denoise

The first two hits when searching for “Unity cudaGraphicsD3D11RegisterResource” turned up these similar issues and Unity-specific explanations:
https://issuetracker.unity3d.com/issues/cuda-graphics-interop-fails-in-the-native-plugin-when-using-rendertexture
https://forum.unity.com/threads/getnativetextureptr-call-behavior-differs-from-rendertexture-texture2d-how-come.196911/

Hello droettger,
thank you for your reply, it was really useful. After a few hours I was able to register and read the texture in CUDA this way:

Unity:
same as before
Optix dll made for unity:

void SetUnityTexture(void* g_TextureHandle){
	cudaGraphicsResource_t resource;
	cudaGraphicsD3D11RegisterResource(&resource, (ID3D11Resource*)g_TextureHandle, cudaGraphicsRegisterFlagsNone);

	ID3D11Texture2D *d3dTex = (ID3D11Texture2D *)g_TextureHandle;
	D3D11_TEXTURE2D_DESC desc;
	d3dTex->GetDesc(&desc);
}

Debugging “desc”, all values seem to be correct.
Now I need the last step: casting/converting the ID3D11Texture2D to an OptixImage2D.
Can you help me with this?
Thanks.

Update:
As you suggested, I'm using Map to get linear data. My current code is:

//Get device and device context
IUnityGraphicsD3D11* d3d = s_UnityInterfaces->Get<IUnityGraphicsD3D11>();
ID3D11Device* dev = d3d->GetDevice();
ID3D11DeviceContext* ctx = nullptr;
dev->GetImmediateContext(&ctx);

//Register resource
cudaGraphicsResource_t resource;
cudaGraphicsD3D11RegisterResource(&resource, (ID3D11Resource*)g_TextureHandle, cudaGraphicsRegisterFlagsNone);

//Get source texture descriptor
ID3D11Texture2D* d3dTex = (ID3D11Texture2D*)g_TextureHandle;
D3D11_TEXTURE2D_DESC desc;
d3dTex->GetDesc(&desc);

//Create staging descriptor
D3D11_TEXTURE2D_DESC stagingdesc;
stagingdesc.Width = desc.Width;
stagingdesc.Height = desc.Height;
stagingdesc.MipLevels = desc.MipLevels;
stagingdesc.ArraySize = desc.ArraySize;
stagingdesc.Format = desc.Format;
stagingdesc.SampleDesc = desc.SampleDesc;
stagingdesc.Usage = D3D11_USAGE_STAGING;
stagingdesc.BindFlags = 0;
stagingdesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
stagingdesc.MiscFlags = desc.MiscFlags;

//Create staging texture
ID3D11Texture2D* stagingtexture;
HRESULT hr = dev->CreateTexture2D(&stagingdesc, NULL, &stagingtexture);

//Copy from source texture to staging texture
ctx->CopyResource(stagingtexture, d3dTex);

//Init map
D3D11_MAPPED_SUBRESOURCE mappedResource;
ZeroMemory(&mappedResource, sizeof(D3D11_MAPPED_SUBRESOURCE));

ctx->Map(stagingtexture, 0, D3D11_MAP_READ, 0, &mappedResource);

//...

ctx->Unmap(stagingtexture, 0);

It seems to work correctly. Now I just need to know how to cast/convert to OptixImage2D.
Can someone explain this to me please?
Thanks.

Sorry, I don’t have any experience with D3D11.

With map I meant something along the line of this code, which is using CUDA-OpenGL interop and copies the output buffer into the texture image:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/DeviceSingleGPU.cpp#L202

You would need the other direction in that cuMemcpy3D() call there:
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4b5238975579f002c0199a3800ca44df
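In driver API terms, the array-to-linear copy would look roughly like this (a sketch; “array” and “d_ptr” are hypothetical handles from the mapped graphics resource and from cuMemAlloc(), assuming float4 pixels without row padding):

// Copy a mapped CUarray into linear device memory (the array -> linear direction).
CUDA_MEMCPY3D param = {};
param.srcMemoryType = CU_MEMORYTYPE_ARRAY;
param.srcArray      = array;                      // CUarray from cuGraphicsSubResourceGetMappedArray()
param.dstMemoryType = CU_MEMORYTYPE_DEVICE;
param.dstDevice     = d_ptr;                      // linear device memory from cuMemAlloc()
param.dstPitch      = width * 4 * sizeof(float);  // row pitch of the destination buffer
param.WidthInBytes  = width * 4 * sizeof(float);
param.Height        = height;
param.Depth         = 1;
cuMemcpy3D(&param);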

I’d still recommend putting error checks around all CUDA calls, and maybe around all other calls in your code.

EDIT: Ok, you edited it while I answered.
Not sure what the ctx->Map() does. If that is mapping to host memory, that won’t help much.
You need a CUDA device pointer to linear memory with the data.

The OptixImage2D structure just tells the OptiX denoiser invocation how to interpret that data.
This chapter in the OptiX 7.1 Programming Guide explains it:
https://raytracing-docs.nvidia.com/optix7/guide/index.html#ai_denoiser#nvidia-ai-denoiser
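For example, for a tightly packed float4 buffer the structure would be filled roughly like this (a sketch; “dData”, “width” and “height” stand for your buffer and its dimensions):

// Describe a tightly packed float4 buffer in linear device memory for the denoiser.
OptixImage2D image = {};
image.data               = (CUdeviceptr)dData;         // linear device memory holding the pixels
image.width              = width;
image.height             = height;
image.pixelStrideInBytes = 4 * sizeof(float);          // 16 bytes per float4 pixel
image.rowStrideInBytes   = width * 4 * sizeof(float);  // no row padding assumed
image.format             = OPTIX_PIXEL_FORMAT_FLOAT4;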

Here is my example which shows how to use it with half4 or float4 input:
https://github.com/NVIDIA/OptiX_Apps/tree/master/apps/intro_denoiser
Look at Application::initDenoiser(), void Application::setDenoiserImages() and bool Application::render().

What you do in the code above is called “staging”. You simply copy a sub-resource of the D3D11 resource from the texture into a staging texture. That one can then be accessed on the CPU through mapping with “ctx->Map” (see ID3D11DeviceContext::Map (d3d11.h) in the Microsoft Docs),
and you can read the result in host memory. You can of course do that, but it is not necessary and it may not be the fastest option (you would also need to upload the converted data again to the device when filling the CUDA buffer). You can instead read the D3D pixels from GPU memory and write them back into a CUDA buffer.

You can use a CUDA kernel for that. After you have successfully registered your D3D11 texture, use “cudaGraphicsMapResources”.

You have 2 different cases, based on your input data:

  • For a 1D D3D11 buffer you would simply use “cudaGraphicsResourceGetMappedPointer”, which gives you a device pointer into the “CUDA-resource-mapped” D3D (structured) buffer. NOTE: this resource mapping is NOT a host memory access; it directly maps the buffer on the device.
  • For a 2D D3D texture you need to use “cudaGraphicsSubResourceGetMappedArray”; then you need to copy the array content into another CUDA temp buffer of the same type (same byte width per pixel and per color channel as the original D3D11 texture). Ensure valid pitch/width settings and use “cudaMemcpy2DFromArray” with copy type “cudaMemcpyDeviceToDevice”.
    The offset of that temp buffer (on the device) is then what you need.

You can then use that pointer as the source within your CUDA kernel.
The destination pointer can be your new CUDA buffer (which is the “data” field in the “OptixImage2D” structure).
Within the CUDA kernel you can then convert the color information.
After running the kernel, use “cudaGraphicsUnmapResources”.
To save memory you can register/unregister the D3D11 resource on each frame; however, that may impact speed.

NOTE: ensure you correctly handle sRGB (gamma correction), based on your D3D11 texture format.
I see you use “DefaultFormat.HDR”; if that is a multisample texture you also need to use ResolveSubresource to get the desired pixel data.
I have no information about how Unity works, so please check which texture format is used in your case. Based on that you know whether you need an sRGB conversion and/or a resolve.

NOTE: Using a D3D11 buffer in the first place instead of a D3D11 texture2D would save you one buffer copy per frame. So if you, for example, already run some other HLSL shaders on the texture before sending it to the denoiser, simply write to a RWStructuredBuffer in that HLSL shader. That one then does not need the extra temp buffer copy, because you can use “cudaGraphicsResourceGetMappedPointer” with it.
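A sketch of that buffer path (“d3dBuffer” is a hypothetical ID3D11Buffer*; error checks omitted):

// Register and map a D3D11 buffer; the mapped pointer is already linear device memory.
cudaGraphicsResource_t resource = nullptr;
cudaGraphicsD3D11RegisterResource(&resource, d3dBuffer, cudaGraphicsRegisterFlagsNone);

cudaGraphicsMapResources(1, &resource);

void*  devPtr   = nullptr;
size_t numBytes = 0;
cudaGraphicsResourceGetMappedPointer(&devPtr, &numBytes, resource);  // device pointer into the buffer

// ... use devPtr directly, e.g. as input to a CUDA kernel or as OptixImage2D::data ...

cudaGraphicsUnmapResources(1, &resource);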

I tested both input types for the resource mapping (D3D11 buffer + D3D11 texture) in my application. But there’s also another option I can think of: if you cannot convert the D3D11 texture into a D3D11 buffer, try the D3D11 texture type DXGI_FORMAT_R32G32B32A32_FLOAT; this should be technically identical to float4 in CUDA. With that you don’t even need a CUDA kernel; instead you can use “cudaMemcpy2DFromArray” to copy the D3D11 float data directly into your final CUDA buffer.


Hello m001,
thank you so much for your detailed reply. Following your explanation I'm finally able to do what I wanted to do.
This is the final code:

//Register source texture in cuda
cudaGraphicsResource_t resource;
cudaGraphicsD3D11RegisterResource(&resource, (ID3D11Resource*)g_TextureHandle, cudaGraphicsRegisterFlagsNone);

//Get source texture descriptor
ID3D11Texture2D *sourceTex = (ID3D11Texture2D *)g_TextureHandle;
D3D11_TEXTURE2D_DESC sourceDesc;
sourceTex->GetDesc(&sourceDesc);

//Map resource
cudaGraphicsMapResources(1, &resource);

cudaArray* array_ptr = NULL;
cudaGraphicsSubResourceGetMappedArray(&array_ptr, resource, 0, 0);

float *dData;
cudaMalloc(&dData, sourceDesc.Width * sourceDesc.Height * 4 * sizeof(float));

cudaMemcpy2DFromArray(dData, sourceDesc.Width * 4 * sizeof(float), array_ptr, 0, 0, sourceDesc.Width * 4 * sizeof(float), sourceDesc.Height, cudaMemcpyDeviceToDevice);

cudaGraphicsUnmapResources(1, &resource);

I have linear data in dData now and it's perfect for OptixImage2D. In fact it works perfectly now, except for one thing:
the image is flipped vertically.
The solution is probably simple, but there is not much info about this and CUDA on the internet.

If that code is called more than once, move the cudaMalloc() somewhere outside to not leak the memory.
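For example (a sketch, reusing the names from your code):

// Allocate the linear buffer once (or when the size changes) and reuse it per frame.
static float* dData = nullptr;
static size_t dSize = 0;

const size_t size = sourceDesc.Width * sourceDesc.Height * 4 * sizeof(float);
if (dSize != size)
{
    cudaFree(dData);          // cudaFree(nullptr) is a no-op
    cudaMalloc(&dData, size);
    dSize = size;
}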

D3D normally has the texture origin at the top left, OpenGL has it at the bottom left; the Vulkan specs used to describe a top-left origin, but newer spec versions word it in an orientation-agnostic way.

This means that if you have a pointer to linear texel data, it depends on your interpretation (display, storage) of the data which way is upright.

The OptiX examples normally have the camera set up with the launch index [0, 0] at the lower left, matching the OpenGL texture orientation with which the final image is displayed.

For the denoiser itself the orientation doesn’t matter if you’re denoising full images.
If you do partial tile denoising then you would need to know your coordinates’ orientation.

If you simply need the final image after denoising (without tiling) back in a D3D11 resource, you can copy the output data back as it is (that was what I assumed).
For that you would use cudaMemcpy2DToArray, sketched below.
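A sketch, assuming “dDenoised” is a hypothetical device pointer to the denoiser output (float4 per pixel, no row padding) and “array_ptr” is the mapped array from your code:

// Copy the denoised linear device buffer back into the mapped D3D11 texture array.
const size_t rowBytes = width * 4 * sizeof(float);
cudaMemcpy2DToArray(array_ptr, 0, 0, dDenoised, rowBytes, rowBytes, height, cudaMemcpyDeviceToDevice);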

If you instead download the texels from the GPU, then you can simply interpret them as @droettger wrote.

    const int heightM1 = height - 1;
    const float4* src = ...   // pointer to a CPU memory offset (downloaded from GPU)
    float4* dest = ...        // pointer to destination image (CPU memory)
    const int src_pitch = width;    // pitch in float4 elements
    const int dest_pitch = width;   // pitch in float4 elements

    // NOTE: "src" and "dest" need to be different buffers here (cannot be in-place when using this code)
    for (int j = 0; j < height; ++j)
    {
      for (int i = 0; i < width; ++i)
      {
          float4 pixel = src[j * src_pitch + i];
          pixel.w = 1.0f;   // set alpha to opaque (if desired)
          dest[(heightM1 - j) * dest_pitch + i] = pixel;
      }
    }

If the texture should instead remain on the GPU, you can do the conversion in a CUDA kernel:

#define BLOCK_DIM_X 16
#define BLOCK_DIM_Y 16

__global__ void cuda_kernel_tex_convert(unsigned char* srcPtr, int width, int height, int pitch_src, int pitch_dest, unsigned char* buffer_DevPtr)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    const int heightM1 = height - 1;
    float* src  = (float*)(srcPtr + y * pitch_src) + 4 * x;
    float* dest = (float*)(buffer_DevPtr + (heightM1 - y) * pitch_dest) + 4 * x;  // write the row flipped vertically

    dest[0] = src[0];
    dest[1] = src[1];
    dest[2] = src[2];
    dest[3] = 1.0f; // alpha     (or use src[3] to also copy the alpha, if desired)
}

extern "C"
int convert_texture(void* src, int width, int height, int pitch_src, int pitch_dest, void* buffer_DevPtr)
{
    dim3 Db = dim3(BLOCK_DIM_X, BLOCK_DIM_Y);   // e.g. 16 x 16 = 256 threads
    dim3 Dg = dim3((unsigned int)((width + Db.x - 1) / Db.x), (unsigned int)((height + Db.y - 1) / Db.y));

    cuda_kernel_tex_convert<<<Dg, Db>>>((unsigned char*)src, width, height, pitch_src, pitch_dest, (unsigned char*)buffer_DevPtr);

    return (int)cudaGetLastError();
}


// launch the kernel:
const int pitchSRC = width * 4 * sizeof(float);    // assumed to be without padding
const int pitchDEST = width * 4 * sizeof(float);  // assumed to be without padding
convert_texture((void *)src, width, height, pitchSRC, pitchDEST, (void*)dest);

