Hello guys, I'm trying to read Unity's textures from their native pointers inside OptiX. The goal is to take Unity's textures and cast them to OptixImage2D, so that I can run OptiX processes that require an OptixImage2D.
I have already tried for hours without any success. This is what I'm attempting now, still with no luck:
Unity:
//Test with RenderTexture (result: crash/code interrupt)
RenderTexture testRT = new RenderTexture(512, 512, 0, DefaultFormat.HDR);
testRT.enableRandomWrite = true;
testRT.Create();
SetUnityTexture(testRT.GetNativeTexturePtr());
//Test with Texture2D (result: crash/code interrupt)
Texture2D test2D = new Texture2D(512, 512, DefaultFormat.HDR, TextureCreationFlags.None);
SetUnityTexture(test2D.GetNativeTexturePtr());
This code is not meant to cast to OptixImage2D; it's just a quick attempt to read the texture in the same form it comes from Unity. After execution, the process crashes without any error message.
Can you suggest something? What am I doing wrong?
This is more of a Unity and CUDA-D3D11 interop question. I don’t use either, so let’s focus on the requirements.
1.) You should check any CUDA API call for errors.
2.) cudaGraphicsD3D11RegisterResource() does not work on all formats, specifically not on any 3-component format. See the comments in cuda_d3d11_interop.h. Check if the register resource call succeeds.
3.) A registered resource cannot be changed in size. You would need to unregister and register the resource in all places where the application changes the texture size, if that can happen.
4.) The OptixImage2D requires a linear buffer. You register a texture resource, which is not linear memory. You would need to map the resource to get a CUDA array from which you can copy the data into a linear buffer. (At least that's how it works with OpenGL.)
5.) The OptiX 7 denoiser requires half3/half4 or float3/float4 input formats. It doesn’t implement uchar3/uchar4 formats.
If the Unity texture is of a different format you would need to convert the data to a suitable format first.
There is a Vulkan example inside the nvpro-samples which uses the OptiX denoiser.
Maybe that helps to explain the necessary steps some more: https://github.com/nvpro-samples/vk_denoise
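To make point 1 concrete, a common pattern is a small macro that aborts with file and line on the first failing call. This is only a sketch: the `cudaError_t`/`cudaSuccess`/`cudaGetErrorString` definitions below are stand-ins so the snippet compiles without the CUDA SDK; in real code you would delete them and `#include <cuda_runtime.h>` instead.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Stand-ins so this sketch is self-contained; replace with <cuda_runtime.h>.
typedef int cudaError_t;
static const cudaError_t cudaSuccess = 0;
static const char* cudaGetErrorString(cudaError_t) { return "stand-in error string"; }

// Wrap every CUDA runtime call so the first failure is reported with its location,
// instead of the process crashing later without any message.
#define CUDA_CHECK(call)                                                     \
    do {                                                                     \
        const cudaError_t err = (call);                                      \
        if (err != cudaSuccess) {                                            \
            std::fprintf(stderr, "CUDA error %d (%s) at %s:%d\n",            \
                         err, cudaGetErrorString(err), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                         \
        }                                                                    \
    } while (0)

// Stand-in for any CUDA call that succeeds, just to exercise the macro.
static cudaError_t fakeCudaCall() { return cudaSuccess; }
```

With this in place, `CUDA_CHECK(cudaGraphicsD3D11RegisterResource(...))` would tell you immediately whether the register call is the one that fails on your texture format.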
Debugging “desc”, all values seem to be correct.
Now I need the last thing: casting/converting the ID3D11Texture2D to an OptixImage2D.
Can you help me with this?
Thanks.
I’d still recommend putting error checks around all CUDA calls, and maybe around all other calls in your code.
EDIT: Ok, you edited it while I answered.
Not sure what the ctx->Map() does. If that is mapping to host memory, that won’t help much.
You need a CUDA device pointer to linear memory with the data.
What you do in the code above is called “staging”. You simply copy a sub-resource of the D3D11 texture into a staging texture. That one can then be accessed on the CPU through mapping, via “ctx->Map” (see ID3D11DeviceContext::Map (d3d11.h) - Win32 apps | Microsoft Docs), and you can read the result in host memory. You can of course do that, but it is not necessary, and it may not be the fastest option (you would then also need to upload the converted data to the device again when filling the CUDA buffer). You can instead read the D3D pixels from GPU memory and write them back into a CUDA buffer.
You can use a CUDA kernel for that. After you have successfully registered your D3D11 texture, use “cudaGraphicsMapResources”.
You have 2 different cases, based on your input data:
For a 1D D3D11Buffer you simply use “cudaGraphicsResourceGetMappedPointer”, which gives you a device pointer into the “CUDA-resource-mapped” D3D (structured) buffer. NOTE: this resource mapping is NOT a host memory access; it maps the buffer directly on the device.
For a 2D D3D11 texture you need to use “cudaGraphicsSubResourceGetMappedArray”; then you need to copy the array content into a CUDA temp buffer of the same type (same byte width per pixel and per color channel as the original D3D11 texture). Ensure valid pitch/width settings. Use “cudaMemcpy2DFromArray” with copy type “cudaMemcpyDeviceToDevice”.
The device pointer of that temp buffer is then what you need: you can use it as the source in your CUDA kernel.
The destination pointer can then be your new CUDA buffer (which is the “data” field in the “OptixImage2D” structure).
Within the CUDA kernel you can then convert the color information.
After running the kernel, use “cudaGraphicsUnmapResources”.
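The pitch/width settings mentioned above are where this copy most often goes wrong, so here is a sketch of just the size arithmetic one would pass to “cudaMemcpy2DFromArray”, assuming the texture is RGBA 32-bit float (16 bytes per texel, which is likely what DefaultFormat.HDR ends up as) and a tightly packed destination. `Copy2DArgs`/`makeCopyArgs` are hypothetical helper names, not CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// Arguments for a 2D array-to-linear copy; plain size arithmetic only.
struct Copy2DArgs
{
    size_t widthInBytes; // bytes actually copied per row
    size_t dstPitch;     // row stride of the destination linear buffer, in bytes
    size_t height;       // number of rows
};

Copy2DArgs makeCopyArgs(size_t width, size_t height, size_t bytesPerTexel)
{
    Copy2DArgs a;
    a.widthInBytes = width * bytesPerTexel; // e.g. 4 channels * sizeof(float) = 16 for RGBA32F
    a.dstPitch     = a.widthInBytes;        // tightly packed destination, no padding
    a.height       = height;
    return a;
}

// Usage sketch, after cudaGraphicsMapResources and
// cudaGraphicsSubResourceGetMappedArray returned "array":
//   Copy2DArgs a = makeCopyArgs(width, height, 4 * sizeof(float));
//   cudaMemcpy2DFromArray(dData, a.dstPitch, array, 0, 0,
//                         a.widthInBytes, a.height, cudaMemcpyDeviceToDevice);
```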
To save memory you can register/unregister the D3D11 resource on each frame; however, that may impact speed.
NOTE: ensure to correctly handle sRGB (Gamma Correction); based on your D3D11 texture format.
I see you use “DefaultFormat.HDR”; if that is a multisample texture you also need to use ResolveSubresource to get the desired pixel data.
I have no information about how Unity works, so please checkout what texture format is used in your case. Based on that you know whether you maybe need a sRGB conversion and/or resolve.
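For reference, the sRGB handling mentioned above boils down to the standard sRGB-to-linear transfer function, sketched below. Note this is only needed if the source data is sRGB-encoded and not already decoded for you (D3D11 `*_SRGB` formats are decoded by the sampling hardware); apply it per color channel, never to alpha.

```cpp
#include <cassert>
#include <cmath>

// Standard sRGB electro-optical transfer function: converts one sRGB-encoded
// channel value in [0,1] to linear light.
float srgb_to_linear(float c)
{
    return (c <= 0.04045f) ? c / 12.92f
                           : std::pow((c + 0.055f) / 1.055f, 2.4f);
}
```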
NOTE: Using a D3D11 buffer in the first place instead of a D3D11 texture2D would save you one buffer copy per frame. So if, for example, you already run some HLSL shaders on the texture before sending it to the denoiser, simply write to a RWStructuredBuffer in that HLSL shader; that one does not need the extra temp buffer copy, because you can use “cudaGraphicsResourceGetMappedPointer” with it.
I tested both input types for the resource mapping (D3D11Buffer + D3D11 textures) in my application. But there is also another option I can think of: if you cannot convert the D3D11 texture into a D3D11Buffer, try the D3D11 texture format DXGI_FORMAT_R32G32B32A32_FLOAT; this should be technically identical to float4 in CUDA. With that you don’t even need a CUDA kernel; instead you can use “cudaMemcpy2DFromArray” to copy the D3D11 float data directly into your final CUDA buffer.
I have linear data in dData now and it’s perfect for OptixImage2D; in fact it works perfectly now, except for one thing:
the image is flipped vertically.
The solution is probably simple, but there is not much info about this and CUDA on the internet.
If that code is called more than once, move the cudaMalloc() somewhere outside to not leak the memory.
D3D normally has the texture origin at the top left, OpenGL has the texture origin at the bottom left, Vulkan specs used to describe it with top-left origin, but newer spec versions are wording it in an orientation agnostic way.
That means that if you have a pointer to linear texel data, it depends on your interpretation (display, storage) of the data which way is up.
The OptiX examples normally have the camera setup with the launch index [0, 0] at the lower left, so matching the OpenGL texture orientation with which the final image is displayed.
For the denoiser itself the orientation doesn’t matter if you’re denoising full images.
If you do partial tile denoising then you would need to know your coordinates’ orientation.
If you only need the final image after denoising (without tiling) back in a D3D11 resource, you can copy the output data back as it is after denoising. (That was what I assumed.)
For that you then would use cudaMemcpy2DToArray.
If you instead download the texels from the GPU, then you can simply interpret them as @droettger wrote:
const int heightM1 = height - 1;
const float4* src = ... // pointer to the source image in CPU memory (downloaded from the GPU)
float4* dest = ... // pointer to the destination image (CPU memory)
const int src_pitch = width;  // pitch in float4 elements
const int dest_pitch = width;
// NOTE: "src" and "dest" need to be different buffers here (cannot be in-place when using this code)
for (int j = 0; j < height; ++j)
{
    for (int i = 0; i < width; ++i)
    {
        float4 pixel = src[j * src_pitch + i];
        pixel.w = 1.0f; // set alpha to opaque (if desired)
        dest[(heightM1 - j) * dest_pitch + i] = pixel;
    }
}
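To double-check the flip index math, here is a self-contained version of that loop as a function, with a locally defined `float4` so it compiles without CUDA headers; `flipVertically` is just a name chosen for this sketch.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for CUDA's float4 so this host-side sketch needs no CUDA headers.
struct float4 { float x, y, z, w; };

// Copies "src" into "dest" flipped vertically, forcing alpha to opaque.
// Pitches are in float4 elements; src and dest must be different buffers.
void flipVertically(const float4* src, float4* dest, int width, int height,
                    int src_pitch, int dest_pitch)
{
    const int heightM1 = height - 1;
    for (int j = 0; j < height; ++j)
    {
        for (int i = 0; i < width; ++i)
        {
            float4 pixel = src[j * src_pitch + i];
            pixel.w = 1.0f; // set alpha to opaque
            dest[(heightM1 - j) * dest_pitch + i] = pixel;
        }
    }
}
```

On a 2x2 test image this moves the bottom row to the top and vice versa, which is exactly the D3D-vs-OpenGL origin difference described above.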
If the texture should instead remain on the GPU, you can do that conversion in a CUDA kernel:
__global__ void cuda_kernel_tex_convert(unsigned char *srcPtr, int width, int height, int pitch_src, int pitch_dest, unsigned char* buffer_DevPtr)
{
int x = blockIdx.x*blockDim.x + threadIdx.x;
int y = blockIdx.y*blockDim.y + threadIdx.y;
if (x >= width || y >= height) return;
const int heightM1 = height-1;
float *src = (float *)(srcPtr + y*pitch_src) + 4*x;
float *dest = (float *)(buffer_DevPtr + (heightM1 - y)*pitch_dest) + 4*x;
dest[0] = src[0];
dest[1] = src[1];
dest[2] = src[2];
dest[3] = 1.0f; // alpha (or use src[3] to copy also the alpha, if desired)
}
extern "C"
int convert_texture(void *src, int width, int height, int pitch_src, int pitch_dest, void* buffer_DevPtr)
{
dim3 Db = dim3(BLOCK_DIM_X, BLOCK_DIM_Y); // e.g. 256 threads
dim3 Dg = dim3((unsigned int)((width+Db.x-1)/Db.x), (unsigned int)((height+Db.y-1)/Db.y));
cuda_kernel_tex_convert<<<Dg,Db>>>((unsigned char *)src, (int)width, (int)height, pitch_src, pitch_dest, (unsigned char*)buffer_DevPtr);
return cudaGetLastError();
}
// launch the kernel:
const int pitchSRC = width * 4 * sizeof(float); // assumed to be without padding
const int pitchDEST = width * 4 * sizeof(float); // assumed to be without padding
convert_texture((void *)src, width, height, pitchSRC, pitchDEST, (void*)dest);
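The grid dimension computed above is just a ceiling division: enough blocks to cover the image even when width and height are not multiples of the block size, with the in-kernel bounds test discarding the excess threads. A quick host-side check of that arithmetic, assuming for example BLOCK_DIM_X = BLOCK_DIM_Y = 16 (16 * 16 = 256 threads per block); `ceilDiv` is a name chosen for this sketch:

```cpp
#include <cassert>

// Ceiling division: smallest number of blocks of size "d" covering "n" items.
constexpr unsigned int ceilDiv(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}
```

With a 512x512 texture this launches a 32x32 grid; with 500x500 it still launches 32x32 and the `if (x >= width || y >= height) return;` guard handles the padding threads.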