Convert ID3D11Resource to fp32 tensor in CUDA

Hello, I'm attempting to convert an ID3D11Texture2D (or an ID3D11Resource, for a more general perspective) to an fp32 tensor. I have gone through all the prior CUDA and D3D11 interop steps, such as registering the resource, mapping it to CUDA, and retrieving the mapped array. However, I seem to be stuck on converting the texture to an fp32 tensor. I first attempted to create a CUDA texture object and then pass it through a conversion kernel to complete the frame-to-tensor conversion, but the kernel doesn't seem to convert it properly: in a test output (tensor to jpeg), all I see is a yellow canvas, whereas it should be a picture of an apple.

I know I probably should have kept my kernel code to share, but I already deleted it in an attempt to rewrite it. I do have the other code related to my pipeline, and I'm willing to share it if needed.

For reference, my ID3D11Texture2D has a DXGI format of DXGI_FORMAT_B8G8R8A8_UNORM. My code compiles (C++) without error, and no runtime errors occur. I also know the ID3D11Texture2D is valid and accurate, since I'm rendering each one I receive from another pipeline in my code to a simple win32 window I created. Lastly, I have heard that using cudaMemcpy2DFromArray() to create a linear buffer is another option besides the texture object method. Any guidance or suggestions would be greatly appreciated, thank you.
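Here is a sketch of roughly what my setup looked like, reconstructed from memory since I deleted the actual kernel. The names, the channel swap, and the [0,1] normalization are my approximations, not the exact code:

```cpp
// Sketch of the D3D11 -> CUDA -> planar fp32 path (reconstruction, not the
// original code). Assumes DXGI_FORMAT_B8G8R8A8_UNORM input and a 1x3xHxW
// RGB tensor in [0,1]; error checking trimmed for brevity.
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

__global__ void bgra8ToPlanarFloat(cudaTextureObject_t tex, float* out,
                                   int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // cudaReadModeNormalizedFloat maps the 8-bit UNORM channels to [0,1].
    // For B8G8R8A8 the first byte is blue, so p.x = B, p.y = G, p.z = R.
    float4 p = tex2D<float4>(tex, x + 0.5f, y + 0.5f);

    int idx   = y * width + x;
    int plane = width * height;
    out[0 * plane + idx] = p.z;  // R plane
    out[1 * plane + idx] = p.y;  // G plane
    out[2 * plane + idx] = p.x;  // B plane
}

void textureToTensor(ID3D11Texture2D* d3dTex, float* devTensor,
                     int width, int height)
{
    // In a real pipeline, register once and only map/unmap per frame.
    cudaGraphicsResource* res = nullptr;
    cudaGraphicsD3D11RegisterResource(&res, d3dTex, cudaGraphicsRegisterFlagsNone);
    cudaGraphicsMapResources(1, &res, 0);

    cudaArray_t arr = nullptr;
    cudaGraphicsSubResourceGetMappedArray(&arr, res, 0, 0);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;
    texDesc.readMode = cudaReadModeNormalizedFloat; // 8-bit UNORM -> float [0,1]
    texDesc.normalizedCoords = 0;                   // index by pixel, not [0,1]

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    bgra8ToPlanarFloat<<<grid, block>>>(tex, devTensor, width, height);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaGraphicsUnmapResources(1, &res, 0);
    cudaGraphicsUnregisterResource(res);
}
```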

What do you mean by fp32 tensor? A vector like float4? A format for the Tensor Cores? Just a multi-dimensional array?

I'm rendering each one I receive from another pipeline in my code to a simple win32 window I created

What is win32 here? A typo for fp32, or some output window or format specific to Windows?

Can you output a red, a green, and a blue jpg image? Can you set the values to 0.5 (or 127) and see the image at half brightness?

Wouldn't it be easier just to look at the memory in the debugger, or to copy it to the host and write it into a plain file instead of a jpg, if there is still a fundamental problem either with reading the texture or with writing the jpg?

Hello, thank you for your response. To give a high-level overview, my goal is to process ID3D11Texture2D frames through a TensorRT inference engine. As of right now, I have successfully written C++ code to capture frames of a Win32 window using the WGC (Windows Graphics Capture) API. My problem isn't really related to rendering frames with the GPU, but rather to using the GPU to convert the ID3D11Texture2D captured from the WGC API and run AI inference on it with TensorRT in milliseconds.

I have come to learn that the proper procedure to bridge the two is to make the ID3D11Texture2D interoperable with CUDA: register it, map it, and retrieve the mapped pointer. Once this is done, the final step to make the texture usable by TensorRT for image object detection is to convert it into a tensor containing the pixel data. To accomplish this I created a CUDA kernel to bridge the ID3D11Texture2D to a tensor, so I can then run AI inference "on the frame" with TensorRT. I only wrote the tensor-to-jpg test code to check whether the tensor represented the image I wanted to capture.

As a side note, I mention fp32 (I guess just a fancier name for a float) because my TensorRT engine was built in fp32 mode, so I believe the engine expects an input tensor of that type. But above all, my problem isn't with TensorRT inference but with the CUDA conversion that turns the ID3D11Texture2D into a float tensor I can pass to TensorRT. I hope this clarifies a few things. Please feel free to ask if you need to see any of my code.
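For reference, the inference call I plan to use once the tensor is correct looks roughly like this (a sketch assuming the TensorRT 8.x enqueueV2 API; the binding order and buffer sizes depend on the engine):

```cpp
// Sketch of feeding the converted fp32 tensor to TensorRT. Assumes an
// explicit-batch engine with one input and one output binding.
#include <NvInfer.h>
#include <cuda_runtime.h>

void runInference(nvinfer1::IExecutionContext* context,
                  float* devInput,   // 1x3xHxW fp32 tensor from the kernel
                  float* devOutput,  // device buffer sized for the engine output
                  cudaStream_t stream)
{
    void* bindings[] = { devInput, devOutput }; // order must match binding indices
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);
}
```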

Tensor is a mathematical term. TensorRT is an Nvidia library. Different libraries have different high-level types expressing tensors, and those types themselves have an underlying element type, like float (float is the same as fp32; the name just states the bit width explicitly).

So when you say you want to convert into a tensor, it is not clear that you mean a TensorRT tensor.

You are using a lot of libraries and high-level data formats, and there are a lot of interface points where it could have gone wrong.

So it would be best to test each part of the chain:

Can you read the ID3D11Texture2D successfully?

Can you write to a TensorRT input buffer successfully from CUDA code?

Can you write a jpg file successfully?

And by successfully I do not mean merely without errors, but that, e.g., pixels you set manually in CUDA with certain colors at certain locations actually appear there, including colors that are not full intensity.

Writing a jpg output is rather strange. First, it is complicated (it needs a library, for one); second, it compresses the image, changing the data slightly. For running your AI inference it is slow; for debugging it is indirect. And jpg is not well suited to screen captures anyway; it is a better fit for real-life pictures/photos with gradients.

Normally for debugging, one just writes the texture data into plain device memory (not even cudaMallocArray, just cudaMalloc), copies it into host memory with cudaMemcpy after the kernel, and inspects the data there. Then you can be sure that you are not introducing errors from additional libraries.
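A minimal sketch of that approach (the buffer name and element count are placeholders):

```cpp
// Copy the first few floats of the converted tensor to the host and print
// them, bypassing any image library.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

void dumpTensor(const float* devTensor, int count)
{
    std::vector<float> host(count);
    cudaMemcpy(host.data(), devTensor, count * sizeof(float),
               cudaMemcpyDeviceToHost);
    for (int i = 0; i < count; ++i)
        printf("tensor[%d] = %f\n", i, host[i]);
}
```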

I also dumped part of the tensor output and noticed that most of the floats were near zero, which is odd since the frame I was capturing had a white background and a red apple in the center.

Just a side note: the memory dump values I read were taken after the kernel conversion. That is why I believe there is an error in my conversion kernel.

Belief should not come into it. Debugging is a fact-based process. As a first step, dump the raw data prior to conversion to establish whether it is correct.

Consider using special test images during debugging. When I worked in 3D graphics and had issues with texture functionality, including conversions, I often used checkerboard patterns of pure {red | green | blue} and pure {white | black}, or color bars (like the TV test images of old).
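For example, a hypothetical kernel that fills a BGRA8 buffer with a red/white checkerboard, which you could feed through your conversion in place of the captured frame:

```cpp
#include <cuda_runtime.h>

// Fill a BGRA8 buffer with a checkerboard of pure red and pure white.
__global__ void fillCheckerboard(uchar4* out, int width, int height, int cell)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    bool red = ((x / cell) + (y / cell)) & 1;
    // BGRA byte order: .x = B, .y = G, .z = R, .w = A
    out[y * width + x] = red ? make_uchar4(0, 0, 255, 255)
                             : make_uchar4(255, 255, 255, 255);
}
```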

Ok, I will try these recommendations. Thank you.

You could start by finding out whether you can control the floats that are not near zero. Do they show the same output every time? Are they black when you feed in a black texture? Do they exhibit some spatial pattern?

Is it caused by the input being read wrongly, or by the output being written wrongly (e.g. incorrect coordinate indexing derived from threadIdx and blockIdx)?

Can you create kernels that use the same coordinate calculation but set all pixels to known values?
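For instance, a hypothetical index-check kernel that writes each pixel's own coordinates, so you can verify the thread-to-pixel mapping independently of the texture read:

```cpp
#include <cuda_runtime.h>

// Write each pixel's own (x, y) into the output; any deviation in the dump
// points to an indexing problem rather than a texture-read problem.
__global__ void writeCoords(float2* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    out[y * width + x] = make_float2((float)x, (float)y);
}
```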