Convert ID3D11Resource to fp32 tensor in CUDA

Hello, I'm attempting to convert an ID3D11Texture2D (or an ID3D11Resource, for a more general perspective) to an fp32 tensor. I have gone through all the prior CUDA and D3D11 interop steps, such as registering the resource, mapping it to CUDA, and retrieving the mapped array. However, I seem to be stuck on converting the texture to an fp32 tensor. I first attempted to create a CUDA texture object and then pass it through a conversion kernel to complete the frame-to-tensor conversion, but the kernel doesn't seem to convert it properly: in a test output (tensor to jpeg), all I see is a yellow canvas, whereas it should be a picture of an apple.

I know I probably should have kept my kernel code to share, but I already deleted it in an attempt to rewrite it. I do have the other code related to my pipeline, and I'm willing to share it if needed.

For reference, my ID3D11Texture2D has a DXGI format of DXGI_FORMAT_B8G8R8A8_UNORM. My code compiles (C++) without error, and no runtime errors occur. I also know the ID3D11Texture2D is valid and accurate, since I'm rendering each one I receive from another pipeline in my code to a simple win32 window I created. Lastly, I have heard that using cudaMemcpy2DFromArray() to create a linear buffer is another option besides the texture object method. Any guidance or suggestions would be greatly appreciated, thank you.
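Here is a sketch of roughly what my setup looked like, reconstructed from memory since I deleted the actual kernel. The names, the channel swap, and the [0,1] normalization are my approximations, not the exact code:

```cpp
// Sketch of the D3D11 -> CUDA -> planar fp32 path (reconstruction, not the
// original code). Assumes DXGI_FORMAT_B8G8R8A8_UNORM input and a 1x3xHxW
// RGB tensor in [0,1]; error checking trimmed for brevity.
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

__global__ void bgra8ToPlanarFloat(cudaTextureObject_t tex, float* out,
                                   int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // cudaReadModeNormalizedFloat maps the 8-bit UNORM channels to [0,1].
    // For B8G8R8A8 the first byte is blue, so p.x = B, p.y = G, p.z = R.
    float4 p = tex2D<float4>(tex, x + 0.5f, y + 0.5f);

    int idx   = y * width + x;
    int plane = width * height;
    out[0 * plane + idx] = p.z;  // R plane
    out[1 * plane + idx] = p.y;  // G plane
    out[2 * plane + idx] = p.x;  // B plane
}

void textureToTensor(ID3D11Texture2D* d3dTex, float* devTensor,
                     int width, int height)
{
    // In a real pipeline, register once and only map/unmap per frame.
    cudaGraphicsResource* res = nullptr;
    cudaGraphicsD3D11RegisterResource(&res, d3dTex, cudaGraphicsRegisterFlagsNone);
    cudaGraphicsMapResources(1, &res, 0);

    cudaArray_t arr = nullptr;
    cudaGraphicsSubResourceGetMappedArray(&arr, res, 0, 0);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;
    texDesc.readMode = cudaReadModeNormalizedFloat; // 8-bit UNORM -> float [0,1]
    texDesc.normalizedCoords = 0;                   // index by pixel, not [0,1]

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    bgra8ToPlanarFloat<<<grid, block>>>(tex, devTensor, width, height);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaGraphicsUnmapResources(1, &res, 0);
    cudaGraphicsUnregisterResource(res);
}
```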

What do you mean by fp32 tensor? A vector like float4? A format for the Tensor Cores? Just a multi-dimensional array?

I'm rendering each one I receive from another pipeline in my code to a simple win32 window I created

What is win32 here? A typo for fp32, or some output window or format specific to Windows?

Can you output a red, a green, and a blue jpg image? Can you set the values to 0.5 (or 127) and see the image at half brightness?

Wouldn't it be easier just to look at the memory in the debugger, or to copy it to the host and write it into a plain file instead of a jpg, if there is still a fundamental problem either with reading the texture or with writing the jpg?

Hello, thank you for your response. To give a high-level overview, my goal is to process ID3D11Texture2D frames through a TensorRT inference engine. As of right now, I have successfully written C++ code to capture frames of a Win32 window using the WGC (Windows Graphics Capture) API. My problem isn't really related to rendering frames with the GPU, but rather to using the GPU to convert the ID3D11Texture2D captured from the WGC API and run AI inference on it with TensorRT in milliseconds.

I have come to learn that the proper procedure to bridge the two is to make the ID3D11Texture2D interoperable with CUDA: register it, map it, and retrieve the mapped pointer. Once this is done, the final step to make the texture usable by TensorRT for image object detection is to convert it into a tensor containing the pixel data. To accomplish this I created a CUDA kernel to bridge the ID3D11Texture2D to a tensor, so I can then run AI inference "on the frame" with TensorRT. I only wrote the tensor-to-jpg test code to check whether the tensor represented the image I wanted to capture.

As a side note, I mention fp32 (I guess just a fancier name for a float) because my TensorRT engine was built in fp32 mode, so I believe the engine expects an input tensor of that type. But above all, my problem isn't with TensorRT inference but with the CUDA conversion that turns the ID3D11Texture2D into a float tensor I can pass to TensorRT. I hope this clarifies a few things. Please feel free to ask if you need to see any of my code.
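For reference, the inference call I plan to use once the tensor is correct looks roughly like this (a sketch assuming the TensorRT 8.x enqueueV2 API; the binding order and buffer sizes depend on the engine):

```cpp
// Sketch of feeding the converted fp32 tensor to TensorRT. Assumes an
// explicit-batch engine with one input and one output binding.
#include <NvInfer.h>
#include <cuda_runtime.h>

void runInference(nvinfer1::IExecutionContext* context,
                  float* devInput,   // 1x3xHxW fp32 tensor from the kernel
                  float* devOutput,  // device buffer sized for the engine output
                  cudaStream_t stream)
{
    void* bindings[] = { devInput, devOutput }; // order must match binding indices
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);
}
```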

Tensor is a mathematical term. TensorRT is an Nvidia library. Different libraries have different high-level types expressing tensors, and those types themselves have an underlying element type, like float (float is the same as fp32; the name just states the bit width explicitly).

So when you say you want to convert into a tensor, it is not clear that you mean a TensorRT tensor.

You are using a lot of libraries and high-level data formats, and there are a lot of interface points where it could have gone wrong.

So it would be best to test each part of the chain:

Can you read the ID3D11Texture2D successfully?

Can you write to a TensorRT input buffer successfully from CUDA code?

Can you write a jpg file successfully?

And by successfully I do not mean merely without errors, but that, e.g., pixels you set manually in CUDA with certain colors at certain locations actually appear there, including colors that are not full intensity.

Writing a jpg output is rather strange. First, it is complicated (it needs a library, for one); second, it compresses the image, changing the data slightly. For running your AI inference it is slow; for debugging it is indirect. And jpg is not well suited to screen captures anyway; it is a better fit for real-life pictures/photos with gradients.

Normally for debugging, one just writes the texture data into plain device memory (not even cudaMallocArray, just cudaMalloc), copies it into host memory with cudaMemcpy after the kernel, and inspects the data there. Then you can be sure that you are not introducing errors from additional libraries.
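A minimal sketch of that approach (the buffer name and element count are placeholders):

```cpp
// Copy the first few floats of the converted tensor to the host and print
// them, bypassing any image library.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

void dumpTensor(const float* devTensor, int count)
{
    std::vector<float> host(count);
    cudaMemcpy(host.data(), devTensor, count * sizeof(float),
               cudaMemcpyDeviceToHost);
    for (int i = 0; i < count; ++i)
        printf("tensor[%d] = %f\n", i, host[i]);
}
```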

I also dumped part of the tensor output and noticed that most of the floats were near zero, which is odd since the frame I was capturing had a white background and a red apple in the center.

Just a side note: the memory dump values I read were taken after the kernel conversion. That is why I believe there is an error in my conversion kernel.

Belief should not come into it. Debugging is a fact-based process. As a first step, dump the raw data prior to conversion to establish whether it is correct.

Consider using special test images during debugging. When I worked in 3D graphics and had issues with texture functionality, including conversions, I often used checkerboard patterns of pure {red | green | blue} and pure {white | black}, or color bars (like the TV test images of old).
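For example, a hypothetical kernel that fills a BGRA8 buffer with a red/white checkerboard, which you could feed through your conversion in place of the captured frame:

```cpp
#include <cuda_runtime.h>

// Fill a BGRA8 buffer with a checkerboard of pure red and pure white.
__global__ void fillCheckerboard(uchar4* out, int width, int height, int cell)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    bool red = ((x / cell) + (y / cell)) & 1;
    // BGRA byte order: .x = B, .y = G, .z = R, .w = A
    out[y * width + x] = red ? make_uchar4(0, 0, 255, 255)
                             : make_uchar4(255, 255, 255, 255);
}
```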

Ok, I will try these recommendations. Thank you.

You could start by finding out whether you can control the floats that are not near zero. Do they show the same output every time? Are they black when you feed in a black texture? Do they exhibit some spatial pattern?

Is it caused by the input being read wrongly, or by the output being written wrongly (e.g. incorrect coordinate indexing derived from threadIdx and blockIdx)?

Can you create kernels that use the same coordinate calculation but set all pixels to known values?
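For instance, a hypothetical index-check kernel that writes each pixel's own coordinates, so you can verify the thread-to-pixel mapping independently of the texture read:

```cpp
#include <cuda_runtime.h>

// Write each pixel's own (x, y) into the output; any deviation in the dump
// points to an indexing problem rather than a texture-read problem.
__global__ void writeCoords(float2* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    out[y * width + x] = make_float2((float)x, (float)y);
}
```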