General frustrations with the C++ space and a lack of information around GpuMats. The goal is ID3D11Texture2D -> CUDA without going through the CPU.

I feel like you’re all so brilliant that you’ve lost track of what “simple” actually means (based on the “simple” examples in your CUDA repo). My complaint, across really the entire C++ space, is that you’re missing comfortable mid-level APIs. I’m going to focus on DX11 and CUDA.

Let’s talk high- and low-level APIs and where I feel there’s a gap. Take for example DXGI, swap chains, and getting textures out of the GPU. Chuck Walbourn is the major contributor to DirectXTK. In it, he has some utilities for saving a DX11 texture to an image file through WIC. This is high level: very opinionated and domain specific. Among other things, he handwaves what turns out to be a non-trivial thing for people not familiar with the space:

given an immediate device context and swapchain backbuffer.

It’s crazy how much complexity is behind such a simple phrase. I got there eventually, but this is the key highlight of my frustration. I don’t need the WIC save, and I don’t want to know the color format of the DXGI screen capture (unfortunately I had to learn it). I want someone to make the swap chain for me. They can worry about the color format (why there are so many options when in practice it’s essentially only one is beyond me) and pass it to whatever other method needs to know it. I don’t want to know about GPU/CPU access of frames and the nuance involved. Maybe that sounds selfish, but you’re not going to rewrite an OS kernel every time you want to deploy an app, are you? It’s building blocks and foundations, so the next person can reasonably consume it with just the basics. This is a glaring example of missing mid-level APIs.

That brings me to my current issue. I have an ID3D11Texture2D and I’d like to turn it into a cv::cuda::GpuMat. I think 460 is boogered up right now when it comes to GpuMats, but all I need is a simple example of ID3D11Texture2D → CUDA. I see this:
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__D3D11.html

If I can get this, I’m confident I’ll figure out how to get it into the mat afterwards, instead of doing a direct convert. Then questions pop up about performance. If I’m not leaving the GPU, do I need any CPU access at all? I don’t need these massive examples. I need simple, consumable examples (or actual mid-level APIs) with better docs that explain the ramifications of the choices I might make, such as CPU vs GPU access flags on that DX11 frame.
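
For reference, here is a minimal, untested sketch of what the ID3D11Texture2D → CUDA → GpuMat path might look like, built only from the interop functions documented on the NVIDIA page linked above. The helper names `textureToGpuMat` and `rowBytes4` are made up for illustration. It assumes a single-mip, non-multisampled, 4-bytes-per-pixel texture (for example DXGI_FORMAT_B8G8R8A8_UNORM) that lives on the same GPU CUDA is using and is of a kind CUDA will accept for registration. Note that nothing in it asks for CPU access on the texture.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes in one row of a 4-bytes-per-pixel texture (e.g. BGRA8).
constexpr std::size_t rowBytes4(std::size_t widthPixels) { return widthPixels * 4; }

#ifdef _WIN32  // D3D11 is Windows-only; everything below needs the CUDA toolkit and OpenCV built with CUDA.
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>
#include <opencv2/core/cuda.hpp>

// Hypothetical helper: copies a D3D11 texture into a GpuMat, device to device.
// Assumption: tex is single-mip, non-multisampled, 4 bytes per pixel.
bool textureToGpuMat(ID3D11Texture2D* tex, cv::cuda::GpuMat& out)
{
    D3D11_TEXTURE2D_DESC desc{};
    tex->GetDesc(&desc);
    assert(desc.SampleDesc.Count == 1 && desc.MipLevels == 1);

    // 1. Tell CUDA about the D3D11 resource.
    cudaGraphicsResource_t res = nullptr;
    if (cudaGraphicsD3D11RegisterResource(&res, tex, cudaGraphicsRegisterFlagsNone) != cudaSuccess)
        return false;

    bool ok = false;
    // 2. Map it so CUDA may access it (default stream).
    if (cudaGraphicsMapResources(1, &res, 0) == cudaSuccess) {
        // 3. A texture is exposed to CUDA as a cudaArray, not as a linear pointer.
        cudaArray_t arr = nullptr;
        if (cudaGraphicsSubResourceGetMappedArray(&arr, res, 0, 0) == cudaSuccess) {
            // 4. Copy the array into the GpuMat's pitched device memory.
            out.create(static_cast<int>(desc.Height), static_cast<int>(desc.Width), CV_8UC4);
            ok = cudaMemcpy2DFromArray(out.data, out.step, arr, 0, 0,
                                       rowBytes4(desc.Width), desc.Height,
                                       cudaMemcpyDeviceToDevice) == cudaSuccess;
        }
        cudaGraphicsUnmapResources(1, &res, 0);  // hand the texture back to D3D11
    }
    cudaGraphicsUnregisterResource(res);
    return ok;
}
#endif
```

The register call is documented as potentially expensive, so in a per-frame loop it would probably make more sense to register once and only map, copy, and unmap each frame. The same page also lists restrictions on what can be registered, so a swap chain back buffer may first need a CopyResource into a texture you created yourself.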

I searched for the answer for over a month, and I still don’t know how to deal with this ID3D11Texture2D to cv::cuda::GpuMat issue.

At the end of the day, I skipped the CUDA cv mat and just used a regular one. On a 12900K with a 2080 Ti, converting a 640x640 frame, I still achieve 160 fps, and that’s my engineering goal.
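
As a point of comparison, this is a minimal sketch of the usual CPU readback route, and it is where the CPU access flags actually matter. It is not necessarily the code behind the numbers above. The names `packRows` and `readbackBgra` are made up for illustration, and a non-multisampled, 4-bytes-per-pixel texture is assumed. The source texture itself needs no CPU access; only the temporary staging copy does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `height` rows of `rowBytes` bytes out of a pitched
// buffer (rows padded to `srcPitch` bytes) into a tightly packed vector.
std::vector<std::uint8_t> packRows(const std::uint8_t* src, std::size_t srcPitch,
                                   std::size_t rowBytes, std::size_t height)
{
    assert(srcPitch >= rowBytes);
    std::vector<std::uint8_t> out(rowBytes * height);
    for (std::size_t y = 0; y < height; ++y)
        std::memcpy(out.data() + y * rowBytes, src + y * srcPitch, rowBytes);
    return out;
}

#ifdef _WIN32  // D3D11 is Windows-only.
#include <d3d11.h>

// Hypothetical helper: GPU texture -> staging texture -> CPU bytes.
// Assumption: gpuTex is non-multisampled and 4 bytes per pixel (e.g. BGRA8).
bool readbackBgra(ID3D11Device* dev, ID3D11DeviceContext* ctx,
                  ID3D11Texture2D* gpuTex, std::vector<std::uint8_t>& pixels)
{
    // Same size and format as the source, but CPU-readable and not bindable.
    D3D11_TEXTURE2D_DESC desc{};
    gpuTex->GetDesc(&desc);
    desc.Usage = D3D11_USAGE_STAGING;
    desc.BindFlags = 0;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags = 0;

    ID3D11Texture2D* staging = nullptr;
    if (FAILED(dev->CreateTexture2D(&desc, nullptr, &staging)))
        return false;

    ctx->CopyResource(staging, gpuTex);  // GPU-side copy into the staging texture

    // Map blocks until the copy is done, then exposes the pixels to the CPU.
    D3D11_MAPPED_SUBRESOURCE mapped{};
    const bool ok = SUCCEEDED(ctx->Map(staging, 0, D3D11_MAP_READ, 0, &mapped));
    if (ok) {
        // RowPitch can be larger than width * 4, so copy row by row.
        pixels = packRows(static_cast<const std::uint8_t*>(mapped.pData), mapped.RowPitch,
                          static_cast<std::size_t>(desc.Width) * 4, desc.Height);
        ctx->Unmap(staging, 0);
    }
    staging->Release();
    return ok;
}
#endif
```

The packed bytes can then be wrapped without another copy, for example with `cv::Mat(height, width, CV_8UC4, pixels.data())`, as long as the vector outlives the mat.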

If you have questions or are still struggling to get anything working, let me know. I have a trail of Stack Overflow posts and my own code that I can share.