I feel like you’re all so brilliant that you literally don’t understand what “simple” actually means (judging by the “simple” examples in your CUDA repo). My complaint across really the entire C++ space is that comfortable mid-level APIs are missing. I’m going to focus on DX11 and CUDA in general.
Let’s talk high-level/low-level APIs and where I feel there’s a gap. Take for example DXGI, swap chains, and getting textures out of the GPU. Chuck Walbourn is the major contributor to DirectXTK. In it, he has some utilities for saving a DX11 texture as a WIC image. This is high level: very opinionated and domain specific. Among other things, he handwaves what turns out to be a non-trivial thing for people not familiar with the space:
given an immediate device context and swapchain backbuffer.
It’s crazy how much complexity is behind such a simple phrase. I got there eventually, but this is exactly what frustrates me. I don’t need the WIC save, and I don’t want to know the color format of the DXGI screen capture (unfortunately I had to learn it). I want someone to make the swap chain for me. They can worry about the color format (why there are even so many options, when in practice it’s essentially only one, is beyond me) and pass it along to whatever other method needs to know it. I don’t want to know about GPU/CPU access of frames and the nuance involved. Maybe that sounds selfish, but you’re not going to rewrite an OS kernel every time you want to deploy an app, are you? It’s about building blocks and foundations, so the next person can reasonably consume them with just the basics. This is a glaring example of the missing mid-level APIs.
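For what it’s worth, here is roughly what “given an immediate device context and swapchain backbuffer” expands to, as a minimal sketch rather than a definitive implementation. The struct `D3DObjects` and the function `CreateD3DObjects` are hypothetical names of my own. It assumes a window (`hwnd`) already exists, that a hardware adapter is available, and that BGRA8 is an acceptable back buffer format.

```cpp
#include <windows.h>
#include <d3d11.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d11.lib")  // MSVC only; otherwise link d3d11.lib yourself

using Microsoft::WRL::ComPtr;

// Hypothetical bundle of the objects the phrase takes for granted.
struct D3DObjects {
    ComPtr<ID3D11Device>        device;
    ComPtr<ID3D11DeviceContext> context;     // the "immediate device context"
    ComPtr<IDXGISwapChain>      swapChain;
    ComPtr<ID3D11Texture2D>     backBuffer;  // the "swapchain backbuffer"
};

// Assumption: hwnd is a valid window created elsewhere.
HRESULT CreateD3DObjects(HWND hwnd, UINT width, UINT height, D3DObjects& out)
{
    DXGI_SWAP_CHAIN_DESC desc = {};
    desc.BufferCount       = 1;
    desc.BufferDesc.Width  = width;
    desc.BufferDesc.Height = height;
    desc.BufferDesc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;  // assumed color format
    desc.BufferUsage       = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.OutputWindow      = hwnd;
    desc.SampleDesc.Count  = 1;                            // no multisampling
    desc.Windowed          = TRUE;
    desc.SwapEffect        = DXGI_SWAP_EFFECT_DISCARD;

    // Default adapter, hardware driver, default feature levels.
    HRESULT hr = D3D11CreateDeviceAndSwapChain(
        nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
        nullptr, 0, D3D11_SDK_VERSION,
        &desc, &out.swapChain, &out.device, nullptr, &out.context);
    if (FAILED(hr)) return hr;

    // Buffer 0 of the swap chain is the back buffer texture.
    return out.swapChain->GetBuffer(0, IID_PPV_ARGS(&out.backBuffer));
}
```

Even this short version forces a decision on format, buffer count, sample count, and swap effect, which is the kind of thing a mid-level API could reasonably default.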
That brings me to my current issue. I have an ID3D11Texture2D and I’d like to turn it into a cv::cuda::GpuMat. I think 460 is boogered up right now when it comes to GpuMats, but all I need is a simple example of ID3D11Texture2D → CUDA. I see this:
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__D3D11.html
If I can get that far, I’m confident I can figure out how to get the data into the mat afterwards, rather than doing a direct conversion. Then performance questions pop up: if I’m not leaving the GPU, do I need any CPU access at all? I don’t need these massive examples. I need simple, consumable examples (or actual mid-level APIs) with better docs that explain the ramifications of the choices I might make, such as the CPU vs. GPU access flags on that DX11 frame.
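To make the ask concrete, the kind of example I mean might look like the sketch below, built on the runtime API functions from the page linked above. It is a sketch under assumptions, not a verified implementation. `CopyTextureToCuda` is a hypothetical helper name. It assumes a 4-byte-per-pixel format such as BGRA8, a texture created with `D3D11_USAGE_DEFAULT` on the same GPU that CUDA is using, and a texture that is not the swap chain’s own back buffer (the CUDA docs say the primary render target cannot be registered, so you would `CopyResource` into your own texture first).

```cpp
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

// Hypothetical helper: copies a 4-byte-per-pixel texture into pitched
// linear device memory. On success the caller owns *devPtr (cudaFree it).
// Assumptions: tex uses D3D11_USAGE_DEFAULT, is not the swap chain back
// buffer, and lives on the GPU that CUDA is currently using.
cudaError_t CopyTextureToCuda(ID3D11Texture2D* tex, void** devPtr, size_t* pitch)
{
    D3D11_TEXTURE2D_DESC desc;
    tex->GetDesc(&desc);
    const size_t rowBytes = static_cast<size_t>(desc.Width) * 4;  // assumed 4 bytes/pixel

    // Registration is expensive; a real capture loop would register once
    // and only map/unmap per frame.
    cudaGraphicsResource_t res = nullptr;
    cudaError_t err = cudaGraphicsD3D11RegisterResource(
        &res, tex, cudaGraphicsRegisterFlagsNone);
    if (err != cudaSuccess) return err;

    err = cudaGraphicsMapResources(1, &res, 0);
    if (err == cudaSuccess) {
        // A mapped texture is exposed as a cudaArray, not a raw pointer.
        cudaArray_t array = nullptr;
        err = cudaGraphicsSubResourceGetMappedArray(&array, res, 0, 0);

        *devPtr = nullptr;
        if (err == cudaSuccess)
            err = cudaMallocPitch(devPtr, pitch, rowBytes, desc.Height);
        if (err == cudaSuccess)
            err = cudaMemcpy2DFromArray(*devPtr, *pitch, array, 0, 0,
                                        rowBytes, desc.Height,
                                        cudaMemcpyDeviceToDevice);
        if (err != cudaSuccess && *devPtr) {
            cudaFree(*devPtr);  // do not leak the buffer on a failed copy
            *devPtr = nullptr;
        }
        cudaGraphicsUnmapResources(1, &res, 0);
    }
    cudaGraphicsUnregisterResource(res);
    return err;
}
```

The copy is device to device, so nothing here touches the CPU. As far as I understand it, that also means the texture needs no CPU access flags at all (`CPUAccessFlags = 0`); a staging texture with `D3D11_CPU_ACCESS_READ` only matters if you want to `Map` the pixels on the CPU. From there, the pointer and pitch should be wrappable without another copy via something like `cv::cuda::GpuMat mat(desc.Height, desc.Width, CV_8UC4, devPtr, pitch);`, assuming OpenCV was built with CUDA support.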