I’m not aware of any concerns like that. The big thing (IMO) to be aware of on Windows is TCC mode vs. WDDM mode. If you study the output in your posting here, you will see the GPU is in TCC mode, so that is “good”. (I don’t think an A100 can actually be in WDDM mode, but many other NVIDIA GPUs can be.) Other than that, if cudaMallocAsync is not supported, the main implication is the obvious one: you cannot do stream-ordered memory allocation.
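If you want to check both things from code rather than from tool output, a minimal sketch along these lines should do it. It assumes device 0 and a CUDA 11.2 or newer toolkit (for `cudaDevAttrMemoryPoolsSupported`); the `yesNo` helper is just a made-up name for printing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper, only used to format the output.
const char *yesNo(int flag) { return flag ? "yes" : "no"; }

int main() {
    const int device = 0;  // assumption: the GPU of interest is device 0

    // 1 if the device is using the TCC driver, 0 otherwise.
    int tcc = 0;
    cudaError_t err = cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "TCC query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // 1 if the device supports memory pools, i.e. cudaMallocAsync/cudaFreeAsync.
    int pools = 0;
    err = cudaDeviceGetAttribute(&pools, cudaDevAttrMemoryPoolsSupported, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "Memory pool query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("device %d: TCC driver: %s, cudaMallocAsync supported: %s\n",
                device, yesNo(tcc), yesNo(pools));
    return 0;
}
```

On Linux the TCC attribute should simply report 0, since TCC is a Windows driver model.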
Before cudaMallocAsync came along, my advice when teaching CUDA was always to get certain kinds of operations out of what I call performance loops, the areas of code where work is being issued to the GPU. One of those operations to avoid is cudaMalloc. If you can use cudaMallocAsync (and use it well/correctly), this concern pretty much goes away. So if cudaMallocAsync is not available, I would revert to my normal coding advice: if at all possible, do cudaMalloc operations up front, before getting into the “performance loops”, and re-use allocations as much as possible. In my opinion that is still good advice in any CUDA programming setting.
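Here is a rough sketch of what I mean, not a definitive pattern. The `fill` kernel, the `gridSize` helper, the `CHECK` macro, the buffer size and the iteration count are all made up for illustration. If memory pools are supported, the loop allocates and frees in stream order; otherwise it falls back to one cudaMalloc done up front and re-used on every iteration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro for this sketch.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed: %s\n", #call,               \
                         cudaGetErrorString(err_));                      \
            std::exit(1);                                                \
        }                                                                \
    } while (0)

// Placeholder kernel standing in for the real work.
__global__ void fill(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Number of blocks needed to cover n elements.
int gridSize(int n, int block) { return (n + block - 1) / block; }

int main() {
    const int device = 0, n = 1 << 20, block = 256, iterations = 100;
    const size_t bytes = n * sizeof(float);

    CHECK(cudaSetDevice(device));
    int pools = 0;
    CHECK(cudaDeviceGetAttribute(&pools, cudaDevAttrMemoryPoolsSupported, device));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    if (pools) {
        // Stream-ordered allocation: allocating inside the loop is acceptable here.
        for (int it = 0; it < iterations; ++it) {
            float *tmp = nullptr;
            CHECK(cudaMallocAsync(reinterpret_cast<void **>(&tmp), bytes, stream));
            fill<<<gridSize(n, block), block, 0, stream>>>(tmp, n, 1.0f);
            CHECK(cudaGetLastError());
            CHECK(cudaFreeAsync(tmp, stream));
        }
        CHECK(cudaStreamSynchronize(stream));
    } else {
        // Fallback: allocate once before the performance loop and re-use the buffer.
        float *buf = nullptr;
        CHECK(cudaMalloc(reinterpret_cast<void **>(&buf), bytes));
        for (int it = 0; it < iterations; ++it) {
            fill<<<gridSize(n, block), block, 0, stream>>>(buf, n, 1.0f);
            CHECK(cudaGetLastError());
        }
        CHECK(cudaStreamSynchronize(stream));
        CHECK(cudaFree(buf));
    }

    CHECK(cudaStreamDestroy(stream));
    std::printf("done (%s path)\n", pools ? "cudaMallocAsync" : "up-front cudaMalloc");
    return 0;
}
```

The fallback branch works on any platform, so it is a reasonable default if you don't want two code paths.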
Our GPUs are designed to work as well as possible in either Linux or Windows, and the A100 is no exception. However, the OS is not something NVIDIA has full control of, so limitations imposed by a particular OS are often things that cannot be worked around in CUDA. A big one is the one I mentioned already: WDDM vs. TCC. For anyone doing significant GPU computing work on Windows, I would always suggest TCC mode if possible, because WDDM imposes a much more significant set of limitations. Those limitations are covered in other places and I don’t have a list to present here, but a big one is the limit on kernel execution duration that is present in WDDM and not in TCC.
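The kernel duration limit can also be queried at run time. A minimal sketch, again assuming device 0 (the `watchdogLabel` helper is a made-up name for printing):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper, only used to format the output.
const char *watchdogLabel(int enabled) { return enabled ? "enabled" : "disabled"; }

int main() {
    const int device = 0;  // assumption: the GPU of interest is device 0

    // Non-zero if there is a run-time limit on kernels executed on this device.
    int timeoutEnabled = 0;
    cudaError_t err = cudaDeviceGetAttribute(&timeoutEnabled,
                                             cudaDevAttrKernelExecTimeout, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "Timeout query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("kernel run-time limit on device %d: %s\n",
                device, watchdogLabel(timeoutEnabled));
    return 0;
}
```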
I consider the differences between TCC operation and Linux operation to be pretty small, but they are obviously not zero. It seems we have a case right here, and I don’t know all the technical underpinnings of why cudaMallocAsync might be available in one setting and not in another.