A100 can’t be used for computations

PavloTytarenko · January 16, 2023, 10:10pm

Hello! I have the following problem: an A100 card is installed in a PC running Windows Server 2022. The card is detected and everything appears fine, but it refuses to run computations. The error looks like this:

My guess is this is because the card itself has Display Mode enabled.

Am I correct? And if so, how do I disable Display Mode?

Thank you!

PavloTytarenko · January 18, 2023, 8:37pm

Just an update. It turned out that the issue is not Display Mode at all; rather, cudaMallocAsync is not working on the A100. If I switch to cudaMalloc, everything runs fine. So now another question arises: why is cudaMallocAsync not working on the A100?

Robert_Crovella · January 19, 2023, 4:53pm

A100 supports cudaMallocAsync. To determine if cudaMallocAsync is usable in a particular setting, please follow the instructions. The problem here may be that whatever that code is trying to do is not supported on Windows.
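
For readers who want to run this kind of check themselves, here is a minimal sketch that queries the cudaDevAttrMemoryPoolsSupported device attribute, which is the attribute discussed later in this thread. The device index 0 and the helper name poolsUsable are assumptions made for illustration; they are not from the original posts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: interprets the result of the attribute query.
// Memory pools (and therefore cudaMallocAsync) are treated as usable only
// if the query succeeded and the attribute value is nonzero.
static bool poolsUsable(cudaError_t status, int attrValue) {
    return status == cudaSuccess && attrValue != 0;
}

int main() {
    const int device = 0;  // assumption: inspect the first GPU in the system
    int supported = 0;

    cudaError_t err = cudaDeviceGetAttribute(
        &supported, cudaDevAttrMemoryPoolsSupported, device);
    if (err != cudaSuccess) {
        std::printf("Attribute query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("cudaDevAttrMemoryPoolsSupported = %d\n", supported);
    if (poolsUsable(err, supported)) {
        std::printf("cudaMallocAsync should be usable on this device.\n");
    } else {
        std::printf("cudaMallocAsync is not supported here; use cudaMalloc.\n");
    }
    return 0;
}
```

A value of 0 means the stream-ordered allocator is unavailable in the current configuration, so code that calls cudaMallocAsync needs a cudaMalloc fallback.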

PavloTytarenko · January 20, 2023, 2:18am

Many thanks for the reply! I just checked, and cudaDevAttrMemoryPoolsSupported is 0. That means cudaMallocAsync is simply not supported, correct?
I’m very curious why it is not supported. Does it mean that the A100 is simply incapable of working asynchronously on Windows? I tried running TensorRT inference simultaneously in many CUDA streams. Performance improved, but only a little. The same test on, say, an RTX 4090 produced a much more substantial performance boost.
And, if you don’t mind, another question: do you think an A100 can outperform an RTX 4090? For example, if used in MIG mode?

Robert_Crovella · January 20, 2023, 5:24am

correct, that is what 0 means

I believe it is because you are on windows

I don’t know. It may depend on the workload.

PavloTytarenko · January 20, 2023, 3:51pm

Thank you! And maybe 2 last questions on this:

If cudaMallocAsync is not supported, does that mean other CUDA operations that take a CUDA stream as a parameter are affected too? For example, worse performance, or the stream parameter being accepted but ignored?
Is the A100 designed for Linux, with use on Windows being possible but not preferred and not optimal?

Robert_Crovella · January 20, 2023, 10:06pm

I’m not aware of any concerns like that. The big thing (IMO) to be aware of on Windows is being in TCC mode vs. WDDM mode (if you study your posting/output here, you will see the GPU is in TCC mode, so that is “good” - I don’t think an A100 could actually be in WDDM mode, but many other NVIDIA GPUs can be). Other than that, if cudaMallocAsync is not supported, then that should mainly have the obvious implications for doing (or not doing) stream-oriented memory allocation.

Before cudaMallocAsync came along, I would have always suggested when I teach CUDA to get certain kinds of operations out of what I call performance loops - the areas of code where work is being issued to the GPU. One of those things to avoid is cudaMalloc. If you can use cudaMallocAsync (and do it well/correctly) then this concern pretty much goes away. Therefore if cudaMallocAsync is not available, then I would revert to my normal coding advice - if at all possible do cudaMalloc operations up front, before getting into the “performance loops” and as much as possible re-use allocations. It’s still good advice, in any CUDA programming setting, in my opinion.
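
The advice above can be sketched roughly as follows: allocate once with cudaMalloc before the performance loop, reuse the same buffer for every iteration, and free it afterwards. The kernel scale, the helper blocksFor, and the sizes are all hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real per-iteration work.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Number of blocks needed to cover n elements (ceiling division).
static int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    const int n = 1 << 20;        // assumed problem size
    const int iterations = 100;   // assumed number of loop iterations
    const int threads = 256;

    // Allocate up front, outside the performance loop.
    float* buffer = nullptr;
    cudaError_t err = cudaMalloc(&buffer, n * sizeof(float));
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(buffer, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Performance loop: only kernel launches, no allocation or free.
    for (int it = 0; it < iterations; ++it) {
        scale<<<blocksFor(n, threads), threads, 0, stream>>>(buffer, n, 1.0f);
    }

    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    }

    // Release resources once, after the loop has finished.
    cudaStreamDestroy(stream);
    cudaFree(buffer);
    return err == cudaSuccess ? 0 : 1;
}
```

Because the allocation happens once, this pattern does not depend on cudaMallocAsync being available.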

Our GPUs are designed to work as well as possible in either Linux or Windows. A100 is not an exception. However, the OS is not something that NVIDIA has full control of, so limitations presented by a particular OS are often things that cannot be worked around in CUDA. A big one is one I mentioned already - WDDM vs. TCC. For anyone who is doing significant GPU computing work on Windows, I would always suggest TCC mode if possible, because WDDM creates a much more significant set of limitations. Those limitations are mentioned in other places and I don’t have a list to present here, but a big one is the limit on kernel execution duration that is present in WDDM and not TCC.
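
As a rough sketch of how a program could inspect these two properties at run time, the CUDA runtime exposes the device attributes cudaDevAttrTccDriver and cudaDevAttrKernelExecTimeout. The device index 0 and the helper name runTimeLimitEnabled are assumptions for illustration. Note that the TCC attribute is only meaningful on Windows.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: a nonzero cudaDevAttrKernelExecTimeout value means
// kernels on this device are subject to a run time limit.
static bool runTimeLimitEnabled(int kernelExecTimeoutAttr) {
    return kernelExecTimeoutAttr != 0;
}

int main() {
    const int device = 0;  // assumption: inspect the first GPU in the system
    int tcc = 0;
    int timeout = 0;

    cudaError_t err = cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, device);
    if (err == cudaSuccess) {
        err = cudaDeviceGetAttribute(
            &timeout, cudaDevAttrKernelExecTimeout, device);
    }
    if (err != cudaSuccess) {
        std::printf("Attribute query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // tcc is 1 when the device is using the TCC driver model.
    std::printf("TCC driver in use: %s\n", tcc ? "yes" : "no");
    std::printf("Kernel run time limit: %s\n",
                runTimeLimitEnabled(timeout) ? "enabled" : "disabled");
    return 0;
}
```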

I consider the differences between TCC operation and Linux operation to be pretty small, but they are obviously not zero - it seems we have a case right here, and I don’t know all the technical underpinnings of why cudaMallocAsync might be available in one setting and not in another.

PavloTytarenko · January 21, 2023, 11:11pm

Thank you!

