CUDA Error: all CUDA-capable devices are busy or unavailable (err_no=46)

Need help resolving this issue. I’ve read other forums but could not find a solution.

I’ve attached my specs.
Sanjib1.nfo (4.5 MB)

Thanks in advance.

The free memory of 0 MB is a solid indication that something is going on with that GPU: it is already in use, all of its memory is allocated, or it is otherwise broken. You could try rebooting the server.

Rebooted the server. Still no good. Is there a way to identify what is using up all the memory? Task Manager shows the GPU is not being used at full capacity.

Check the output from nvidia-smi.
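To get per-GPU memory figures in a form that is easy to compare across the four cards, nvidia-smi's query mode can be scripted. The following is a minimal Python sketch, assuming nvidia-smi is on the PATH. The helper names (`parse_gpu_csv`, `query_gpus`, `print_gpu_memory`) are hypothetical; `--query-gpu` and `--format=csv,noheader,nounits` are standard nvidia-smi options.

```python
import subprocess

# Fields requested from `nvidia-smi --query-gpu`. "name" is last so that a
# comma inside a GPU name cannot shift the numeric columns. With "nounits",
# memory values are plain integers in MiB.
GPU_FIELDS = ["index", "memory.total", "memory.used", "memory.free", "name"]
INT_FIELDS = ("index", "memory.total", "memory.used", "memory.free")


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Split into at most len(GPU_FIELDS) columns; the remainder is the name.
        values = [v.strip() for v in line.split(",", len(GPU_FIELDS) - 1)]
        row = dict(zip(GPU_FIELDS, values))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


def query_gpus():
    """Run nvidia-smi (assumed to be on PATH) and return per-GPU memory rows."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(GPU_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


def print_gpu_memory():
    """Print one line per GPU with used/free/total memory in MiB."""
    for gpu in query_gpus():
        print(f"GPU {gpu['index']} ({gpu['name']}): "
              f"{gpu['memory.used']} MiB used, "
              f"{gpu['memory.free']} MiB free of {gpu['memory.total']} MiB")
```

Calling `print_gpu_memory()` before and after launching the application shows which GPU already starts with memory in use and which one runs out. Running `nvidia-smi` with no arguments additionally lists the processes holding memory on each GPU.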

It shows that the memory is not being used.

What I noticed is that it only seems to work when a single GPU is enabled: if I disable 3 of the 4 GPUs, it works, and the remaining one can be any of the 4. However, the moment I enable 2 or more, I get the error.

Any help is appreciated.

Thanks

Well, device 2 shows memory in use. That may be due to the Windows desktop, and I note that your original report ran into trouble on device 2.

So I don’t know what app you are running, but if it expects all or most of the 24 GB to be available on each GPU, that’s not going to work. One of your GPUs has memory in use, possibly by Windows. The processes listed under GPU 2 in your nvidia-smi output are all using memory on that GPU.
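Instead of disabling GPUs in Device Manager, one way to test whether the failure follows GPU 2 specifically or appears with any second GPU is to restrict which GPUs CUDA exposes, using the standard `CUDA_VISIBLE_DEVICES` environment variable. A minimal Python sketch, assuming the application is a CUDA program launched as a separate process; the helper names and the `app.exe` command are hypothetical placeholders, not the real application.

```python
import os
import subprocess


def env_with_visible_gpus(gpu_indices, base_env=None):
    """Return a copy of the environment that exposes only `gpu_indices` to CUDA.

    CUDA_DEVICE_ORDER=PCI_BUS_ID makes CUDA number GPUs in PCI bus order,
    which matches nvidia-smi's numbering; CUDA's default order (fastest
    first) can differ from it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_indices)
    return env


def run_with_gpus(command, gpu_indices):
    """Launch `command` (a list of arguments) so it sees only the listed GPUs."""
    return subprocess.run(command, env=env_with_visible_gpus(gpu_indices))
```

For example, `run_with_gpus(["app.exe"], [0, 1, 3])` hides only GPU 2, while `[0, 1]` tries two GPUs other than GPU 2. Inside the launched process the visible GPUs are renumbered starting from 0.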

Comparing with my workstation running Windows 10, the amount of GPU memory used by Windows and Windows apps in your nvidia-smi output appears to be in the typical range: in my case I see 643 MiB in use with one Edge browser instance open.

If you have the source code for the application, check how it allocates memory in a multi-GPU setting. Maybe it tries to allocate identical amounts on all GPUs? If you do not have the source code, check the documentation for any user-controllable allocation settings. If you cannot find anything there, contact the vendor’s support.
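If the source is available, one pattern to look for is a fixed, identical allocation size requested on every GPU. The sketch below illustrates the alternative of sizing each GPU's allocation from that GPU's own free memory. It is a hypothetical Python helper for illustration only; in CUDA code the per-device free figure would typically come from `cudaMemGetInfo` after `cudaSetDevice`, and the 512 MiB headroom is an arbitrary assumption.

```python
def plan_per_gpu_allocations(free_mib, wanted_mib, headroom_mib=512):
    """Size each GPU's allocation from its own free memory.

    free_mib:     free memory per GPU in MiB (e.g. from nvidia-smi or
                  cudaMemGetInfo), one entry per device.
    wanted_mib:   how much the application would like on every GPU.
    headroom_mib: assumed safety margin left for the driver, the desktop,
                  and other processes.
    Returns the MiB to allocate on each GPU; 0 means skip that GPU.
    """
    plan = []
    for free in free_mib:
        usable = max(free - headroom_mib, 0)  # never plan a negative size
        plan.append(min(wanted_mib, usable))  # cap at what the app wants
    return plan
```

With this policy, a GPU that also hosts the Windows desktop simply receives a somewhat smaller share, instead of a fixed request failing on that one device.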