We’re currently running a CentOS server with 8 x 1080 Ti GPUs, which we access from Docker containers.
During my work, I found that the server sometimes crashes with the message below:
I believe the problem is almost the same as the one linked below:
The address is almost identical (2b28 vs 2b20), and other people there are facing the same problem.
Can anyone tell me what this issue is? At the very least, I’d like to know how to work around it.