NVIDIA A100: OpenGL drawing issue

We would like to use our GPU nodes (each node has 8x NVIDIA A100) for scientific visualization using OpenGL, but we have a problem drawing scenes. Every rendered image contains incorrect black or colored pixels (for example in glxgears). We tried Xorg both with and without VirtualGL and got the same issue.

NVIDIA Driver Version: 470.57.02
OS: CentOS 7

Thank you for any help or tips.

New observation: after restarting the GPU node, OpenGL runs without problems. The problem with OpenGL occurs randomly. We are now trying to find out when it happens.

We faced a similar issue with the A100. We are using nvdiffrast with PyTorch.

NVIDIA Driver Version: 450.119.04

The problem occurs randomly on different cards. It looks like only some cards are affected.
Moreover, we discovered that commenting out glEnable(GL_DEPTH_TEST) inside nvdiffrast/common/rasterize.cpp fixes the problem.

The following example reproduces the issue:

import torch
import nvdiffrast.torch as dr
import numpy as np
from matplotlib import pyplot as plt

def tensor(*args, **kwargs):
    return torch.tensor(*args, device='cuda', **kwargs)

depth = 5.1761e+00 / 6.0623e+00

pos = tensor([[[ 8.5783e-02,  9.9548e-02,  2.0576e+00,  3.0065e+00],
         [-1.7052e+00,  1.3828e-01,  2.0506e+00,  2.9996e+00],
         [ 6.6282e-02, -3.4532e+00,  2.0323e+00,  2.9817e+00],
         [ 8.4978e-02,  9.4055e-02,  3.0781e+00,  4.0065e+00],
         [-2.6015e+00,  1.5215e-01,  3.0675e+00,  3.9961e+00],
         [ 1.1423e-01,  5.4232e+00,  3.1161e+00,  4.0437e+00],
         [ 8.4174e-02,  8.8562e-02,  4.0986e+00,  5.0065e+00],
         [ 4.1140e+00,  1.4267e-03,  4.1145e+00,  5.0221e+00],
         [ 1.2805e-01,  8.0822e+00,  4.1556e+00,  5.0623e+00]]], dtype=torch.float32)
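# nvdiffrast expects the triangle tensor to be int32 and to reside on the GPU.
tri = torch.from_numpy(
    np.arange(len(pos[0]), dtype=np.int32).reshape(-1, 3)
).cuda()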

glctx = dr.RasterizeGLContext()
rast, _ = dr.rasterize(glctx, pos, tri, resolution=[1024, 1251])



The results are as follows:
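
Besides looking at the images by eye, the rasterizer output can also be compared numerically between a healthy card and a suspect one. The following is only a minimal sketch: the helper names triangle_coverage and coverage_diff are made up for illustration and are not part of nvdiffrast. It relies on the documented layout of the rasterizer output, where the last channel holds the triangle id offset by one and 0 means that no triangle covers the pixel.

```python
from collections import Counter


def triangle_coverage(triangle_ids):
    """Count pixels per triangle id (hypothetical helper).

    `triangle_ids` is any iterable of numbers taken from the last channel
    of the rasterizer output: 0 means "no triangle", k > 0 means triangle k - 1.
    """
    counts = Counter(int(t) for t in triangle_ids)
    return dict(sorted(counts.items()))


def coverage_diff(reference, candidate):
    """Return {triangle_id: (ref_count, cand_count)} for ids whose counts differ."""
    ids = set(reference) | set(candidate)
    return {
        i: (reference.get(i, 0), candidate.get(i, 0))
        for i in sorted(ids)
        if reference.get(i, 0) != candidate.get(i, 0)
    }


# Assumed usage with the `rast` tensor from the example above:
#   ids = rast[0, ..., 3].flatten().tolist()
#   print(triangle_coverage(ids))
```

Running the example on each card and passing the two coverage dictionaries to coverage_diff should give an empty result if both cards rasterize the same pixels. Any non-empty result points at the triangles whose coverage differs.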