X Error occurring when using CUDA-OpenGL-Interoperability

I have a Java application that uses LWJGL to render a 3D scene. I managed to use JNI to call C++ code from the thread owning the OpenGL context of the 3D scene. I use glReadPixels to write pixel data into a pixel buffer object. Next, I would like to use CUDA graphics interoperability to obtain a device pointer to that pixel buffer object. Unfortunately, calling cudaGraphicsGLRegisterBuffer sometimes causes the program to crash. More specifically, the program originally crashed every time cudaGraphicsGLRegisterBuffer was called. After I changed some seemingly unrelated parts of the program, it crashed only sometimes. Now, after linking the program against PyTorch, without calling any PyTorch functionality, it always crashes again. In any case, the X Error shown is, and has always been, the same:

X Error of failed request: GLXBadFBConfig
Major opcode of failed request: 152 (GLX)
Minor opcode of failed request: 21 (X_GLXGetFBConfigs)
Serial number of failed request: 201
Current serial number in output stream: 201
except that the serial number 201 is not always the same. Furthermore, the Java runtime environment reported that the program crashes in glXGetFBConfigAttrib in libGLX_nvidia.so.0.
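For context, the call sequence I use around the failing call looks roughly like the sketch below. This is not the actual code of the application; the flag choice and the error handling are illustrative assumptions, but the CUDA runtime API functions are the standard GL-interop entry points:

```cpp
#include <cuda_gl_interop.h>
#include <cuda_runtime.h>
#include <cstdio>

// Assumes a valid OpenGL context is current on this thread and that
// "pbo" is the buffer object previously filled via glReadPixels.
void register_and_map(unsigned int pbo) {
    cudaGraphicsResource* resource = nullptr;

    // This is the call that crashes; according to the backtrace it ends
    // up querying GLX FBConfig attributes internally.
    cudaError_t err = cudaGraphicsGLRegisterBuffer(
        &resource, pbo, cudaGraphicsRegisterFlagsReadOnly);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "register failed: %s\n", cudaGetErrorString(err));
        return;
    }

    // Map the resource and fetch a device pointer usable from kernels.
    cudaGraphicsMapResources(1, &resource, 0);
    void* devPtr = nullptr;
    size_t size = 0;
    cudaGraphicsResourceGetMappedPointer(&devPtr, &size, resource);

    // ... launch kernels reading devPtr here ...

    cudaGraphicsUnmapResources(1, &resource, 0);
    cudaGraphicsUnregisterResource(resource);
}
```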

Finally, I used xtrace to get more information about the failing request:

002:<:00aa: 8: GLX-Request(152,21): glXGetFBConfigs opcode=0x98 opcode2=0x15 unparsed-data=0x00,0x00,0x00,0x00;
002:>:00aa:86296: Reply to glXGetFBConfigs: data1=0x01 data2=0x00 unparsed-data=0x07,0x01,0x00,0x00,0x29,0x00,0x00,0x00,0xe0,0x84,0x63,0x60,0xfd,0x7f,0x00,0x00,0x10,0xf1,0x43,0x1d,0x39,0x56,0x00,0x00,0x13,0x80,0x00,0x00,0xd5,0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x18,0x00,0x00,0x00,0x03,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x05,0x00,0x00,0x00,0x01,0x00,0x00,0x00,0x06,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x07,0x00,0x00,0x00,0x04,0x00,0x00,0x00,0x08,0x00,0x00,0x00,0x08,0x00,0x00,0x00,0x09,0x00,0x00,0x00,0x08,0x00,0x00,0x00,0x0a,0x00,0x00,0x00,0x08,0x00,0x00,0x00,0x0b,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x0c,0x00,0x00,0x00,0x18,0x00,0x00,0x00,0x0d,0x00,0x00,0x00,0x08,0x00,0x00,0x00,0x0e,0x00,0x00,0x00,0x10,0x00,0x00,0x00,0x0f,0x00,0x00,0x00,0x10,0x00,0x00,0x00,0x10,0x00,0x00,0x00,0x10,0x00,0x00,0x00,0x11,0x00,0x00,0x00,0x10,0x00,0x00,0x00,0x11,0x80,0x00,0x00,0x01,0x00,0x00,0x00,0x10,0x80,0x00,0x00,0x07,0x00,0x00,0x00,0x12,0x80,0x00,0x00,0x01,0x00,0x00,0x00,0x22,0x00,0x00,0x00,0x02,0x80,0x00,0x00,0x20,0x00,0x00,0x00,0x00,0x80,0x00,0x00,0x23,0x00,0x00,0x00,0x00,0x80,0x00,0x00,0x25,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x26,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x27,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x28,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x24,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x16,0x80,0x00,0x00,0x00,0x80,0x00,0x00,0x17,0x80,0x00,0x00,0x00,0x80,0x00,0x00,0x18,0x80,0x00,0x00,0x00,0x00,0x00,0x40,0x0b,0x80,0x00,0x00,0x21,0x00,0x00,0x00,0xa0,0x86,0x01,0x00,0x00,0x00,0x00,0x00,0xa1,0x86,0x01,0x00,0x00,0x00,0x00,0x00,0xb0,0x20,0x00,0x00,0x00,0x00,0x00,0x00,0xd0,0x20,0x00,0x00,0x01,0x00,0x00,0x00,0xd1,0x20,0x00,0x00,0x00,0x00,0x00,0x00,0xd2,0x20,0x00,0x00,0x01,0x00,0x00,0x00,0xd3,0x20,0x00,0x00,0x07,0x00,0x00,0x00,0xd4,0x20,0x00,0x00,0x01,0x00,0x00,0x00,0xb2,0x20,0x00,0x00,0x01,0x00,0x00,0x00,0xb3,0x20,0x00,0x00,0x00,0x00,0x00,0x00;

I realize that problems of this sort are probably rare in this forum. Unfortunately, I have very limited knowledge of GLX and the X window system, and I feel somewhat lost as to how to pursue this issue. Any tips on things I should sanity-check first are highly appreciated, as are general recommendations on how to approach this problem and information about the GLX request cited above.

From what I understand, the client (my program) made a request to the X server, which was served successfully; both the request and the response are documented above. However, the program crashed due to something being wrong with the response. How can I analyze the response to understand what caused the crash?
nvidia-bug-report.log.gz (326.2 KB)

CUDA/GL interop requires the X server to be running on the NVIDIA GPU. The GLX errors look like this is not the case; I guess you're either connecting to a software X server from remote, or it's running on integrated server graphics when logged in locally.
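One quick way to check which GPU is actually serving GLX on the display in question (run inside the local X session; exact output varies by setup):

```shell
# Which vendor provides GLX on this display?
glxinfo | grep -iE "vendor|renderer"

# Which provider(s) drive the X server?
xrandr --listproviders
```

If the vendor string shows Mesa/llvmpipe or an integrated GPU instead of NVIDIA, the interop registration would have no NVIDIA context to attach to.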
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

@generix thank you for your fast response! I updated my information and would appreciate you having another look at it. Thank you very much!

Seems my assumptions were wrong: you're using a notebook with only an NVIDIA GPU. So I guess you're also logging in locally, i.e. sitting in front of it?
The X server and graphics stack are fine, so this doesn't explain the issue you're running into.
The only hint I can give you is that CUDA is defunct after a system suspend/resume cycle; either the nvidia-uvm module needs to be unloaded/reloaded, or the system needs to be rebooted to make CUDA work again. I guess you already tried running your application after a fresh boot?
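For reference, the module reload after suspend/resume looks like this (it only succeeds if no process is still holding a CUDA context):

```shell
# Check whether the UVM module is currently loaded
lsmod | grep nvidia_uvm

# Unload and reload it; rmmod fails while CUDA is in use
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
```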