CUDA/OpenGL interoperability segfault using EGL OpenGL context (EGL_PLATFORM_DEVICE_EXT)

I’m on Ubuntu Linux 15.10 with NVIDIA driver 361.28 and CUDA Toolkit 7.5, and have tested the following code on both a GeForce GT 750M and a GeForce GTX TITAN.

When I create an OpenGL context via EGL, following this blog post:

https://devblogs.nvidia.com/parallelforall/egl-eye-opengl-visualization-without-x-server/

every CUDA/OpenGL interoperability function in the CUDA driver API segfaults. Here’s some example code that exhibits the issue (the same code works if I use an OpenGL context created via GLX):

#define EGL_EGLEXT_PROTOTYPES // for EGL extensions

#include <GL/gl.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

#include <cuda.h>
#include <cudaGL.h>

static const EGLint configAttribs[] = {
    EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
    EGL_BLUE_SIZE, 8,
    EGL_GREEN_SIZE, 8,
    EGL_RED_SIZE, 8,
    EGL_ALPHA_SIZE, 8,
    // EGL_DEPTH_SIZE, 8,
    EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
    EGL_NONE};

static const int pbufferWidth = 1024;
static const int pbufferHeight = 1024;

static const EGLint pbufferAttribs[] = {
    EGL_WIDTH, pbufferWidth,
    EGL_HEIGHT, pbufferHeight,
    EGL_NONE,
};

static const EGLint contextAttribs[] = {
    EGL_CONTEXT_MAJOR_VERSION, 4,
    EGL_CONTEXT_MINOR_VERSION, 5,
    EGL_CONTEXT_OPENGL_PROFILE_MASK, EGL_CONTEXT_OPENGL_CORE_PROFILE_BIT,
    EGL_NONE,
};

int main(int argc, char* argv[]) {
  // 1. Initialize EGL

  static const int MAX_DEVICES = 4;
  EGLDeviceEXT eglDevs[MAX_DEVICES];
  EGLint numDevices;

  PFNEGLQUERYDEVICESEXTPROC eglQueryDevicesEXT =
      (PFNEGLQUERYDEVICESEXTPROC)eglGetProcAddress("eglQueryDevicesEXT");

  eglQueryDevicesEXT(MAX_DEVICES, eglDevs, &numDevices);

  PFNEGLGETPLATFORMDISPLAYEXTPROC eglGetPlatformDisplayEXT =
      (PFNEGLGETPLATFORMDISPLAYEXTPROC)eglGetProcAddress("eglGetPlatformDisplayEXT");

  EGLDisplay eglDpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT, eglDevs[0], 0);

  PFNEGLQUERYDEVICEATTRIBEXTPROC eglQueryDeviceAttribEXT =
      (PFNEGLQUERYDEVICEATTRIBEXTPROC)eglGetProcAddress("eglQueryDeviceAttribEXT");

  // eglQueryDeviceAttribEXT writes a full, pointer-sized EGLAttrib, so the
  // result is stored in an EGLAttrib rather than written through a cast int*.
  EGLAttrib cudaDeviceAttrib = -1;
  eglQueryDeviceAttribEXT(eglDevs[0], EGL_CUDA_DEVICE_NV, &cudaDeviceAttrib);
  int deviceId = static_cast<int>(cudaDeviceAttrib);

  EGLint major, minor;

  eglInitialize(eglDpy, &major, &minor);

  // 2. Select an appropriate configuration
  EGLint numConfigs;
  EGLConfig eglCfg;

  eglChooseConfig(eglDpy, configAttribs, &eglCfg, 1, &numConfigs);

  // 3. Create a surface
  EGLSurface eglSurf = eglCreatePbufferSurface(eglDpy, eglCfg, pbufferAttribs);

  // 4. Bind the API
  eglBindAPI(EGL_OPENGL_API);

  // 5. Create a context and make it current
  EGLContext eglCtx = eglCreateContext(eglDpy, eglCfg, EGL_NO_CONTEXT, contextAttribs);

  eglMakeCurrent(eglDpy, eglSurf, eglSurf, eglCtx);

  // 6. Initialize CUDA and query the devices associated with the current GL context
  cuInit(0);
  unsigned int cudaDeviceCnt = 0;
  unsigned int maxCudaDeviceCnt = 10;
  CUdevice cudaDevices[10];
  cuGLGetDevices(&cudaDeviceCnt, cudaDevices, maxCudaDeviceCnt, CU_GL_DEVICE_LIST_ALL);

  // The cuGLGetDevices() call above segfaults every time.
  // It works with an OpenGL context created via GLX.

  return 0;
}

You could refer to the CUDA 7.5 sample “NVIDIA_CUDA-7.5_Samples/2_Graphics/simpleGLES”, which demonstrates data exchange between CUDA and OpenGL ES (i.e. graphics interop). Please note, however, that the simpleGLES sample itself is not supported on the Linux x86_64 platform.

I have the same problem. Using strace I discovered that the cuGLGetDevices call causes libGL.so to be dlopen’ed. I expected libEGL.so (or perhaps libOpenGL.so) to be opened. Perhaps this is a bug in the driver?

I tried to ‘trick’ the driver by creating a link (named libGL.so.1) pointing to libEGL.so and setting LD_LIBRARY_PATH so that this link is picked up instead of the system library /usr/lib64/libGL.so.1. With that in place, the call to cuGLGetDevices no longer crashes. However, it returns CUDA_ERROR_OPERATING_SYSTEM (304).

BTW, I’m using driver 361.28 on Linux x86_64 (CentOS 6.7, kernel 2.6.32-573.12.1.el6.x86_64).

We were able to reproduce this issue and have filed an internal bug report to track it.

@croot

It looks like the segfault has been fixed in the latest driver, v367.35.

===

  • With v361.28 (NVIDIA-Linux-x86_64-361.28.run):

$ ./a.out
Segmentation fault (core dumped)

  • After upgrading to the latest v367.35 (NVIDIA-Linux-x86_64-367.35.run):

$ ./a.out
$ echo $?
0
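
Given the two results above, an application that has to run on mixed driver versions might want to skip the EGL-based interop path on older drivers. Here is a minimal sketch of such a guard, assuming the installed driver version is already available as a dotted string such as "361.28" (for example taken from the output of nvidia-smi); the helper names parseDriverVersion and driverAtLeast are hypothetical, and 367.35 is simply the version reported as working in this thread, not necessarily the first fixed release.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split a dotted version string such as "367.35" into
// its numeric components {367, 35}. Parsing stops at the first component
// that is not a number, so unparseable input yields a shorter (or empty) list.
std::vector<int> parseDriverVersion(const std::string& text) {
    std::vector<int> parts;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, '.')) {
        try {
            parts.push_back(std::stoi(field));
        } catch (const std::exception&) {
            break;
        }
    }
    return parts;
}

// Returns true if `installed` is the same as or newer than `required`.
// std::vector compares lexicographically, which matches version ordering
// component by component; an unparseable `installed` compares as older.
bool driverAtLeast(const std::string& installed, const std::string& required) {
    return parseDriverVersion(installed) >= parseDriverVersion(required);
}
```

With this, driverAtLeast("361.28", "367.35") is false and driverAtLeast("367.35", "367.35") is true, so the cuGLGetDevices call could be skipped or replaced by a fallback on the older driver.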