CUDA 11.6 OpenGL interoperability broken?

I have used CUDA on and off for 3-4 years now. After upgrading to the latest version and running “make” to build the samples, I find that many of them no longer work. For example, nbody crashes with the message

Compute 6.1 CUDA device: [NVIDIA GeForce GTX 1060 with Max-Q Design]
CUDA error at bodysystemcuda_impl.h:191 code=999(cudaErrorUnknown) “cudaGraphicsGLRegisterBuffer(&m_pGRes[i], m_pbo[i], cudaGraphicsMapFlagsNone)”

and volumeRender crashes with

CUDA error at volumeRender.cpp:424 code=999(cudaErrorUnknown) “cudaGraphicsGLRegisterBuffer( &cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard)”

My own previously-working code also crashes. In all cases, the errors appear to be associated with code of the sort below:

cudaGraphicsGLRegisterBuffer(&cudaPBORes, glPBO, cudaGraphicsMapFlagsWriteDiscard);
cudaGraphicsMapResources(1, &cudaPBORes, 0);

Can somebody help? This is a clean installation (I just installed everything from scratch yesterday):

OS: Ubuntu 22.04
nvidia-smi output: NVIDIA-SMI 515.43.04 Driver Version: 515.43.04 CUDA Version: 11.7
deviceQuery output:

Detected 1 CUDA Capable device(s)

Device 0: “NVIDIA GeForce GTX 1060 with Max-Q Design”
CUDA Driver Version / Runtime Version 11.7 / 11.7
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6078 MBytes (6373376000 bytes)
(010) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1480 MHz (1.48 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 98304 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.7, CUDA Runtime Version = 11.7, NumDevs = 1
Result = PASS

OpenGL interop requires both a CUDA context and an OpenGL context. On laptops it’s important to make sure that both are created on the same (NVIDIA) GPU, not on the integrated (Intel) GPU.

Make sure you run these programs with the appropriate profile, and that your laptop’s discrete GPU is enabled.

Updating a driver can affect this (laptop GPU operational behavior). I don’t have any other insight into what else may have changed.

Ack, it seems that you are right. Indeed, printing the results of glGetString(GL_VENDOR), glGetString(GL_RENDERER), and glGetString(GL_VERSION) gives

Intel
Mesa Intel(R) UHD Graphics 630 (CFL GT2)
4.6 (Compatibility Profile) Mesa 22.0.1

which suggests that glutInit (&argc, argv) set up an OpenGL context on the Intel GPU, not the NVIDIA GPU.
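A startup check along these lines can catch the mismatch early. Here is a minimal sketch in shell; the vendor string is hard-coded to the value printed above, since on a real system it would come from glGetString(GL_VENDOR) inside the program, or from `glxinfo | grep "OpenGL vendor"` (glxinfo is in mesa-utils) on the command line:

```shell
# Sketch of a wrong-GPU guard. "Intel" below is the vendor string
# observed above; in a real program it would come from
# glGetString(GL_VENDOR), not a hard-coded value.
vendor="Intel"
case "$vendor" in
    NVIDIA*) echo "GL context is on the NVIDIA GPU; CUDA-GL interop should work" ;;
    *)       echo "GL context is on '$vendor'; CUDA-GL interop will fail" ;;
esac
```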
I’ll continue poking around but does anyone have a quick fix for this?

One possible method, on either Windows or Linux (assuming you are running a Linux GUI), is to use the Optimus profile settings available in the NVIDIA control panel.

From the Linux command line, one possible utility for this is prime-select (after installing nvidia-prime). There are a variety of methods for this on Linux.

Unfortunately, “sudo prime-select nvidia” doesn’t change anything.

The function cudaGLGetDevices(…) is supposed to “[get] the CUDA devices associated with the current OpenGL context.”

If any of the GPUs being used by the current OpenGL context are not CUDA capable, then the call is supposed to return cudaErrorNoDevice.

However, cudaGLGetDevices() returns error 999 (cudaErrorUnknown).

Also, my GPU is old enough that it does not support Optimus.


// Signature is cudaGLGetDevices(&count, devices, maxDevices, deviceList):
// the first argument receives the device count, the second is the device-ID array.
unsigned int iCudaDevices;   // left uninitialized: the failing call never sets it,
                             // so the count printed below is garbage
int cudaDevices[16];
cudaError_t err = cudaGLGetDevices (&iCudaDevices, cudaDevices, 16, cudaGLDeviceListAll);
cerr << "iCudaDevices = " << iCudaDevices << endl;
cerr << err << endl;
cerr << cudaGetErrorString(err) << endl;

iCudaDevices = 32690
999
unknown error

I’m not sure what that means, and I doubt it is true: Optimus technology was introduced well before the GTX 1060 was released.

I think it’s likely that your dGPU is not enabled for some reason. That could be the reason for cudaErrorUnknown. It may also be that your OpenGL stack is broken: if you installed CUDA (including the GPU driver) without installing the NVIDIA OpenGL driver components, that would do it.

I doubt I will be able to give you a recipe to sort this out, however there are many forum posts on these topics. The CUDA installation forum is where you will find questions like this.

Ok, problem solved! (Thanks to Robert Crovella!) In case anyone else comes across this:

PROBLEM: OpenGL context was being instantiated on integrated graphics (Intel) GPU, causing CUDA-GL interoperability to fail.

SOLUTION: As per NVIDIA Optimus - Debian Wiki, set a couple of environment variables before running the executable (in this case the executable is “./volumeRender”). This makes OpenGL run on the NVIDIA card, so that it can interface properly with CUDA:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ./volumeRender

This fix works on Ubuntu Linux 20.04 with NVIDIA driver 470.103.01 and CUDA version 11.4. (I expect it would also work with the most recent versions.)
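To avoid typing the variables each time, the fix can be wrapped in a small shell function. A sketch (the function name run_on_nvidia is my own invention; only the two variable names come from the solution above):

```shell
# Launch any program with PRIME render offload to the NVIDIA GPU,
# per the fix above. Usage: run_on_nvidia ./volumeRender
run_on_nvidia() {
    __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
}

# demonstrate that the child process sees both variables
run_on_nvidia sh -c 'echo "$__NV_PRIME_RENDER_OFFLOAD $__GLX_VENDOR_LIBRARY_NAME"'
# prints: 1 nvidia
```

The prefix-assignment form sets the variables only for the launched program, so the rest of the desktop keeps rendering on the integrated GPU.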