CUDA 8.0 OpenGL samples failing with code=46 (cudaErrorDevicesUnavailable) [no onboard graphics, Linux]

I have installed CUDA 8.0 on my desktop running Ubuntu 16.04 with a GTX 1060 3GB. All of the samples work except the ones that use OpenGL.

Because I was running into login loops, I installed CUDA by first installing the NVIDIA 384.111 driver from the runfile with --no-opengl-files set, and then installing CUDA 8.0 from its runfile without the bundled driver.

After installing I followed the guide and installed the extra libraries:

$ sudo apt-get install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev

Currently, whenever I try to run a sample that uses OpenGL, I get: code=46(cudaErrorDevicesUnavailable).

The answers I have found for this problem all involve the user having onboard graphics, so that the OpenGL context is not created on the NVIDIA GPU:
https://devtalk.nvidia.com/default/topic/935085/cuda-setup-and-installation/error-code-46-devices-unavailable-with-cuda-7-5-samples-on-windows-8-1/
https://devtalk.nvidia.com/default/topic/1024731/cuda-9-0-samples-error/
https://devtalk.nvidia.com/default/topic/1010103/runtime-cuda-error-sample-fluidsgl/

However, my display is driven by the NVIDIA GPU, since I don’t even have onboard graphics.

Here is the output of nvidia-smi:

Mon Jan 15 13:52:00 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:09:00.0  On |                  N/A |
|  0%   51C    P0    29W / 120W |    148MiB /  3010MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1153      G   /usr/lib/xorg/Xorg                           146MiB |
+-----------------------------------------------------------------------------+

And deviceQuery:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060 3GB"
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 3011 MBytes (3157000192 bytes)
  ( 9) Multiprocessors, (128) CUDA Cores/MP:     1152 CUDA Cores
  GPU Max Clock rate:                            1835 MHz (1.84 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 9 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 3GB
Result = PASS

Also,

$ lspci | grep VGA
09:00.0 VGA compatible controller: NVIDIA Corporation Device 1c02 (rev a1)

Thanks

EDIT:
It might also be worth mentioning that I am able to run OpenGL code that does not use CUDA.

Indeed, if you install with the --no-opengl-files switch, you won’t be able to run CUDA/OGL interop codes. For those codes, the OGL context must be resident on the NVIDIA GPU. With that switch, the NVIDIA OpenGL libraries are not installed, so the context ends up hosted on some other OpenGL stack.
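One way to see which stack is hosting your OpenGL context is to look at the vendor string that glxinfo reports. The following is only a rough sketch: it assumes glxinfo is available (on Ubuntu it comes with the mesa-utils package) and that you run it from inside the X session. The helper name gl_vendor_is_nvidia is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given glxinfo text names NVIDIA
# as the OpenGL vendor.
gl_vendor_is_nvidia() {
    printf '%s\n' "$1" | grep -qi '^OpenGL vendor string:.*nvidia'
}

if command -v glxinfo >/dev/null 2>&1; then
    # Capture glxinfo output; an empty result (e.g. no display) is treated
    # as "not NVIDIA" below.
    info=$(glxinfo 2>/dev/null || true)
    if gl_vendor_is_nvidia "$info"; then
        echo "OpenGL is provided by the NVIDIA driver"
    else
        echo "OpenGL is NOT provided by the NVIDIA driver:"
        printf '%s\n' "$info" | grep -iE '^OpenGL (vendor|renderer) string' || true
    fi
else
    echo "glxinfo not found (on Ubuntu it ships in the mesa-utils package)"
fi
```

If the vendor line names something other than NVIDIA, plain OpenGL programs can still run, but the CUDA/OGL interop samples would be expected to fail in the way described in the question.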

If you want to be able to run these CUDA/OGL interop codes, you’ll need to fix or redo your GPU driver install, this time without --no-opengl-files. I’ve never heard of a login loop when there are no other graphics adapters, but I suppose it could happen if you’ve not properly and completely removed the nouveau driver, so that your GUI graphics stack is getting built on nouveau.
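Before redoing the driver install, it may be worth checking whether the nouveau kernel module is still being loaded. This is a minimal sketch that only inspects lsmod output; the helper name nouveau_loaded is made up for this example, and the check says nothing about whether other nouveau pieces (for example the X driver) are still installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given lsmod-style text contains
# a line for the nouveau module.
nouveau_loaded() {
    printf '%s\n' "$1" | grep -q '^nouveau[[:space:]]'
}

if command -v lsmod >/dev/null 2>&1; then
    if nouveau_loaded "$(lsmod)"; then
        echo "nouveau is loaded; it should be disabled before reinstalling the NVIDIA driver"
    else
        echo "nouveau is not loaded"
    fi
else
    echo "lsmod not found"
fi
```

If nouveau does show up, the Linux CUDA installation guide describes how to blacklist it before running the driver installer again.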