CUDA samples yield code=3(cudaErrorInitializationError) when executed.

Hi, I am new to CUDA. I have an MSI GeForce GT 710, which I believe is CUDA compatible (compute capability 3.5, i.e. sm_35?).
I installed the driver following the NVIDIA instructions, including disabling the nouveau driver.
I also installed CUDA 8.0 and built gcc 5.4.0. I did this because my system runs Fedora 25 with gcc 6.3.1, and compilation failed with an error saying gcc 6.3.1 was not compatible. With gcc 5.4.0 set up, the samples compile without error:

(21:47 admanero@ThermaltakeBox cdpSimplePrint) > make TARGET_ARCH=x86_64 dbg=1 SMS="35"
/usr/local/cuda-8.0/bin/nvcc -ccbin g++ -I…/…/common/inc -m64 -g -G -dc -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -o cdpSimplePrint.o -c cdpSimplePrint.cu
/usr/local/cuda-8.0/bin/nvcc -ccbin g++ -m64 -g -G -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -o cdpSimplePrint cdpSimplePrint.o -lcudadevrt
mkdir -p …/…/bin/x86_64/linux/debug
cp cdpSimplePrint …/…/bin/x86_64/linux/debug

The two samples I compiled (asyncAPI and cdpSimplePrint) build successfully, but both fail with the same error when executed, and I am lost as to why:

  • Have I missed something in the installation, or is the GPU card not really CUDA compatible?
  • Does CUDA 8.0 only work on Fedora 23 with gcc 5.3.1? Would it work on Fedora 25 with gcc 5.4.0?

The box runs Fedora 25 and has Intel graphics plus the MSI GeForce GT 710 card. Any assistance would be much appreciated: I am a newbie and cannot find anything in the documentation or on this DevTalk forum that gets me past this problem. Thank you.

(21:40 admanero@ThermaltakeBox asyncAPI) > ./asyncAPI
[./asyncAPI] - Starting…
CUDA error at …/…/common/inc/helper_cuda.h:1133 code=3(cudaErrorInitializationError) "cudaGetDeviceCount(&device_count)"

(21:47 admanero@ThermaltakeBox cdpSimplePrint) > ./cdpSimplePrint
starting Simple Print (CUDA Dynamic Parallelism)
CUDA error at cdpSimplePrint.cu:134 code=3(cudaErrorInitializationError) "cudaGetDeviceCount(&device_count)"
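
Both samples fail at the same call, cudaGetDeviceCount, so it can help to reproduce that one call outside the samples. Below is a minimal sketch in Python that calls it through ctypes and prints the runtime's own error string via cudaGetErrorString. The library name libcudart.so and the helper name probe_cuda_device_count are assumptions for illustration; the dynamic loader must be able to find the CUDA 8.0 runtime library (for example through LD_LIBRARY_PATH).

```python
import ctypes


def probe_cuda_device_count(library="libcudart.so"):
    """Call cudaGetDeviceCount via ctypes.

    Returns (status, device_count, message), or None when the runtime
    library cannot be loaded. The default library name is an assumption;
    adjust it to match the local CUDA installation.
    """
    try:
        cudart = ctypes.CDLL(library)
    except OSError:
        return None

    # cudaError_t cudaGetDeviceCount(int *count)
    cudart.cudaGetDeviceCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cudart.cudaGetDeviceCount.restype = ctypes.c_int
    # const char *cudaGetErrorString(cudaError_t error)
    cudart.cudaGetErrorString.argtypes = [ctypes.c_int]
    cudart.cudaGetErrorString.restype = ctypes.c_char_p

    count = ctypes.c_int(0)
    status = cudart.cudaGetDeviceCount(ctypes.byref(count))
    message = cudart.cudaGetErrorString(status).decode()
    return status, count.value, message


if __name__ == "__main__":
    result = probe_cuda_device_count()
    if result is None:
        print("could not load the CUDA runtime library")
    else:
        status, count, message = result
        print(f"status={status} devices={count} message={message}")
```

Running this once from the ssh session and once at the local console would show whether the failure depends on how the session was started rather than on the samples themselves.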

(22:55 root@ThermaltakeBox cdpSimplePrint) > lspci -v
03:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 8c93
Flags: bus master, fast devsel, latency 0, IRQ 134
Memory at de000000 (32-bit, non-prefetchable)
Memory at d0000000 (64-bit, prefetchable)
Memory at d8000000 (64-bit, prefetchable)
I/O ports at d000
[virtual] Expansion ROM at df000000 [disabled]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia

OK, I have found the problem. I was running the samples on the box while connected from my laptop through ssh -X. When I disconnected, went to the box itself, and ran them there, they worked. So my guess is that CUDA cannot run on the GPU while the X session is being forwarded to another computer for remote access. I now need to figure out how to set up the X server to run on the motherboard's Intel graphics rather than on the GeForce, so that the GeForce is used only for CUDA.
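
One thing that may be worth checking in the ssh case is whether the remote session can see and open the NVIDIA device nodes (such as /dev/nvidia0 and /dev/nvidiactl). This is an assumption, not a confirmed cause. A minimal sketch follows; the helper name nvidia_node_access is hypothetical, and the check only reports what the current user is allowed to do.

```python
import glob
import os


def nvidia_node_access(dev_dir="/dev"):
    """Map each nvidia* entry under dev_dir to whether the current
    user has both read and write permission on it.

    An empty result means no NVIDIA device nodes were found at all.
    """
    nodes = sorted(glob.glob(os.path.join(dev_dir, "nvidia*")))
    return {node: os.access(node, os.R_OK | os.W_OK) for node in nodes}


if __name__ == "__main__":
    access = nvidia_node_access()
    if not access:
        print("no /dev/nvidia* nodes found")
    for node, ok in access.items():
        print(f"{node}: {'read/write ok' if ok else 'no access'}")
```

Comparing the output from the ssh login with the output at the local console would show whether the two sessions differ in their access to the device nodes.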