CUDA 7.5 Simulation Problem on CentOS 7.1

After installing CUDA 7.5 on a system with an NVIDIA GeForce 840M, I can compile the CUDA samples, but I am not able to run the simulation examples.

lspci results:

# lspci
00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 0b)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b)
00:04.0 Signal processing controller: Intel Corporation Device 0a03 (rev 0b)
00:14.0 USB controller: Intel Corporation 8 Series USB xHCI HC (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 1 (rev e4)
00:1c.2 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 3 (rev e4)
00:1c.3 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 4 (rev e4)
00:1c.4 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 5 (rev e4)
00:1d.0 USB controller: Intel Corporation 8 Series USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation 8 Series LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series SMBus Controller (rev 04)
00:1f.6 Signal processing controller: Intel Corporation 8 Series Thermal (rev 04)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 10)
03:00.0 Network controller: Broadcom Corporation BCM43142 802.11b/g/n (rev 01)
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 840M] (rev a2)

deviceQuery results:

# ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 840M"
  CUDA Driver Version / Runtime Version          7.5 / 7.5
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2048 MBytes (2147352576 bytes)
  ( 3) Multiprocessors, (128) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            1124 MHz (1.12 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 4 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce 840M
Result = PASS

oceanFFT error:

# ./oceanFFT 
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

[CUDA FFT Ocean Simulation]

Left mouse button          - rotate
Middle mouse button        - pan
Right mouse button         - zoom
'w' key                    - toggle wireframe
[CUDA FFT Ocean Simulation] 
CUDA error at oceanFFT.cpp:303 code=5(CUFFT_INTERNAL_ERROR) "cufftPlan2d(&fftPlan, meshSize, meshSize, CUFFT_C2C)"

nbody error:

# ./nbody 
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
	-fullscreen       (run n-body simulation in fullscreen mode)
	-fp64             (use double precision floating point values for simulation)
	-hostmem          (stores simulation data in host memory)
	-benchmark        (run benchmark to measure performance) 
	-numbodies=<N>    (number of bodies (>= 1) to run in simulation) 
	-device=<d>       (where d=0,1,2.... for the CUDA device to use)
	-numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
	-compare          (compares simulation results running once on the default GPU and once on the CPU)
	-cpu              (run n-body simulation on the CPU)
	-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
> Compute 5.0 CUDA device: [GeForce 840M]
CUDA error at bodysystemcuda_impl.h:160 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&m_deviceData[0].event)"

In the meantime, I checked /var/log/messages:

# tail -f /var/log/messages
Oct 19 09:29:14 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
Oct 19 09:29:14 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
Oct 19 09:29:14 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
Oct 19 09:29:14 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
Oct 19 09:29:14 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
Oct 19 09:29:14 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
Oct 19 09:29:14 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
Oct 19 09:29:14 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
Oct 19 09:29:14 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
Oct 19 09:29:15 localhost kernel: ACPI Warning: \_SB_.PCI0.RP05.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)

Is this some kind of bug?

I take it you meet the minimum requirements for CUDA 7.5 on CentOS,

and I take it you have exported the paths correctly.

Perhaps try running more samples, starting with more elementary ones.
I would not classify either of the samples you have run as elementary.
That way you should be able to determine whether all samples fail, or just some of them.

The fact that you can run deviceQuery suggests that it may merely be a missing library.
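
One way to act on this suggestion is to run a handful of non-interactive samples in a loop and note which ones exit cleanly. The sketch below is only an illustration: the function name run_samples and the directory argument are made up for this example, it assumes the built sample binaries sit in one directory, and it assumes a sample returns a non-zero exit status when it hits a CUDA error.

```shell
# run_samples DIR NAME... : run each sample binary in DIR and report
# PASS/FAIL from its exit status. The samples' own output is discarded.
# Assumption: a failing sample exits with a non-zero status.
run_samples() {
    dir=$1
    shift
    for name in "$@"; do
        if [ ! -x "$dir/$name" ]; then
            echo "$name: MISSING"
        elif (cd "$dir" && "./$name" >/dev/null 2>&1); then
            echo "$name: PASS"
        else
            echo "$name: FAIL"
        fi
    done
}

# Example call (SAMPLES_BIN is a hypothetical variable holding the
# directory with the built binaries):
#   run_samples "$SAMPLES_BIN" deviceQuery vectorAdd bandwidthTest
```

Windowed samples such as oceanFFT and nbody keep running until the window is closed, so they are better started by hand.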

Packages and libraries I installed before installing the NVIDIA driver and CUDA:

yum install wget make gcc-c++ freeglut-devel libXi-devel libXmu-devel mesa-libGLU-devel
yum install mesa-libGLES.x86_64 mesa-libGL-devel.x86_64 mesa-libGLU-devel.x86_64 mesa-libGLw.x86_64 mesa-libGLw-devel.x86_64 libXi-devel.x86_64 freeglut-devel.x86_64 freeglut.x86_64
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

Blacklisting nouveau:

echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf 
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist.conf
rpm -e xorg-x11-drivers
rpm -e xorg-x11-drv-nouveau
find / -name "*nouveau*"
rm -iv /usr/lib/modules/*/kernel/drivers/gpu/drm/nouveau -R
dracut --force
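
After a reboot, it may be worth confirming that the blacklist actually took effect. A minimal sketch, assuming the proprietary kernel module shows up as nvidia in lsmod; the function name report_gpu_modules is illustrative and not part of any tool.

```shell
# Reads `lsmod`-style output on stdin and reports whether the nouveau
# and nvidia kernel modules appear in the first column.
report_gpu_modules() {
    awk '
        $1 == "nouveau" { nouveau = 1 }
        $1 == "nvidia"  { nvidia = 1 }
        END {
            print "nouveau: " (nouveau ? "loaded" : "not loaded")
            print "nvidia: "  (nvidia  ? "loaded" : "not loaded")
        }'
}

# On a real system:
#   lsmod | report_gpu_modules
```

The intended state is nouveau not loaded and nvidia loaded; any other combination suggests the driver setup needs another look before debugging the samples.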

Thanks for your reply. Let me explain in more detail:

This PC is an ASUS K555LN, which has an Intel i7-4510U CPU, 8 GB of RAM, and an NVIDIA GeForce 840M with Optimus technology.

Exported paths are as follows:

export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
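
To double-check that these exports are visible in the shell that launches the samples, a small helper can test whether a directory is an element of a colon-separated list. This is a sketch; list_contains is a name invented for this example, and it does exact string matching only.

```shell
# list_contains LIST ENTRY : succeed if the colon-separated LIST has
# ENTRY as one of its elements (exact match, no path normalization).
list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example checks against the directories exported above:
#   list_contains "$PATH" /usr/local/cuda-7.5/bin && echo "bin is on PATH"
#   list_contains "$LD_LIBRARY_PATH" /usr/local/cuda-7.5/lib64 && echo "lib64 is on LD_LIBRARY_PATH"
```

If either check prints nothing, the exports were probably made in a different shell session or under a different user.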

I searched through various forum posts and blogs and noticed that installing CUDA on modern laptops is often tricky because of Optimus technology, which pairs two GPUs (in this case Intel and NVIDIA), and rendering does not work on the NVIDIA GPU in such a setup. I would like to give the Bumblebee project a go on CentOS: http://bumblebee-project.org/
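
A rough way to spot this kind of hybrid layout is to look at the lspci output for an Intel VGA controller next to an NVIDIA 3D controller. The following is a heuristic sketch based only on the device class strings shown in the lspci listing above; detect_hybrid_graphics is an illustrative name, and other machines may label their devices differently.

```shell
# Reads `lspci` output on stdin. Heuristic: an Intel "VGA compatible
# controller" together with an NVIDIA "3D controller" suggests a
# hybrid (Optimus-style) graphics layout.
detect_hybrid_graphics() {
    awk '
        /VGA compatible controller: Intel/ { intel = 1 }
        /3D controller: NVIDIA/            { nvidia = 1 }
        END {
            if (intel && nvidia) { print "hybrid" } else { print "not hybrid" }
        }'
}

# On a real system:
#   lspci | detect_hybrid_graphics
```

Fed the lspci listing from the first post, this prints "hybrid", since the display is driven by the Intel controller at 00:02.0 while the GeForce 840M at 04:00.0 appears only as a 3D controller.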

I disabled Optimus via my laptop’s BIOS, then removed the nouveau driver and manually installed the NVIDIA driver (via the run files provided on their site).

It’s more work to maintain, but it works well. You will lose a bit of battery life due to always using your discrete GPU, but I haven’t noticed it to be honest.