I have this problem as well. It happens in a large number of the samples. My system is a ThinkPad T430 running Ubuntu 13.10.
I have run all the samples and collected the errors:
CUDA error at bicubicTexture_cuda.cu:37 code=46(cudaErrorDevicesUnavailable) "cudaMallocArray(&d_imageArray, &channelDesc, imageWidth, imageHeight)"
CUDA error at bilateral_kernel.cu:159 code=46(cudaErrorDevicesUnavailable) "cudaMemcpyToSymbol(cGaussian, fGaussian, sizeof(float)*(2*radius+1))"
CUDA error at bindlessTexture.cpp:236 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard)"
CUDA error at boxFilter.cpp:286 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **) &d_img, (width * height * sizeof(unsigned int)))"
Unified Memory not supported on this device
checkCudaErrors() Driver API error = 0999 "CUDA_ERROR_UNKNOWN" from file <videoDecodeGL.cpp>, line 423.
OpenGL device is Available
fluidsGL_kernels.cu(52) : getLastCudaError() CUDA error : cudaMalloc failed : (46) all CUDA-capable devices are busy or unavailable.
CUDA Capable Device 0, meets minimum required specs.
CUDA error at FunctionPointers_kernels.cu:309 code=46(cudaErrorDevicesUnavailable) "cudaMallocArray(&array, &desc, iw, ih)"
CUDA error at GrabcutMain.cpp:649 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsGLRegisterBuffer(&pbo_resource, pbo, cudaGraphicsMapFlagsNone)"
CUDA error at imageDenoisingGL.cpp:556 code=46(cudaErrorDevicesUnavailable) "CUDA_MallocArray(&h_Src, imageW, imageH)"
CUDA error at Mandelbrot.cpp:971 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, gl_PBO, cudaGraphicsMapFlagsWriteDiscard)"
CUDA error at marchingCubes.cpp:475 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **) &d_volume, size)"
CUDA error at bodysystemcuda_impl.h:156 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&m_deviceData[0].event)"
CUDA error at particleSystem_cuda.cu:85 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsGLRegisterBuffer(cuda_vbo_resource, vbo, cudaGraphicsMapFlagsNone)"
CUDA error at main.cpp:276 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsGLRegisterBuffer(pbo_resource, *pbo, cudaGraphicsMapFlagsNone)"
CUDA error at recursiveGaussian.cpp:290 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **) &d_img, size)"
CUDA error at main.cpp:487 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsGLRegisterImage(&cuda_tex_result_resource, *tex_cudaResult, GL_TEXTURE_2D, cudaGraphicsMapFlagsWriteDiscard)"
CUDA error at simpleGL.cu:494 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsGLRegisterBuffer(vbo_res, *vbo, vbo_res_flags)"
CUDA error at simpleTexture3D.cpp:248 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard)"
SobelFilter_kernels.cu(223) : CUDA Runtime API error 46: all CUDA-capable devices are busy or unavailable.
CUDA error at volume.cpp:24 code=46(cudaErrorDevicesUnavailable) "cudaMalloc3DArray(&vol->content, &vol->channelDesc, dataSize, allowStore ? cudaArraySurfaceLoadStore : 0)"
CUDA error at volumeRender_kernel.cu:190 code=46(cudaErrorDevicesUnavailable) "cudaMalloc3DArray(&d_volumeArray, &channelDesc, volumeSize)"
It seems that everything with a GUI (though not only those samples) crashes, every time with “cudaErrorDevicesUnavailable”. The failing CUDA call differs, although “cudaGraphicsGLRegisterBuffer” and “cudaMallocArray” recur quite often. Strangely, I got the “nbody” sample to work by passing the “hostmem” option.
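To check which calls fail most often, rather than eyeballing the log, something like the following rough Python sketch could tally them. It assumes the errors were saved as text in the `CUDA error at file:line code=N(name) "call(...)"` format shown above. The names `ERROR_LINE`, `tally_failing_calls` and `SAMPLE_LOG` are made up for illustration, and the sample holds only a few lines copied from the log, not the whole thing.

```python
import re
from collections import Counter

# Matches lines in the checkCudaErrors-style format seen in the log, e.g.
#   CUDA error at file.cu:37 code=46(cudaErrorDevicesUnavailable) "cudaMalloc(...)"
# Lines in other formats (getLastCudaError, Driver API errors) are skipped.
ERROR_LINE = re.compile(
    r'CUDA error at (?P<file>\S+):(?P<line>\d+) '
    r'code=(?P<code>\d+)\((?P<name>\w+)\) "(?P<call>\w+)\('
)


def tally_failing_calls(log_text):
    """Count occurrences of each (error name, failing call) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            counts[(match.group("name"), match.group("call"))] += 1
    return counts


# A few lines copied from the log above; the last one uses a different
# format and is deliberately not matched by ERROR_LINE.
SAMPLE_LOG = '''
CUDA error at bicubicTexture_cuda.cu:37 code=46(cudaErrorDevicesUnavailable) "cudaMallocArray(&d_imageArray, &channelDesc, imageWidth, imageHeight)"
CUDA error at boxFilter.cpp:286 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **) &d_img, (width * height * sizeof(unsigned int)))"
CUDA error at marchingCubes.cpp:475 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **) &d_volume, size)"
CUDA error at simpleGL.cu:494 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsGLRegisterBuffer(vbo_res, *vbo, vbo_res_flags)"
SobelFilter_kernels.cu(223) : CUDA Runtime API error 46: all CUDA-capable devices are busy or unavailable.
'''

if __name__ == "__main__":
    # Print the most frequent failing calls first.
    for (name, call), count in tally_failing_calls(SAMPLE_LOG).most_common():
        print(count, name, call)
```

Running it over the full log would show how the failures are spread across calls. It does not explain why the device is reported as unavailable.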
Edit: I should add that my laptop uses Optimus. The installation guide at http://docs.nvidia.com/cuda/pdf/CUDA_Getting_Started_Linux.pdf says that if you have both an integrated and a discrete GPU, you should use the --no-opengl-libs option. I assumed this applied to Optimus.
My deviceQuery output, for reference:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVS 5400M"
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073414144 bytes)
( 2) Multiprocessors, ( 48) CUDA Cores/MP: 96 CUDA Cores
GPU Clock rate: 950 MHz (0.95 GHz)
Memory Clock rate: 900 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 131072 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = NVS 5400M
Result = PASS