cudaMalloc fails when using cudaGLSetGLDevice

Hello everyone,

When I start a new CUDA context with cudaGLSetGLDevice (instead of the usual cudaSetDevice), all cudaMalloc calls fail with “all CUDA-capable devices are busy or unavailable”. This happens with driver 270.41.19. When I run the same code on a box with driver 270.40 (beta), it segfaults during the cudaMalloc call. Below is an example that reproduces the issue.

main.cpp (368 Bytes)

I’m using CUDA on Linux (Ubuntu 9.10, 64-bit). The device information is:

Device 0: “GeForce 9800 GT”
CUDA Driver Version: 4.0
CUDA Runtime Version: 4.0
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 1073020928 bytes
Number of multiprocessors: 14
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.38 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 1, Device = GeForce 9800 GT

The error you are getting is from cudaGLSetGLDevice(), not cudaMalloc(). To use cudaGLSetGLDevice(), you should first call cudaDeviceSynchronize() and cudaDeviceReset().
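
One way to confirm which call actually reports the error is to check the return value of every runtime call rather than only the last one. The sketch below is a minimal illustration, not the code from the attached main.cpp. CHECK_CUDA is a hypothetical helper macro, device index 0 and the 1024-byte allocation are arbitrary choices, and it assumes an OpenGL context has already been created and made current on the calling thread (that setup is omitted here).

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Hypothetical helper (not part of CUDA): report the first runtime call
// that returns anything other than cudaSuccess, then exit.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

int main() {
    // Assumption: an OpenGL context is already current on this thread
    // (for example, created through GLUT or GLX). Not shown here.

    // Select the device for OpenGL interop; device index 0 is an assumption.
    CHECK_CUDA(cudaGLSetGLDevice(0));

    // Arbitrary small allocation, used only to exercise cudaMalloc.
    void* devPtr = NULL;
    CHECK_CUDA(cudaMalloc(&devPtr, 1024));
    CHECK_CUDA(cudaFree(devPtr));

    std::printf("all calls succeeded\n");
    return 0;
}
```

Because each call is checked separately, the message names the first call that fails, which should show whether the error originates in cudaGLSetGLDevice() or in cudaMalloc().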