Hi!
I am working on a project where we use CUDA for GPU processing. Everything worked fine before Christmas, but on returning to the project we have run into a strange problem, without having modified any code at all. While trying to debug it, we found that it is not just a problem in our project, but rather a problem with CUDA in general on our machines.
The problem is that some CUDA function calls never return, such as cudaMalloc and cudaMemGetInfo. It seems to happen for all functions that access the device in any way, while calls such as cudaSetDevice work fine.
We use CUDA 5.0 on Ubuntu 11.10, 64 bit, on several machines. X is disabled on these machines (we connect over SSH), so nothing else should be running on the CUDA cards besides our program.
We have tried different cards, such as a GTX460, GTX480, GTX280 and NVS295, and the problem only occurs on the GPUs with compute capability 2.x, i.e. the 460 and 480.
A basic C++/CUDA example is provided below. As you can see, we just try to allocate a small buffer of 1 byte on the device, but when running the code it hangs at the cudaMalloc call without returning any error code, even when run under cuda-gdb.
#include <cuda_runtime.h>
#include <cstdlib>
#include <iostream>

using namespace std;

int main(int argc, char *argv[]) {
    int devices = 0;
    cudaGetDeviceCount(&devices);
    if (devices < 1) {
        cout << "No CUDA devices found!" << endl;
        exit(0);
    } else {
        cudaSetDevice(0);
        char *buf;
        cerr << "A" << endl;
        cudaMalloc((void**) &buf, 1); // Allocate 1 byte on the device; this call never returns
        cerr << "B" << endl;          // Never printed on the affected GPUs
    }
    cudaDeviceReset();
    return 0;
}
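
In case it helps, here is the same test with explicit error checking on every runtime call (the checkCuda helper below is just a small wrapper written for this post, not part of our project). Since cudaMalloc never returns, the check after it is never reached, but it would at least confirm whether any of the earlier calls report an error:

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Abort with the CUDA error string if a runtime call fails.
static void checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(1);
    }
}

int main() {
    int devices = 0;
    checkCuda(cudaGetDeviceCount(&devices), "cudaGetDeviceCount");
    if (devices < 1) {
        fprintf(stderr, "No CUDA devices found!\n");
        return 0;
    }
    checkCuda(cudaSetDevice(0), "cudaSetDevice");

    char *buf = 0;
    fprintf(stderr, "A\n");
    // Hangs here on the compute capability 2.x cards; the next check is never reached.
    checkCuda(cudaMalloc((void **) &buf, 1), "cudaMalloc");
    fprintf(stderr, "B\n");

    checkCuda(cudaFree(buf), "cudaFree");
    checkCuda(cudaDeviceReset(), "cudaDeviceReset");
    return 0;
}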
The SobolQRNG sample provided with the CUDA SDK also hangs, at "Allocating GPU memory...". At the same time, the deviceQuery sample works without errors.
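
If we understand the runtime correctly, the calls that work (cudaGetDeviceCount, cudaSetDevice, and the cudaGetDeviceProperties queries that deviceQuery relies on) do not actually create a context on the device, while cudaMalloc and cudaMemGetInfo do, so it looks as if anything that triggers context creation hangs. For comparison, this is roughly the kind of query deviceQuery exercises:

#include <cuda_runtime.h>
#include <iostream>

int main() {
    int devices = 0;
    cudaGetDeviceCount(&devices);
    // Property queries do not (as far as we can tell) create a device context.
    for (int i = 0; i < devices; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::cout << "Device " << i << ": " << prop.name
                  << " (compute capability " << prop.major << "." << prop.minor << ")"
                  << std::endl;
    }
    return 0;
}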
Does anyone know what might be causing this, or how we can fix it?
Thanks!