I am trying to develop a parallel SAT solver using CUDA for my undergraduate final-year engineering project, and I use a device variable as a flag to indicate whether a valid solution has been found. However, copying the device variable back into host memory using cudaMemcpyFromSymbol() doesn't seem to work. This happens on both of the systems I use at my institution: one has a QuadroFX 3700 (CUDA toolkit 4.0, Ubuntu 10.10 x64) and the other a QuadroNVS 300 (CUDA toolkit 3.2, Ubuntu 10.04 x64). The same code runs as expected on my home desktop with a GeForce GTS250 (CUDA toolkit 4.0, Windows 7 x86).
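In case it is relevant, the pattern I am using looks roughly like this (a stripped-down sketch; solverKernel and d_solutionFound are placeholder names, and the real kernel does the actual clause checking):

#include <cstdio>
#include <cuda_runtime.h>

// Device-side flag the solver kernel raises when a valid assignment is found.
__device__ int d_solutionFound;

// Trivial stand-in for the real solver kernel: thread 0 just raises the flag.
__global__ void solverKernel()
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        d_solutionFound = 1;
}

int main()
{
    int found = 0;

    // Clear the flag before launching the kernel.
    cudaMemcpyToSymbol(d_solutionFound, &found, sizeof(int));

    solverKernel<<<1, 32>>>();
    cudaThreadSynchronize();

    // Copy the flag back to the host -- this is the call that misbehaves for me.
    cudaError_t err = cudaMemcpyFromSymbol(&found, d_solutionFound, sizeof(int));
    printf("%s, found = %d\n", cudaGetErrorString(err), found);
    return 0;
}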
Recently I tried checking the return values with cudaGetErrorString(), and it always reports "all CUDA-capable devices are busy or unavailable". This happens for all of the cudaMemcpy(), cudaMemset(), and cudaMemcpyFromSymbol() calls.
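The calls are wrapped more or less like this (CUDA_CHECK is just an illustrative name for the macro I use):

#include <cstdio>
#include <cuda_runtime.h>

// Report the error string for any runtime call that fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t e = (call);                                    \
        if (e != cudaSuccess)                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(e));                        \
    } while (0)

int main()
{
    int *d_ptr = NULL;
    int h_val = 0;

    // On the two institution machines every one of these reports
    // "all CUDA-capable devices are busy or unavailable".
    CUDA_CHECK(cudaMalloc((void **)&d_ptr, sizeof(int)));
    CUDA_CHECK(cudaMemset(d_ptr, 0, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(&h_val, d_ptr, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_ptr));
    return 0;
}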
It would be very helpful if someone could shed some light on this problem.
cudaMemcpyFromSymbol() works perfectly on Linux. I have used it with every CUDA version and on every compute capability (1.0 → 2.0) without any problem.
First of all, are the examples from the SDK working on your Linux box?
Which options did you compile with on Linux? (Make sure you compile for the compute capability of the device you are using.)
All the examples from the GPU Computing SDK are running fine.
I simply compiled my code with nvcc -w .cu.
As far as I remember, the QuadroFX 3700 and the GTS250 have compute capability 1.1, and the QuadroNVS 300 has 1.2.
What is the -w option?
Try setting the target architecture for nvcc explicitly: -arch=sm_11 or -arch=sm_12.
If you want more detailed control over nvcc code generation, have a look at the nvcc documentation.
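For example (assuming the source file is called solver.cu; substitute your own file name):

nvcc -arch=sm_11 solver.cu -o solver    (QuadroFX 3700 / GTS250, compute capability 1.1)
nvcc -arch=sm_12 solver.cu -o solver    (QuadroNVS 300, compute capability 1.2)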
The -w option just suppresses warnings during compilation.
I will try setting the target architecture as you said and see if that works out. Thanks.