I have some questions about CUDA:
A few days ago I bought a GTX 570, installed Microsoft Visual Studio 2012, and installed CUDA 5.5 (following http://developer.download.nvidia.com/compute/cuda/5_5/rel/docs/CUDA_Getting_Started_Windows.pdf). I read that Thrust is already included in newer versions of CUDA such as 5.5. The installation went fine and I didn't see any problems; the computer passed all tests described in that PDF. Only one thing was a bit strange: the guide says “The .rules file is installed into $VisualStudioInstallDir\VC\VCProjectDefaults”, but I didn't find a “.rules” file there. After the installation I tried to compile some of the sample projects, but every time I got errors like “kernel32.lib could not be opened”. Indeed, the library path “common/lib” of the CUDA Samples contains some libraries, but not kernel32.lib or the other libraries named in the error messages. So my first question is: why do the CUDA installation and the CUDA Samples not include the necessary library files, and how can I get them?
After that I tried to run a simple Thrust program that worked fine on another computer (running Linux with gcc), but on my computer it produced a runtime error. According to the error message, the cause is a division by zero: in CUDA's “launch_calculator.inl” file, the line “std::size_t num_blocks_per_multiprocessor = properties.maxThreadsPerMultiProcessor / num_threads_per_block;” divides by zero, so “num_threads_per_block” is zero. But why? I never changed a variable like this! The call that leads to this error is “thrust::sequence(d_vector.begin(), d_vector.end());”, so the error occurs while filling a simple vector with a sequence of integers.
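To make the failure mode concrete, the division from “launch_calculator.inl” can be mirrored in plain C++. This is a hypothetical sketch: `blocks_per_multiprocessor` is not a Thrust function, and it assumes (without having checked the Thrust source) that `num_threads_per_block` is derived from attributes the CUDA runtime reports for the kernel, which could come back as zero if those attributes cannot be read, for example if the executable contains no code usable on the GPU. For reference, the GTX 570 is a compute capability 2.0 device with `maxThreadsPerMultiProcessor` = 1536.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper mirroring the line in launch_calculator.inl:
//   num_blocks_per_multiprocessor = maxThreadsPerMultiProcessor / num_threads_per_block;
// Unlike that line, it checks for a zero block size and throws instead of
// performing an integer division by zero.
std::size_t blocks_per_multiprocessor(std::size_t max_threads_per_multiprocessor,
                                      std::size_t num_threads_per_block)
{
    if (num_threads_per_block == 0)
    {
        // Assumption: a zero block size signals that the kernel's launch
        // attributes were unavailable, not a bug in the calling code.
        throw std::runtime_error("num_threads_per_block is 0: kernel attributes unavailable?");
    }
    // Integer division, as in the original line.
    return max_threads_per_multiprocessor / num_threads_per_block;
}
```

With 1536 threads per multiprocessor and a hypothetical block size of 256, this yields 6 blocks per multiprocessor; with a block size of 0 it throws where the original line would crash.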
After that I tried to compile a very simple Thrust example program (from http://thrust.github.io/) to which I added 4 debug messages:
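```cpp
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <algorithm>
#include <cstdlib>
#include <iostream>
int main() {
    thrust::host_vector<int> h_vec(32 << 20);
    std::cout << "1" << std::endl;
    // generate 32M random numbers serially
    std::generate(h_vec.begin(), h_vec.end(), rand);
    std::cout << "2" << std::endl;
    // transfer data to the device
    thrust::device_vector<int> d_vec = h_vec;
    std::cout << "3" << std::endl;
    // sort data on the device (846M keys per second on GeForce GTX 480)
    thrust::sort(d_vec.begin(), d_vec.end());
    std::cout << "4" << std::endl;
    // transfer data back to host
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
}
```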
Compilation succeeds, but it takes 45 seconds! It is a very simple program, so why does it take so long to compile? Running the compiled program is slow as well: after debug message “2” appears, it takes 40 seconds before debug message “3” appears, while all other debug messages (1, 2, 4) appear immediately. I have no idea why both the compilation and the program perform so badly.
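Rather than judging the delays by when the debug messages appear, each step can be timed on the host with `std::chrono`. The helper below is a hypothetical sketch (`time_ms` is not part of Thrust or CUDA) and measures host wall-clock time only. Because the CUDA runtime initializes lazily, the first call that touches the device (here the `device_vector` construction between messages “2” and “3”) also pays the one-time startup cost, so timing a second, similar device operation afterwards helps separate that cost from the transfer itself. One possible explanation for a very long startup, not confirmed by anything in this post, is that the driver is just-in-time compiling PTX because the executable was not built for the GTX 570's compute capability (2.0).

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical helper: runs `step`, prints how long it took on the host's
// steady (monotonic) clock, and returns the elapsed time in milliseconds.
template <typename Step>
double time_ms(const std::string& label, Step&& step)
{
    auto start = std::chrono::steady_clock::now();
    std::forward<Step>(step)();
    auto stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    std::cout << label << ": " << ms << " ms" << std::endl;
    return ms;
}
```

For example, with `thrust::device_vector<int> d_vec;` declared beforehand, the transfer could be timed as `time_ms("copy to device", [&] { d_vec = h_vec; });`. Kernel launches are asynchronous, so timing a step like `thrust::sort` on the host may only capture launch overhead unless the timed step includes a synchronizing operation, such as the copy back to the host.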
I hope you can help me with some of these problems.