I’m using:
NVIDIA GeForce 9500M GS (compute capability 1.1)
Ubuntu 9.04 Netbook Remix
CUDA Toolkit 2.2
NVIDIA driver 185.18.08 beta
newest version of Eclipse
GCC 4.3
I’ve installed everything I’ve seen recommended on the forums, changed the paths, and tried many other things. I can compile all the SDK examples with “make” in the appropriate directory, and every one of them runs with TEST PASSED at the end of the simulations.
My problem comes when I actually look at the code and try to modify it. I’ve set up Eclipse to build a random example, histogram64. At first glance it compiles and runs fine for 1 iteration: the GPU average time is some number of milliseconds and the CPU time is a little longer. However, if I crank the iterations up as far as I can, I get something like this in those areas:
Running GPU histogram (1000000000 iterations)…
histogram64GPU() time (average) : 0.000000 msec //1915341817155.926270 MB/sec
…
…
histogram64CPU() time : 12.749000 msec //748.038549 MB/sec
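For what it’s worth, the timing around the kernel in the sample looks roughly like this, as far as I can tell (I’m reconstructing this from memory, so the exact cutil calls and variable names may be off):

unsigned int hTimer;
cutilCheckError( cutCreateTimer(&hTimer) );

cutilSafeCall( cudaThreadSynchronize() );        // make sure the GPU is idle before timing
cutilCheckError( cutResetTimer(hTimer) );
cutilCheckError( cutStartTimer(hTimer) );

for (int iter = 0; iter < numIterations; iter++)
    histogram64GPU(d_result, d_data, dataN);     // each launch is asynchronous

cutilSafeCall( cudaThreadSynchronize() );        // wait for all launches to finish
cutilCheckError( cutStopTimer(hTimer) );

// average per-iteration time in msec, and the derived bandwidth in MB/sec
double gpuTime = cutGetTimerValue(hTimer) / (double)numIterations;
printf("histogram64GPU() time (average) : %f msec //%f MB/sec\n",
       gpuTime, ((double)dataN / 1048576.0) / (gpuTime * 0.001));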
The GPU numbers look fishy to me; it doesn’t seem to take any time at all to run. The only warning in Eclipse is “Unresolved inclusion: <cutil_inline.h>”. I have a similar problem when trying the example from “CUDA, Supercomputing for the Masses: Part 2” (http://www.ddj.com/hpc-high-performance-computing/207402986), except there Eclipse can’t resolve the <cuda.h> include, it flags __global__ as a syntax error, and it flags incrementArrayOnDevice as a syntax error. That example ends up “running” but produces no output at all. Something is definitely messed up.
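For context, the kernel and launch from that article look roughly like this (typed from memory, so details may differ slightly from the article):

__global__ void incrementArrayOnDevice(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;   // one array element per thread
    if (idx < N)                                        // skip threads past the end of the array
        a[idx] = a[idx] + 1.f;
}

// launched from main() something like:
int blockSize = 4;
int nBlocks = N / blockSize + (N % blockSize == 0 ? 0 : 1);
incrementArrayOnDevice<<<nBlocks, blockSize>>>(a_d, N);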
Does anyone have any suggestions for what to look for? I have no experience with Linux and only program in Matlab, so please be as clear as possible.
The remaining part is just me venting…
This is how far I’ve gotten after spending maybe a week straight, 8 hours a day, trying to set this CUDA stuff up. Now that my computer is set up to dual-boot Ubuntu and Vista I’ve gotten MUCH farther (nothing worked in Vista), but still. My advisor is considering buying a 1U Tesla unit or two as he upgrades all of his compute nodes, but if I can’t show him any potential speedup, that will go out the window. That will probably be the case, considering I’ve spent so long just trying to get CUDA into a programmable state that I could have had a fully parallel Matlab implementation of my code debugged and running by now. I just wish NVIDIA had implemented this more cleanly, with better documentation and programming tutorials. It seems like many of the posts here are about installation issues or upgrade bugs rather than actual CUDA topics!