GPU not actually calculating

DSM2012 · May 17, 2009, 9:08am

I’m using:
NVIDIA GeForce 9500M GS (compute capability 1.1)
Ubuntu 9.04 Netbook Remix
CUDA toolkit 2.2
NVIDIA driver 185.18.08 beta
newest version of Eclipse
gcc 4.3

I’ve installed anything and everything I’ve seen on the forums, changed the paths, and many other things. I can compile all the examples with “make” in the appropriate directory, and can execute all of them with TEST PASSED at the end of the simulations.

My problem is when I actually look at the code and try to modify it. I’ve set up Eclipse to build a random example, histogram64. It compiles and runs fine at first glance for 1 iteration. The GPU average time will be some number of milliseconds and the CPU time will be a little longer. However, if I crank up the iterations as far as I can, I get something like this in those areas:

Running GPU histogram (1000000000 iterations)…
histogram64GPU() time (average) : 0.000000 msec //1915341817155.926270 MB/sec
…
…
histogram64CPU() time : 12.749000 msec //748.038549 MB/sec

The GPU numbers look fishy to me, and it doesn’t seem to take any time to run at all. The only warning in Eclipse is “Unresolved inclusion: <cutil_inline.h>”. I have a similar problem when trying to do this example:
CUDA, Supercomputing for the Masses: Part 2 | Dr Dobb's (http://www.ddj.com/hpc-high-performance-computing/207402986)
except that here Eclipse can’t resolve the inclusion <cuda.h>, it thinks __global__ is a syntax error, and it thinks incrementArrayOnDevice is a syntax error. This example ends up “running” but has no output at all. Something is definitely messed up.
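A note on the 0.000000 msec figure: the post doesn't show enough to confirm its cause. In general, though, kernel launches are asynchronous, and a launch that fails returns right away, so a timing loop that never checks for errors can report a near-zero average. Below is a minimal sketch of timing a kernel with CUDA events while checking for errors. It is not the histogram64 code; `scaleKernel`, the `CHECK` macro, and the sizes are made up for illustration. On toolkits as old as 2.2, `cudaThreadSynchronize` was the call in place of the newer `cudaDeviceSynchronize`.

```cuda
// time_kernel.cu (hypothetical file name); build with: nvcc -o time_kernel time_kernel.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed at %s:%d: %s\n", #call,         \
                    __FILE__, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Illustrative kernel: multiply every element by a constant factor.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] *= factor;
}

int main(void)
{
    const int n = 1 << 20;          // 1M floats, an arbitrary size
    const int iterations = 100;     // arbitrary repeat count
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_data = NULL;
    CHECK(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < iterations; ++i) {
        scaleKernel<<<blocks, threads>>>(d_data, n, 1.0f);
        CHECK(cudaGetLastError());  // a failed launch shows up here
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));  // wait until the GPU work has finished

    float totalMs = 0.0f;
    CHECK(cudaEventElapsedTime(&totalMs, start, stop));
    printf("average kernel time: %f msec over %d iterations\n",
           totalMs / iterations, iterations);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_data));
    return EXIT_SUCCESS;
}
```

If a launch fails, the program stops with the error string instead of printing a misleading average.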

Does anyone have any suggestions for what to look for? I have no experience with Linux and only program in MATLAB, so please be very clear if possible.

The remaining part is just me venting…
This is how far I’ve gotten after spending maybe a total of a week straight, 8 hours a day, trying to set this CUDA stuff up. Now that my computer is set to dual boot Ubuntu and Vista I’ve gotten MUCH farther, as nothing worked in Vista, but still. My advisor is considering buying a 1U Tesla unit or two as he is upgrading all of his compute nodes, but if I can’t show him any potential speedup, that will go out the window. This will probably be the case, considering I’ve spent so long just trying to get to a programmable state on this CUDA stuff that I could have had a fully parallel MATLAB implementation of my code debugged and running by now. I guess I just wish NVIDIA had this more efficiently implemented, with better documentation and programming tutorials. It seems like many of the posts here are on installation issues or software upgrade bugs rather than actual CUDA-related topics!

avidday · May 17, 2009, 10:31am

It sounds like however you have set up your build system in Eclipse, it is broken. The symptoms you report when trying to compile that example from DDJ are consistent with compiling CUDA code with the regular C compiler and not nvcc.
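One way to separate "the toolchain is broken" from "the IDE is misconfigured" is to build a tiny self-checking program from the command line with nvcc. The sketch below is not the DDJ code; the file name, `incrementKernel`, and the `CHECK` macro are made up for illustration. It increments an array on the device and verifies the result on the host, so it only reports success if the kernel really ran. On toolkits as old as 2.2, `cudaThreadSynchronize` was the call in place of the newer `cudaDeviceSynchronize`.

```cuda
// increment_check.cu (hypothetical file name)
// Build with nvcc, not gcc:  nvcc -o increment_check increment_check.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed at %s:%d: %s\n", #call,         \
                    __FILE__, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Each thread adds 1.0f to one array element.
__global__ void incrementKernel(float *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) a[idx] += 1.0f;
}

int main(void)
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes);
    if (h_a == NULL) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }
    for (int i = 0; i < n; ++i) h_a[i] = (float)i;

    float *d_a = NULL;
    CHECK(cudaMalloc((void **)&d_a, bytes));
    CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    incrementKernel<<<blocks, threads>>>(d_a, n);
    CHECK(cudaGetLastError());       // reports a failed launch
    CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CHECK(cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost));

    // Verify on the host that every element was incremented exactly once.
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (h_a[i] != (float)i + 1.0f) ++errors;
    }
    printf("%s (%d mismatches)\n", errors ? "FAILED" : "PASSED", errors);

    CHECK(cudaFree(d_a));
    free(h_a);
    return errors ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

If this builds and passes from a terminal but the same file misbehaves when built inside Eclipse, that points at the Eclipse project settings rather than the CUDA installation.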

For what it is worth, I copied that code from DDJ into a text file, hacked together a 4-line Makefile from an existing one in the SDK, and it compiled and ran without error. The total time required was about 60 seconds.

It might be helpful if you build, run and post the output of the deviceQuery example in the SDK as a first step.

DSM2012 · May 17, 2009, 9:28pm

OK, so I set up Eclipse exactly as described in this link:
Life of a Programmer Geek: Using Eclipse for CUDA Development (http://lifeofaprogrammergeek.blogspot.com/...evelopment.html)

And here is my output:
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA

Device 0: “GeForce 9500M GS”
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536150016 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.95 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit…

Using “make” in the SDK gives the same result. I also tried the DDJ example: it compiles with no errors, but the second I run it the console has a bar that says
Again, the only warnings in the Eclipse editor are unresolved inclusions and syntax errors for __global__ and such.

Thanks for helping though!!!

DSM2012 · May 17, 2009, 9:40pm

I should also say that I added printf statements to show me when the DDJ example enters and leaves main. It completes instantly, not after 60 seconds or so like you’ve reported.

avidday · May 18, 2009, 7:51am

OK, so you have a perfectly functional CUDA installation which obviously works correctly. So am I to understand that your problem is mostly that you want to work in Eclipse, but can’t get it to work?

BTW: The 60 second reference was the total amount of time it took to cut and paste the code from DDJ, write the makefile, compile the code and run it to confirm it as valid and functional.
