Why does my CUDA program run on the CPU?

Why does my program run on the CPU and not on the GPU when I execute a program implemented with CUDA?
I have a 9800GT and I can compile without any problem.
Thank you.

Your problem description is a bit too short to be able to give any useful response. Why do you think your program is running on CPU?
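One way to answer that question yourself is to check that the runtime sees a device at all and that the kernel launch reports no error. The following is only a minimal sketch: the kernel `touch` and the helper `gpuWorkSucceeded` are made-up names for illustration, and `cudaDeviceSynchronize` assumes a newer toolkit (on 2.x toolkits the equivalent call is `cudaThreadSynchronize`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: a single thread writes a marker value.
__global__ void touch(int *flag)
{
    *flag = 42;
}

// Hypothetical helper: GPU work counts as successful only if both the
// launch and the following synchronization reported no error.
bool gpuWorkSucceeded(cudaError_t launchErr, cudaError_t syncErr)
{
    return launchErr == cudaSuccess && syncErr == cudaSuccess;
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("Device query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("Device 0: %s\n", prop.name);

    int *d_flag = NULL;
    cudaMalloc((void **)&d_flag, sizeof(int));

    touch<<<1, 1>>>(d_flag);
    cudaError_t launchErr = cudaGetLastError();      // errors from the launch itself
    cudaError_t syncErr = cudaDeviceSynchronize();   // errors raised while the kernel ran

    int h_flag = 0;
    cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_flag);

    if (gpuWorkSucceeded(launchErr, syncErr) && h_flag == 42) {
        printf("Kernel ran on the GPU\n");
        return 0;
    }
    printf("Kernel did not run: launch=%s, sync=%s\n",
           cudaGetErrorString(launchErr), cudaGetErrorString(syncErr));
    return 1;
}
```

If the launch fails silently (for example because of an invalid launch configuration), the rest of the host code keeps running, which can look as if "everything runs on the CPU".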

Because the GPU does not run an operating system, it cannot interface with the hard disks and has no access of its own to the computer’s main memory. Without a CPU controlling it, the GPU won’t compute anything.

The CPU remains the boss - and that won’t change anytime soon. Intel will take care of that ;)


Remember that the function you want to run on the GPU must be declared with a qualifier such as __global__ and launched using the <<< >>> execution configuration syntax. The file must also be compiled with nvcc.
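For reference, here is a minimal sketch of those three pieces together. The file name `add.cu`, the kernel `addOne`, and the helper `blocksFor` are made up for illustration; they are not from the original poster's program.

```cuda
// add.cu (hypothetical file name); build with: nvcc add.cu -o add
#include <cstdio>
#include <cuda_runtime.h>

// __global__ marks a function that runs on the GPU and is launched from the host.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: number of blocks needed to cover n elements.
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    const int n = 1000;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    // The CPU allocates GPU memory and copies the input over.
    float *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // <<<blocks, threads>>> is the execution configuration of the launch.
    const int threads = 256;
    addOne<<<blocksFor(n, threads), threads>>>(dev, n);

    // This copy waits for the kernel to finish before it returns.
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %.1f, host[%d] = %.1f\n", host[0], n - 1, host[n - 1]);
    return 0;
}
```

The bounds check `if (i < n)` matters because the last block usually has more threads than remaining elements.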

I’ve used <<< >>>, __global__, and nvcc to compile.
It doesn’t seem normal that the CPU runs at 90-100% of its capacity when I execute the program.
I know that the GPU needs the CPU, but I think 90-100% is too high.

All that CPU time is spent busy-waiting (spinning) on the GPU during implicit synchronization. Search the forums for "spin wait" and you will find dozens of discussions on this topic, including workarounds; we don’t need to repeat them here.
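As a rough sketch of the kind of workaround those threads describe, the host can record an event after the kernel and poll it with cudaEventQuery, giving up the CPU between polls instead of spinning inside a blocking call. The kernel `busyKernel` and the helper `waitPolitely` are made-up names, and `sched_yield` assumes a POSIX host.

```cuda
#include <cstdio>
#include <sched.h>          // sched_yield (assumes a POSIX host)
#include <cuda_runtime.h>

// Hypothetical kernel that keeps the GPU busy for a while.
__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Hypothetical helper: poll the event and yield the CPU while it is not ready.
// Returns cudaSuccess once the event has completed, or the error that occurred.
cudaError_t waitPolitely(cudaEvent_t done)
{
    cudaError_t status;
    while ((status = cudaEventQuery(done)) == cudaErrorNotReady)
        sched_yield();
    return status;
}

int main()
{
    const int n = 1 << 20;
    float *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemset(dev, 0, n * sizeof(float));

    cudaEvent_t done;
    cudaEventCreate(&done);

    busyKernel<<<(n + 255) / 256, 256>>>(dev, n, 10000);
    cudaEventRecord(done, 0);   // completes after the kernel in stream 0

    cudaError_t status = waitPolitely(done);
    printf("GPU work finished: %s\n", cudaGetErrorString(status));

    cudaEventDestroy(done);
    cudaFree(dev);
    return status == cudaSuccess ? 0 : 1;
}
```

Yielding lets other runnable threads use the core, but on an otherwise idle machine the loop can still show high CPU utilisation; sleeping briefly between polls trades some latency for a lower load.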

hey guys, guess what!

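#include <cuda.h>   // driver API header

CUcontext ctx;
CUdevice dev;

cuInit(0);          // the driver API must be initialized before any other cu* call
cuDeviceGet(&dev, 0);
cuCtxCreate(&ctx, CU_CTX_SCHED_YIELD, dev);   // yield the CPU thread while waiting for the GPU

kernelCall<<<x, y>>>...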

That works in the 2.1 beta, unlike in 2.0. No more cudaEventQuery/yield loops.

So that didn’t work in CUDA 2.0, despite being in the reference manual for 2.0?

The functions prefixed with ‘cu’ are traditionally part of the driver API, and not the runtime API, correct? Will the above instructions work with the runtime API?

Correct, there was a driver bug relating to context flags that got fixed (I don’t think the fix ever made it into the 177 or 178.xx drivers).

And yes, that will work with the runtime API. There’s a plan to add direct support for this to the runtime API; until then, calling cuCtxCreate and then using the runtime API does work.
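For readers on newer toolkits: later CUDA releases expose the same scheduling flags through the runtime API as cudaSetDeviceFlags, so the driver API call is no longer needed there. This does not apply to 2.0 or 2.1, which this thread is about. A minimal sketch, assuming a fresh process in which no context has been created yet; the kernel `scale` is a made-up name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element by a factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    // Request yield-based waiting before any call that creates the context.
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceScheduleYield);
    if (err != cudaSuccess) {
        printf("cudaSetDeviceFlags failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const int n = 1 << 20;
    float *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemset(dev, 0, n * sizeof(float));

    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);

    // With the yield flag set, the host thread yields while it waits here.
    err = cudaDeviceSynchronize();
    printf("Kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}
```

The flag cudaDeviceScheduleBlockingSync is an alternative that blocks the host thread on a synchronization primitive instead of yielding, which typically lowers CPU use further at the cost of some wake-up latency.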