CUDA vs. Cg

Suppose we implement matrix multiplication using CUDA and also using Cg — which one should we expect to run faster?

I am a total n00b in Cg. Does Cg have any feature that lets us access the shared on-chip memory, the way we can in CUDA?

No, Cg (or any other graphics API, for that matter) does not expose shared memory, writing to multiple arbitrary locations (scatter), or reading/writing memory directly. In a graphics shader you can only read from textures, and a texture cannot be written in the same pass it is read, so you end up “ping-ponging” between two buffers.
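To make that concrete, here is a minimal CUDA sketch of the two capabilities mentioned above that graphics shaders lack: staging data in on-chip shared memory, and scattering writes to arbitrary, data-dependent addresses. All names here (`scatter_with_shared`, `dest`, the block size of 256) are illustrative, not from the original post:

```cuda
// Sketch: two things a CUDA kernel can do that a Cg fragment shader cannot.
// Assumes it is launched with blockDim.x <= 256.
__global__ void scatter_with_shared(const float *in, const int *dest,
                                    float *out, int n)
{
    __shared__ float tile[256];      // on-chip shared memory, visible to the block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    tile[threadIdx.x] = in[i];       // cooperative load into shared memory
    __syncthreads();                 // wait until the whole block has loaded

    // Scatter: each thread writes to an arbitrary, data-dependent location.
    // A shader can only write to its own fixed output position.
    out[dest[i]] = tile[threadIdx.x];
}
```

In a shader, each fragment's output address is fixed by the rasterizer; the `out[dest[i]]` line above simply has no equivalent there.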

CUDA should be at least as fast as Cg, and in many cases it is considerably faster.


I don’t think a pure speed comparison between Cg and CUDA is entirely fair; CUDA has some very useful features for general-purpose computation, while Cg has graphics features you won’t find in CUDA.

For matrix multiplication I think CUDA wins hands down.

In my experience (we implemented a finite element solver in both Cg and CUDA), if the problem size is small, Cg can even be a bit faster than CUDA. But past a certain point, once the model becomes big enough, CUDA is up to 3 times faster. So I’d say there is no easy answer. For matrix multiplication, though, you can use shared memory with CUDA, which should give you an additional advantage over Cg (something I can’t exploit in my solver).
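The shared-memory advantage mentioned above is usually realized with the standard tiled multiplication pattern: each thread block cooperatively loads a tile of A and a tile of B into shared memory, so each global-memory element is fetched once per tile instead of once per multiply-add. A hedged sketch (the tile size and names are my choices, and for brevity it assumes N is a multiple of TILE):

```cuda
#define TILE 16

// Tiled C = A * B for N x N row-major matrices; assumes N % TILE == 0
// and a launch with TILE x TILE thread blocks covering C.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int N)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Each thread loads one element of the current A tile and B tile.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();             // tiles fully loaded before use

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();             // done with tiles before overwriting
    }
    C[row * N + col] = acc;
}
```

A Cg shader doing the same multiply must re-read every A and B element from texture memory for each output it computes, which is exactly the traffic the shared-memory tiles eliminate.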

If I have to choose, I’d pick CUDA. For a beginner, it’s easier to implement in CUDA than in Cg. And the next generations of graphics cards may well widen the gap further.