Is it possible to estimate the performance? 8500 GT (current) -> 9800-series (or GT200) = ?

Duplicating the question from the programming forum branch …

My current graphics card is for experiments only: it is an 8500 GT (16 processors, 128-bit memory bus; moreover, it sits in the second PCI-e slot, which is slower, etc.).

I’ve implemented a kernel for my needs that actually works (and I hope not too slowly: it does not diverge, it uses coalesced memory access, it works with shared memory, etc.). As my kernel uses a lot of shared memory per thread (for stacks), I can run only about 32-42 threads per block. No matter how many test data sets I provide, the pure kernel run time (without malloc/memcpy and other overhead like that) is always larger than the time of the same calculations on the CPU (including the CPU malloc and other service work).
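To illustrate the shared-memory layout I mean, here is a minimal sketch; the stack depth and the int element type are made-up placeholders, not my actual values:

// Illustration only: per-thread stacks carved out of shared memory.
// STACK_DEPTH and the int element type are placeholders.
#define STACK_DEPTH 96        // stack entries per thread (assumed)
#define THREADS_PER_BLOCK 32  // capped by the 16 KB of shared memory per block

__global__ void kernelWithStacks(const int *in, int *out, int n)
{
    // Each thread owns a private slice of the shared-memory array.
    __shared__ int stacks[THREADS_PER_BLOCK * STACK_DEPTH];
    int *stack = &stacks[threadIdx.x * STACK_DEPTH];
    int top = 0;

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    stack[top++] = in[tid];   // push
    // ... real work: pushes and pops on 'stack' ...
    out[tid] = stack[--top];  // pop
}

With 32 threads and 96 int entries each, the stacks alone take 12 KB of the 16 KB available per block, which is what caps the block size.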

When the workload is not huge (about 100,000 test cases), the GPU kernel is about 2.5 times slower than the CPU (an Athlon X2 4800+, with calculations done in a single thread for correct timing). When the workload becomes really huge (about 10 million test cases), the CPU is only about 20% faster than the GPU, but it is still faster. However, the typical workload for my tasks is about 10,000-100,000 test cases.
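To make the measurement concrete: one way to time only the kernel itself is with CUDA events, along these lines (a sketch; the kernel name and launch configuration are placeholders):

// Time only the kernel launch with CUDA events; allocation and memcopies
// stay outside the timed region.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
kernelWithStacks<<<numBlocks, THREADS_PER_BLOCK>>>(d_in, d_out, n);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);  // milliseconds of pure GPU time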

What I’m trying to figure out is an estimate of my GPU kernel’s performance on modern NVIDIA solutions. The specs of the 9800-series cards are known, but I’m not sure whether the kernel’s speed, compared to the 8500, will grow linearly, better than linearly, or otherwise. The main problem (I guess) is the low number of threads per block in my case, which is limited by the amount of shared memory.

I have to decide whether GPUs can speed my tasks up … maybe somebody has faced the same choice at some point; any opinion would be greatly appreciated.

Thanks in advance.

Shared memory is not better on the 9800 GTX, so that will not help you, but it will have more multiprocessors than your current card (8 times as many, so I think you will gain about 8x the speed). Memory bandwidth is also much higher than on your card, so that should give quite a boost as well (especially on the 8800 GTX/Ultra; I believe they have higher memory bandwidth than the 9800 GTX).

Keep your number of threads per block a multiple of 32.
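For example, you could derive the block size from the per-thread shared-memory cost and round down to a whole warp (illustrative numbers only):

// Illustrative: pick a block size that is a multiple of the 32-thread warp,
// given the shared memory each thread's stack costs. Numbers are made up.
const int warp           = 32;
const int smemPerBlock   = 16 * 1024;         // bytes per block on G8x/G9x
const int bytesPerThread = 96 * sizeof(int);  // assumed per-thread stack cost
int maxThreads      = smemPerBlock / bytesPerThread;  // 42 with these numbers
int threadsPerBlock = (maxThreads / warp) * warp;     // rounds down to 32

With these made-up numbers, 42 threads fit in shared memory but only 32 of them form a whole warp, so 32 is the size to launch.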

Why do you need a stack? Are you doing raytracing, maybe? I have implemented stackless kd-tree traversal using ropes, and so far it is working out quite nicely.
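Here is a very rough sketch of the rope-following step that replaces the stack; the leaf layout, face numbering, and names are assumptions for illustration, not my actual code:

// Rough sketch of rope-based (stackless) kd-tree traversal: on leaving a
// leaf, follow the rope through the exit face instead of popping a stack.
struct KdLeaf {
    float3 bmin, bmax;  // leaf bounding box
    int    ropes[6];    // neighbor per face: 0/1 = x-/x+, 2/3 = y-/y+, 4/5 = z-/z+
};

// Returns the index of the node the ray enters next (-1 = leaves the scene)
// and advances *t to the ray parameter at this leaf's exit face.
__device__ int followRope(const KdLeaf *leaf, float3 org, float3 dir, float *t)
{
    float o[3]  = { org.x, org.y, org.z };
    float d[3]  = { dir.x, dir.y, dir.z };
    float lo[3] = { leaf->bmin.x, leaf->bmin.y, leaf->bmin.z };
    float hi[3] = { leaf->bmax.x, leaf->bmax.y, leaf->bmax.z };

    float tExit = 1e30f;
    int face = -1;
    for (int a = 0; a < 3; ++a) {
        if (d[a] == 0.0f) continue;                  // ray parallel to this slab
        float bound = (d[a] > 0.0f) ? hi[a] : lo[a]; // exit plane on this axis
        float tf = (bound - o[a]) / d[a];
        if (tf < tExit) { tExit = tf; face = 2 * a + (d[a] > 0.0f ? 1 : 0); }
    }
    if (face < 0) return -1;    // degenerate ray; treat as leaving the scene
    *t = tExit;
    return leaf->ropes[face];   // follow the rope to the neighboring node
}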

Answered in the ‘programming’ thread.

Hint –
You could possibly estimate the sensitivity of your kernel to memory / processor speed by underclocking A) your shaders and B) your memory and seeing what happens.

Once you have a feel for that, you can estimate how linearly (or not) a new card will scale, as in the sketch below.
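For example, a crude two-component model: the fractions come out of the underclocking experiments, and the hardware ratios below are purely illustrative:

// Crude first-order scaling model. computeFrac/memFrac would come from the
// underclocking experiments; the hardware ratios are illustrative only.
float computeFrac = 0.7f;   // share of kernel time that tracked the shader clock
float memFrac     = 0.3f;   // share that tracked the memory clock
float shaderRatio = 8.0f;   // e.g. 128 SPs vs. 16 SPs at a similar clock
float bwRatio     = 5.5f;   // e.g. ~70 GB/s vs. ~12.8 GB/s
// Amdahl-style estimate: each part of the runtime shrinks by its own ratio.
float estSpeedup  = 1.0f / (computeFrac / shaderRatio + memFrac / bwRatio);
// -> about 7x for these made-up numbers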

Generally, my feeling is that if your kernel is running reasonably fast to begin with on a cheap card, you’ll see significant speedups on a faster card.

That said, the new cards are really rather cheap. I don’t know your economic situation, but I’d advise you to try to find an 8800 GT or GTS, if only for your development machine.

I’m waiting for the announcement of new cards to make a decision.