multiplication memory offset


Assume I have a matrix A[10000].
I need to do a multiplication such as B[tx]=A[tx]*A[tx-50].
It seems to run very slowly. Is that because I keep the data in global memory,
or is tx-50 not a good way to index it?


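What hardware are you running on? On non-GT200 hardware, that A[tx-50] read is not going to be coalesced: for 4-byte elements, an offset of 50 is not a multiple of 16, so each half-warp's read starts off the alignment boundary that coalescing requires. It is therefore a factor of ~20 slower than it could be. You could bind A to a 1D texture and read it with tex1Dfetch to get good performance.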
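A minimal sketch of the texture suggestion, assuming A holds floats and that elements with tx < 50 are simply skipped (the question does not say what should happen there). It uses the legacy texture reference API that matches hardware of this generation; that API was removed in CUDA 12, so this needs an older toolkit. The names texA, multiplyOffset, N and OFFSET are made up for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N      10000   // array length from the question
#define OFFSET 50      // the tx-50 offset from the question

// Legacy texture reference, declared at file scope (removed in CUDA 12).
texture<float, 1, cudaReadModeElementType> texA;

__global__ void multiplyOffset(const float *A, float *B, int n)
{
    int tx = blockIdx.x * blockDim.x + threadIdx.x;
    // Assumption: indices below OFFSET have no A[tx-50] partner and are skipped.
    if (tx >= OFFSET && tx < n) {
        // A[tx] is read directly from global memory;
        // the shifted element is fetched through the texture cache instead.
        B[tx] = A[tx] * tex1Dfetch(texA, tx - OFFSET);
    }
}

int main()
{
    size_t bytes = N * sizeof(float);
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    for (int i = 0; i < N; ++i) hA[i] = 1.0f + 0.001f * i;  // arbitrary test data

    float *dA, *dB;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemset(dB, 0, bytes);

    // Bind the linear device buffer to the 1D texture reference.
    cudaBindTexture(0, texA, dA, bytes);

    int threads = 256;
    int blocks  = (N + threads - 1) / threads;
    multiplyOffset<<<blocks, threads>>>(dA, dB, N);

    // Blocking copy, so the kernel has finished when this returns.
    cudaMemcpy(hB, dB, bytes, cudaMemcpyDeviceToHost);

    // Compare against a CPU reference with a small tolerance.
    int mismatches = 0;
    for (int i = OFFSET; i < N; ++i) {
        float ref = hA[i] * hA[i - OFFSET];
        if (fabsf(hB[i] - ref) > 1e-5f * fabsf(ref)) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaUnbindTexture(texA);
    cudaFree(dA);
    cudaFree(dB);
    free(hA);
    free(hB);
    return 0;
}
```

One thing to watch: texture fetches only reliably see data written by earlier kernel launches. If A gets updated, that should happen in a separate kernel from the one that reads it through texA.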

I am running on an 8800GT.

I should make my question clearer.

I have two matrices (A and B) inside a loop, and A updates itself on every iteration.

for (int x = 0; x < 1000; x++)





I am trying to use an extra matrix, C[tx]=A[tx-50], and update it in the update function as well.

Then B[tx]=A[tx]*C[tx];

Will post if I see any improvement.