# Loop confusion

I am doing some operations on a matrix and, depending on whether I am in the lower or upper half, I apply a different formula. I am trying to traverse the matrix starting at the last column and moving back toward the first. Unfortunately, my debug output shows that no matter what I do, I am moving from left to right across the matrix.

The code below just shows an example:

[codebox]

rowctr = 30;
int row = rowctr * blockDim.x + threadIdx.x;

for (rowctr = 30; rowctr >= 0; rowctr--)
{
    row = rowctr * blockDim.x + threadIdx.x;
    if (blockIdx.x == rowctr && threadIdx.x < rowctr)
    {
    }
}

[/codebox]

This prints out the correct matrix index for each column of the matrix - however, instead of starting at column 30, it starts at column 1. I've tried adding __syncthreads() just to see if it was a bug in the device emulator, but nothing makes column 30 print first.

Any thoughts?

FYI - I’m calling my kernel by:

[codebox]

solver<<<n, n>>>(/* device pointers */); // n = 30

[/codebox]

I know this isn’t the most efficient use of CUDA, but I have data dependency issues, so I need to go through the matrix column by column.

Thanks for any help!

What you are seeing is the order in which the device emulator chooses to run your blocks.

Why do you launch multiple blocks and then do work in only one of them? Just loop over the work within a single block, and you will be able to control the order of execution yourself.
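To make the suggestion concrete, here is a minimal sketch (the kernel body and names like `mat`, `vec`, and the multiply step are illustrative placeholders, not the original code): launch a single block of n threads and put the column loop inside the kernel, so the right-to-left order is guaranteed by your own loop rather than by block scheduling.

```cuda
// Sketch only: one block of n threads; the column loop runs inside the
// kernel, so columns are processed in a guaranteed right-to-left order.
__global__ void solver(float *mat, const float *vec, int n)
{
    for (int col = n - 1; col >= 0; --col)        // last column first
    {
        int idx = col * blockDim.x + threadIdx.x; // this thread's element
        if (threadIdx.x < col)
        {
            mat[idx] *= vec[threadIdx.x];         // placeholder for the real formula
        }
        __syncthreads(); // whole block finishes this column before the next
    }
}

// Launched with a single block, so ordering is under your control:
//   solver<<<1, n>>>(d_mat, d_vec, n);   // n = 30
```

Because `__syncthreads()` only synchronizes threads within one block, this pattern works precisely when everything runs in a single block; with one block per column (as in the original launch) there is no ordering guarantee between blocks.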

I have a data dependency issue that requires me to operate on the matrix column by column. I can't just apply the equation to the entire matrix in one pass. Essentially, I take a column of the matrix, multiply it by a vector, and use the result for the next column.

I'm not sure I understand what you mean by "just loop over the work in one block only"…