A __syncthreads question


Note: for the sake of explanation, I’ll use matrix notation in the kernel code as well.

Let’s say I have an input square matrix A (m*m), and I want to create two output square matrices from it, X (m*m) and Y (m*m).

What I want to do is this: where the first if condition holds, the corresponding entries of A should be copied to X; then, where the second if condition holds, the corresponding entries of the “UPDATED” X should be copied to Y. But it doesn’t seem to copy the updated values…

__global__ void kernel( the input matrix A and output matrices X and Y) {

int tx = threadIdx.x;

int ty = threadIdx.y;

if (tx <= ty)

    X[tx][ty] = A[tx][ty];


if (tx > ty)

    Y[tx][ty] = X[tx][ty];

}


The first if statement should create an upper triangular matrix with entries of A, and the second if statement should create a lower triangular matrix with entries from the UPDATED X.

What do you think is the problem behind this logic? I am more interested in the concept than the EXACT code.

thank you,


Did you just completely change the question and code more than two hours after you posted it, or am I having a short term memory problem? Pretty hard to answer a question when the target changes in the middle of answering it…

I am sorry I had to change it. I wanted it to be as simple as possible. However, the underlying question remains the same. When you have 3 matrices, A, X and Y, and you want to store certain elements of A into X and then, after making sure the threads are all in sync, store the updated entries of X into Y using another condition, why does it store only the old entries of X rather than the updated ones?

Once again, sorry for the mess. I have the old post, if you still desire to answer that one.

thank you,


I think the problem is that the second if conditional accesses exactly those elements of X that were not written in the first if conditional. So unless you provide some sensible input for X, you will just read uninitialized memory.
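A minimal sketch of that point, assuming a single block of M x M threads and flat row-major arrays in place of the [tx][ty] notation. The names M, triangles and run_triangles, and the sentinel value -1, are made up for illustration and are not from the original post. X is pre-filled with the sentinel, so any entry of Y that holds the sentinel afterwards was read from a part of X that the first condition never wrote.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int M = 4;  // illustrative matrix size (assumption)

// Launched as one block of M x M threads; matrices are flat, row-major.
__global__ void triangles(const float* A, float* X, float* Y) {
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int i = tx * M + ty;           // flat index standing in for [tx][ty]

    if (tx <= ty) X[i] = A[i];     // writes X only where tx <= ty
    __syncthreads();               // block-wide barrier between the two phases
    if (tx > ty)  Y[i] = X[i];     // reads X only where tx > ty
}

// Hypothetical host helper: runs the kernel once and returns Y.
std::vector<float> run_triangles() {
    const size_t bytes = M * M * sizeof(float);
    std::vector<float> hA(M * M), hX(M * M, -1.0f), hY(M * M, 0.0f);
    for (int i = 0; i < M * M; ++i) hA[i] = float(i + 1);

    float *dA, *dX, *dY;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dX, bytes);
    cudaMalloc(&dY, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dX, hX.data(), bytes, cudaMemcpyHostToDevice);  // sentinel -1
    cudaMemcpy(dY, hY.data(), bytes, cudaMemcpyHostToDevice);

    triangles<<<1, dim3(M, M)>>>(dA, dX, dY);
    cudaMemcpy(hY.data(), dY, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dX); cudaFree(dY);
    return hY;
}

int main() {
    std::vector<float> Y = run_triangles();
    for (int r = 0; r < M; ++r) {
        for (int c = 0; c < M; ++c) printf("%5.1f ", Y[r * M + c]);
        printf("\n");
    }
    return 0;
}
```

The barrier does not change the outcome here: the two conditions touch disjoint sets of elements, so the second phase can only ever see whatever X held before the kernel ran.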

Hi, I just solved the issue. I am embarrassed, for the error was quite stupid.

In the call to the kernel function, instead of kernel<<<blocks, threads>>> I had written kernel<<<threads, blocks>>>, which means the grid size and the number of threads per block were swapped!
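To illustrate why the argument order matters, here is a small sketch. The first launch parameter is the grid size (number of blocks) and the second is the block size (threads per block). The kernel count_origin and the helper launch_and_count are hypothetical names used only for this example; the kernel counts how many threads see threadIdx equal to (0, 0).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Counts the threads whose threadIdx is (0, 0).
__global__ void count_origin(int* count) {
    if (threadIdx.x == 0 && threadIdx.y == 0) atomicAdd(count, 1);
}

// Hypothetical host helper: launches with the given configuration
// and returns the resulting count.
int launch_and_count(dim3 grid, dim3 block) {
    int h = 0;
    int* d = nullptr;
    cudaMalloc(&d, sizeof(int));
    cudaMemcpy(d, &h, sizeof(int), cudaMemcpyHostToDevice);

    count_origin<<<grid, block>>>(d);   // order: <<<blocks, threads per block>>>

    cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}

int main() {
    const int m = 4;  // illustrative size (assumption)

    // Intended: one block of m x m threads, so threadIdx spans the matrix.
    int intended = launch_and_count(dim3(1), dim3(m, m));

    // Swapped: m x m blocks of one thread each, so threadIdx is always (0, 0).
    int swapped = launch_and_count(dim3(m, m), dim3(1));

    printf("intended: %d, swapped: %d\n", intended, swapped);
    return 0;
}
```

With the intended configuration exactly one thread has threadIdx (0, 0); with the arguments swapped, every one of the m*m threads does, so a kernel that indexes by threadIdx alone would have all threads working on the same element.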

After fixing it, the first part of my debugging is now fine. I’ll come back if I still have trouble with the other part…

Thank you once again,



If your code now runs fine, it means that the buggy code you posted above is not what you actually use.

Why do you post questions about code you are not using?
While posting questions in the forum is free, answering them still comes at a cost (in the form of brain time spent) to the people who can be bothered to do so. Which in turn makes it unlikely they will spend time on questions you may ask in the future.

Hello tera,

I was trying to fix my code for a long time and I was simplifying it to find out where the problem was (without realizing that it was a problem with the kernel call). And I came to this code, where I was able to see that the updating did not happen as I intended. Probably you don’t see it with this code (because your X and Y matrices were not initialized with values). Try replacing, in the second if statement, X[tx][ty] with X[0][0]. It won’t give the expected result. This is what happened for me, and so I was stuck at this point. I am sorry if my code does not make sense. But usually what I have seen (from other forums at least) is that a big code post is almost never considered… So I wanted to simplify. Sorry for the confusion. I’ll make sure my questions hereafter don’t cause confusion and double work for the guys who try to help me.
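For reference, a sketch of the X[0][0] variant described above, assuming a correct launch with a single block of M x M threads and flat row-major arrays. The names M, broadcast_corner and run_broadcast are illustrative only. In this variant the second phase reads an element that another thread wrote in the first phase, so the __syncthreads() barrier between the phases is what lets every thread in the block see the updated value. The barrier only covers threads of the same block; it does not synchronize across blocks.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int M = 4;  // illustrative matrix size (assumption)

// Launched as one block of M x M threads; matrices are flat, row-major.
__global__ void broadcast_corner(const float* A, float* X, float* Y) {
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int i = tx * M + ty;           // flat index standing in for [tx][ty]

    if (tx <= ty) X[i] = A[i];     // thread (0, 0) writes X[0] here
    __syncthreads();               // the write to X[0] is visible block-wide after this
    if (tx > ty)  Y[i] = X[0];     // every lower-triangle thread reads X[0][0]
}

// Hypothetical host helper: runs the kernel once and returns Y.
std::vector<float> run_broadcast() {
    const size_t bytes = M * M * sizeof(float);
    std::vector<float> hA(M * M), hX(M * M, -1.0f), hY(M * M, 0.0f);
    for (int i = 0; i < M * M; ++i) hA[i] = float(i + 1);   // A[0] is 1

    float *dA, *dX, *dY;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dX, bytes);
    cudaMalloc(&dY, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dX, hX.data(), bytes, cudaMemcpyHostToDevice);  // old value -1
    cudaMemcpy(dY, hY.data(), bytes, cudaMemcpyHostToDevice);

    broadcast_corner<<<1, dim3(M, M)>>>(dA, dX, dY);           // <<<blocks, threads>>>
    cudaMemcpy(hY.data(), dY, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dX); cudaFree(dY);
    return hY;
}

int main() {
    std::vector<float> Y = run_broadcast();
    for (int r = 0; r < M; ++r) {
        for (int c = 0; c < M; ++c) printf("%5.1f ", Y[r * M + c]);
        printf("\n");
    }
    return 0;
}
```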