Issue writing to a shared memory location in a loop

I’m new to CUDA and having an issue within a kernel I’m attempting to write. I’ve reduced the code to:

__shared__ int ILengths[LPB];
...
int bfound = 0;
unsigned int jump;

for (int i = 1; i < X; ++i) {
    bfound = 0;
    jump = ELEMENTS/4;
    while (!bfound) {
        jump >> 1;
        if (jump == 0) {
            bfound = 1;
            ILengths[i-1]++;
        }
    }
    //ILengths[i-1]++;
    __syncthreads();
}
}

When I execute the code I get the message “This application has requested the Runtime to terminate in an unusual way…”. However, this code doesn’t cause a problem:

__shared__ int ILengths[LPB];
...
int bfound = 0;
unsigned int jump;

for (int i = 1; i < X; ++i) {
    bfound = 0;
    jump = ELEMENTS/4;
    while (!bfound) {
        jump >> 1;
        if (jump == 0) {
            bfound = 1;
            //ILengths[i-1]++;
        }
    }
    ILengths[i-1]++;
    __syncthreads();
}
}

Can anyone shed some light?

Thanks.

If this really is your kernel code then I’m surprised the compiler doesn’t warn about the uninitialized use of `bfound`.

Sorry, I was paring down the code at 2am and in the process accidentally made bfound a second issue. I’ve edited the code above to fix the uninitialized bfound, but the first issue remains. My guess is that it is related to parallel writes to a shared memory location. Isn’t access implicitly synchronized? I tried using atomicAdd(&ILengths[i-1], 1) but it made no difference.

You are correct that the code will need to use atomicAdd to give correct results. However the kernel should not crash without it (unless the wrong results lead to a problem later in the kernel).
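To illustrate the point about atomicAdd, here is a minimal sketch rather than the original kernel. The names `countThreads`, `runCount`, `counter` and `useAtomic` are made up for this example. Every thread in one block increments a single shared counter, either with atomicAdd or with a plain `++`.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every thread of the block adds 1 to one shared counter.
__global__ void countThreads(int *out, int useAtomic)
{
    __shared__ int counter;
    if (threadIdx.x == 0) counter = 0;
    __syncthreads();              // counter is initialised before anyone adds

    if (useAtomic)
        atomicAdd(&counter, 1);   // indivisible read-modify-write
    else
        counter++;                // data race: concurrent updates can be lost

    __syncthreads();              // all increments finished before the read
    if (threadIdx.x == 0) *out = counter;
}

// Hypothetical host helper: launches one block and returns the final count,
// or -1 if a CUDA call fails.
int runCount(int threads, int useAtomic)
{
    int *dOut = nullptr;
    int hOut = -1;
    if (cudaMalloc(&dOut, sizeof(int)) != cudaSuccess) return -1;
    countThreads<<<1, threads>>>(dOut, useAtomic);
    // cudaMemcpy waits for the kernel and reports any launch/execution error.
    cudaError_t err = cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err == cudaSuccess ? hOut : -1;
}
```

With `useAtomic` set, `runCount(256, 1)` should return 256. With the plain increment the result is unspecified and is usually smaller, but the kernel still runs to completion, which matches the remark that the missing atomicAdd alone should not cause a crash.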

The current code contains an infinite loop unless `ELEMENTS<4`. I guess `jump >> 1;` should read `jump >>= 1;`
Can you give an actual code sample that reproduces the problem?
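A small sketch of the difference, assuming nothing beyond the snippet in the question. The helper names `shiftDiscarded`, `shiftStored` and `halvingSteps` are made up for illustration. The macro guard only lets the same file build with a plain C++ compiler as well as nvcc.

```cuda
// Allow the file to build without nvcc (assumption: purely for convenience).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// `jump >> 1` computes the shifted value and throws it away.
__host__ __device__ inline unsigned int shiftDiscarded(unsigned int jump)
{
    (void)(jump >> 1);   // jump itself is unchanged
    return jump;
}

// `jump >>= 1` stores the shifted value back into jump.
__host__ __device__ inline unsigned int shiftStored(unsigned int jump)
{
    jump >>= 1;
    return jump;
}

// Counts the shifts needed for jump to reach zero, starting from elements/4
// as in the question. Terminates only because >>= updates jump.
__host__ __device__ inline int halvingSteps(unsigned int elements)
{
    unsigned int jump = elements / 4;
    int steps = 0;
    while (jump != 0) {
        jump >>= 1;
        ++steps;
    }
    return steps;
}
```

With `jump >> 1;` in place of `jump >>= 1;` the loop condition never changes, so any starting value other than zero spins forever.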

Thanks for pointing out the ‘>>=’! For reasons that are mostly irrelevant I was thinking of >> as >>=. I believe that was the only cause of the kernel crash. Also, I now know to use atomicAdd(), as there is no implicit thread synchronization when writing to shared memory, correct?
