Sequential loop inside kernel does not seem to work at all, or is this just a synchronization problem?

Invader0x7F · November 6, 2016, 8:14am

Hi. I have a problem with the sequential loop executed inside the incKernel function. It is intended to compute a sum for each thread and print the result, but it does not seem to work: after running the code, sum_th[threadIdx.x] is unchanged and equal to zero for every thread. Can you point out what is wrong with this loop, or is it just a lack of synchronization? Thanks a lot. Waiting for your reply.

__device__ unsigned int sum_th[10] = { 0 };

__global__ void incKernel()
{
	__shared__ unsigned int counts[10];

	for (int i = 0; i < 10; i++)
		atomicAdd(&sum_th[threadIdx.x], counts[i]);

	__syncthreads();
	atomicCAS(&counts[threadIdx.x], 0, 1);

	printf("tid = %d count = %d\n", threadIdx.x, sum_th[threadIdx.x]);
}

int main()
{
	cudaError_t cudaStatus;
	cudaStatus = cudaSetDevice(0);
	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "cudaSetDevice failed!  Do you have a CUDA-capable GPU installed?");
		_getch();
		return 1;
	}

	incKernel<<<1, 10>>>();

	cudaStatus = cudaDeviceSynchronize();
	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching incKernel!\n", cudaStatus);
		_getch();
		return 1;
	}

	cudaStatus = cudaDeviceReset();
	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "cudaDeviceReset failed!");
		_getch();
		return 1;
	}

	_getch();

	return 0;
}

Robert_Crovella · November 6, 2016, 12:48pm

You haven’t initialized your shared memory (counts) anywhere.

Your code doesn’t make sense to me.
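
To illustrate the point about initialization, here is a minimal sketch of one possible fix. It rests on an assumption about the intent that the original post does not confirm: each thread marks its own slot in counts with 1, and then every thread sums all ten slots. Under that assumption the shared array has to be written, and the writes made visible with __syncthreads(), before the loop reads it. The plain store used in place of atomicCAS and atomicAdd is a simplification of mine, not something taken from the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ unsigned int sum_th[10] = { 0 };

// Assumes a launch with exactly one block of 10 threads, as in the question.
__global__ void incKernel()
{
	__shared__ unsigned int counts[10];

	// Initialize shared memory first: each thread writes its own slot.
	counts[threadIdx.x] = 1;

	// Make every thread's write visible before any thread reads counts[].
	__syncthreads();

	// Each thread is the only writer of sum_th[threadIdx.x],
	// so a local accumulator and a plain store are sufficient here.
	unsigned int sum = 0;
	for (int i = 0; i < 10; i++)
		sum += counts[i];
	sum_th[threadIdx.x] = sum;

	printf("tid = %d count = %u\n", threadIdx.x, sum_th[threadIdx.x]);
}

int main()
{
	incKernel<<<1, 10>>>();

	cudaError_t status = cudaDeviceSynchronize();
	if (status != cudaSuccess) {
		fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(status));
		return 1;
	}
	return 0;
}
```

With this ordering each of the ten threads should report a count of 10. In the original kernel the loop runs before anything has been stored in counts, so it reads indeterminate values, and the atomicCAS after the barrier comes too late to affect the sum.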

Invader0x7F · November 6, 2016, 12:51pm

Okay. Thanks for your reply.
