Looking for a better way to copy scalar shared memory data to global memory

My kernel declares a scalar in shared memory that threads increment atomically. I need to copy this scalar to global memory. Here is my current attempt. My gripe is that it seems excessive to call __syncthreads() just for a single scalar. Is there another or better way to do this?

__global__ void superDuperKernel( int* gMem )
{
	__shared__ int scalarData; // shared memory atomically incremented by threads
	// ...
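	__syncthreads(); // wait until every thread in the block has finished its increment
	
	if( threadIdx.x == 0 )
	{
		*gMem= scalarData; // a single thread copies the block's result to global memory
	}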
}
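For comparison, here is a minimal sketch of three ways to get a per-block count into global memory. It assumes a hypothetical workload in which each thread adds 1 to the counter when its input element is positive; the kernel names and the countPositive host helper are illustrative, not from the original code. Variant A is the shared-scalar pattern above. Its final __syncthreads() is what guarantees that threads in other warps have finished their atomicAdd before thread 0 reads the value, so the barrier is needed for correctness. Variant B drops the shared scalar and the barrier by having each thread atomically update its block's slot in global memory, which may be slower when many threads contend for the same address. Variant C uses __syncthreads_count(), which performs the barrier and the block-wide count in one call but only fits the case where each thread contributes 0 or 1.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Variant A: shared counter, one global store per block.
__global__ void countPositiveShared(const int* in, int n, int* blockCounts)
{
    __shared__ int scalarData;
    if (threadIdx.x == 0) scalarData = 0;
    __syncthreads();                         // zero is visible before any increment

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] > 0) atomicAdd(&scalarData, 1);

    __syncthreads();                         // every increment is done before the read
    if (threadIdx.x == 0) blockCounts[blockIdx.x] = scalarData;
}

// Variant B: no shared scalar and no barrier; threads atomically update
// their block's slot in global memory (blockCounts must be zeroed first).
__global__ void countPositiveGlobal(const int* in, int n, int* blockCounts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] > 0) atomicAdd(&blockCounts[blockIdx.x], 1);
}

// Variant C: barrier and block-wide count of a 0/1 predicate in a single call.
__global__ void countPositiveSyncCount(const int* in, int n, int* blockCounts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int pred = (i < n && in[i] > 0);
    int count = __syncthreads_count(pred);   // every thread in the block must call this
    if (threadIdx.x == 0) blockCounts[blockIdx.x] = count;
}

// Hypothetical host helper: runs one variant (0 = A, 1 = B, 2 = C) and returns
// the sum of the per-block counts. Error checking omitted for brevity.
int countPositive(const std::vector<int>& h_in, int variant)
{
    const int n = static_cast<int>(h_in.size());
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    int* d_in = nullptr;
    int* d_counts = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_counts, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_counts, 0, blocks * sizeof(int));  // required by variant B

    if (variant == 0)      countPositiveShared<<<blocks, threads>>>(d_in, n, d_counts);
    else if (variant == 1) countPositiveGlobal<<<blocks, threads>>>(d_in, n, d_counts);
    else                   countPositiveSyncCount<<<blocks, threads>>>(d_in, n, d_counts);

    std::vector<int> h_counts(blocks);
    cudaMemcpy(h_counts.data(), d_counts, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_counts);

    int total = 0;
    for (int c : h_counts) total += c;
    return total;
}
```

Calling countPositive with the same input and each of the three variant numbers should return the same total. Which variant is fastest depends on the GPU and on how many threads actually increment, so it is worth measuring rather than assuming.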