Problem in parellel sum

fcckingflier · November 13, 2020, 1:56pm

I write a function to generate a 6x6 matrix in every thread and I want to sum up them in the function, then I find that the result will be influenced by the amount of syncthread(),here is my code

 __shared__ float dim_shared1[256];
        int tx = threadIdx.x , ty = threadIdx.y  ,tid = tx*16+ty;
        
        int x = threadIdx.x + blockIdx.x * blockDim.x;
    	int y = threadIdx.y + blockIdx.y * blockDim.y;
        int u = x * downsample;
        int v = y * downsample;

        bool is_valid = true;
        Matrix6f new_H;
        Matrix6x1f new_b;
        new_H.setZeros();
        new_b.setZeros();
        is_valid = computeAHSemanticTrack(u,v,voxelData,voxelIndex,huber_constant,label,depth,transform,sensor_rows,
            sensor_cols, downsample, sceneIntrinsics, sceneParams, gaussian_weight, kernel_size,
            residuals_threshold,new_H,new_b);        // function to generate matrix
        if(!is_valid){

            new_H.setZeros();
            new_b.setZeros();
        }
        __syncthreads();

        for(int ii = 0 ; ii < 6 ; ii ++) for(int jj = 0 ; jj < 6 ; jj ++){
            
                dim_shared1[tid] = new_H(ii,jj);
                __syncthreads(); 
                for(int s = 128;s>0;s>>=1){
                    if(tid < s){
                        dim_shared1[tid] += dim_shared1[tid+s];
                    }
                    __syncthreads(); 
                    // __syncthreads(); 
                    // __syncthreads();                 
                    // __syncthreads(); 
                    // __syncthreads(); 
                    // __syncthreads();                 
                    // __syncthreads(); 
                    // __syncthreads(); 
                    // __syncthreads(); 
                    // __syncthreads(); here if all the __syncthreads() run,the answer is right
                    
                }
                __syncthreads(); 
                // __syncthreads(); 
                // __syncthreads(); 
                if(tid == 0){
                    atomicAdd(&acc_H->at(ii,jj),dim_shared1[0]);
                }
                __syncthreads(); 
        }

if there is only one __syncthreads() the result will be totally wrong.
By the way , If I use atomicAdd to sum up all matrix ,the answer is right, so matrix generation function is right.
What’s the problem? My GPU is Titan Xp and CUDA version is 10.0.

Topic		Replies	Views
__syncthreads screwes calculation CUDA Programming and Performance	2	3431	November 22, 2007
problem with __syncthreads(); CUDA Programming and Performance	1	1702	December 15, 2011
Why this kernel "needs" a __syncthreads() statement to work? CUDA Programming and Performance	3	911	May 7, 2017
Syncthreads and Stalling Kernels CUDA Programming and Performance	16	4176	August 26, 2010
Updating Same Global Memory Concurrently While doing sparse matrix multiplication, write error occur CUDA Programming and Performance	3	5467	November 27, 2010
multiple threads writing value to a same variable CUDA Programming and Performance	19	6961	March 20, 2012
Simple Thread Problem CUDA Programming and Performance	1	4087	September 24, 2009
Parallel Programming Problem Sum values into one array-cell CUDA Programming and Performance	2	1275	January 27, 2010
cuda syncthreads fail CUDA Programming and Performance	7	3891	February 22, 2013
prefix_sum, can not syncthreads CUDA Programming and Performance	1	490	February 22, 2017

Problem in parellel sum

Related topics