Column wise reduction of column major matrix

shunyo · February 23, 2016, 9:22pm

I am trying to perform a column-wise reduction of a column-major matrix using CUDA. I adapted the reduction example to calculate the sum of each column, but the kernel times out. I do not know what is causing the problem or how to fix it.

template <typename T, unsigned int blockSize>
__global__ void matrix_col_reduction(T * g_idata, T * g_odata, int ncols, int N)
{
    // get the thread index
    unsigned int tid = threadIdx.x;
    // get the current element index for the thread
    unsigned int gid = blockIdx.x * blockDim.x + threadIdx.x;

    //__shared__ T sdata[ncols * blockSize];
    extern __shared__ T sdata[];
    // input data into the shared memory
    for(int i = 0; i < ncols; i++)
    {
        sdata[tid + i*blockSize] = g_idata[gid + i*N];
    }
    __syncthreads();

    for(unsigned int s = blockDim.x/2; s > 0; s>> 1) {
        if(tid < s) {
            for(int i = 0; i < ncols; i++)
            {
                sdata[tid + i*blockSize] += sdata[tid + i*blockSize + s];
            }
        }
        __syncthreads();
    }

    if(tid == 0) {
        for(int i = 0; i < ncols; i++)
        {
            g_odata[blockIdx.x + i*blockDim.x] = sdata[i*blockSize];
        }
    }
}
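
A likely cause of the timeout is the loop update `s>> 1`: it computes the shifted value but never stores it, so `s` never changes and the loop never ends. Writing `s >>= 1` halves the stride on each pass. The sketch below is one possible corrected version, not a definitive implementation, and it makes several assumptions that are not in the post: `N` is taken to be the row count (renamed `nrows`), the block size is a power of two and is read from `blockDim.x` instead of a template parameter, rows past the end are padded with zero, the per-block partial sums are packed with a stride of `gridDim.x`, and the dynamic shared buffer is declared as raw bytes because a templated `extern __shared__ T sdata[]` can clash when the kernel is instantiated for more than one type. The host helper `column_sums` is hypothetical, uses 128 threads per block, finishes the reduction on the CPU, and omits error checking.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Sketch: per-block partial column sums of a column-major matrix.
// Assumes blockDim.x is a power of two and that the launch provides
// ncols * blockDim.x * sizeof(T) bytes of dynamic shared memory.
template <typename T>
__global__ void matrix_col_reduction(const T* g_idata, T* g_odata, int ncols, int nrows)
{
    // Raw byte buffer, reinterpreted as T (alignment is fine for float, used below).
    extern __shared__ unsigned char smem[];
    T* sdata = reinterpret_cast<T*>(smem);

    unsigned int tid = threadIdx.x;
    unsigned int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread loads its row from every column; out-of-range rows contribute zero.
    for (int i = 0; i < ncols; i++) {
        sdata[tid + i * blockDim.x] =
            (gid < (unsigned int)nrows) ? g_idata[gid + (size_t)i * nrows] : T(0);
    }
    __syncthreads();

    // Tree reduction within the block; note s >>= 1, which actually updates s.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            for (int i = 0; i < ncols; i++) {
                sdata[tid + i * blockDim.x] += sdata[tid + i * blockDim.x + s];
            }
        }
        __syncthreads();
    }

    // Thread 0 writes one partial sum per column, packed as [column][block].
    if (tid == 0) {
        for (int i = 0; i < ncols; i++) {
            g_odata[blockIdx.x + i * gridDim.x] = sdata[i * blockDim.x];
        }
    }
}

// Hypothetical host helper (not from the post): returns one sum per column.
std::vector<float> column_sums(const std::vector<float>& h_in, int nrows, int ncols)
{
    const int threads = 128;  // assumed block size, power of two
    const int blocks = (nrows + threads - 1) / threads;
    const size_t smemBytes = (size_t)ncols * threads * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, (size_t)ncols * blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float), cudaMemcpyHostToDevice);

    matrix_col_reduction<float><<<blocks, threads, smemBytes>>>(d_in, d_out, ncols, nrows);

    std::vector<float> partial((size_t)ncols * blocks);
    cudaMemcpy(partial.data(), d_out, partial.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // Finish on the host: add the per-block partial sums of each column.
    std::vector<float> sums(ncols, 0.0f);
    for (int i = 0; i < ncols; i++) {
        for (int b = 0; b < blocks; b++) {
            sums[i] += partial[b + (size_t)i * blocks];
        }
    }
    return sums;
}
```

For example, a 3 x 2 column-major matrix stored as `{1, 2, 3, 4, 5, 6}` would be passed as `column_sums(m, 3, 2)`. Because every block stages `ncols * blockDim.x` elements in shared memory, this layout only fits when that product stays within the per-block shared memory limit of the device; wider matrices would need the columns to be processed in chunks.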
