How to efficiently copy the non-zero elements of an array to another array

beluga · December 28, 2023, 8:23am

I want to copy all the non-zero elements of a 1-D array into another array, output. I am trying to do this at warp level, like this:

1. Each warp uses __ballot_sync to find out how many non-zero elements it holds.
2. Call __syncthreads().
3. Each warp finds its start and end positions in the output array and writes its elements there.

So my question is: within a warp, how do I map each lane_id to its index in the output array? I'd appreciate any advice.
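A minimal sketch of the three steps above, assuming a single block of 256 threads and an input of at most 256 ints. One common way to handle the lane mapping is a warp-local prefix count: after mask = __ballot_sync(0xffffffff, value != 0), a lane's offset inside its warp's output segment is __popc(mask & ((1u << lane) - 1)), i.e. the number of kept lanes with a smaller lane id. The names compactBlock, compactNonZero and CUDA_CHECK are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                        \
    do {                                                                        \
        cudaError_t err_ = (call);                                              \
        if (err_ != cudaSuccess) {                                              \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::abort();                                                       \
        }                                                                       \
    } while (0)

constexpr int kBlockSize = 256;          // assumption: one block of 256 threads
constexpr int kWarps = kBlockSize / 32;

// Copies the non-zero values of in[0..n) to out in input order and stores
// how many were copied in *count. Assumes n <= kBlockSize.
__global__ void compactBlock(const int* in, int n, int* out, int* count)
{
    __shared__ int warpCount[kWarps];
    __shared__ int warpStart[kWarps];

    int tid  = threadIdx.x;
    int lane = tid % 32;
    int warp = tid / 32;

    int v = (tid < n) ? in[tid] : 0;
    bool keep = (v != 0);

    // Step 1: one bit per lane holding a non-zero value; popcount = warp total.
    unsigned mask = __ballot_sync(0xffffffffu, keep);
    if (lane == 0) warpCount[warp] = __popc(mask);
    __syncthreads();

    // Step 2: exclusive prefix sum of the per-warp totals = each warp's start.
    if (tid == 0) {
        int sum = 0;
        for (int w = 0; w < kWarps; ++w) { warpStart[w] = sum; sum += warpCount[w]; }
        *count = sum;
    }
    __syncthreads();

    // Step 3: lane -> output index = warp start + kept lanes below this lane.
    unsigned lanesBelow = (1u << lane) - 1u;   // bits 0 .. lane-1
    if (keep) out[warpStart[warp] + __popc(mask & lanesBelow)] = v;
}

// Host wrapper (hypothetical): runs compactBlock on up to 256 ints.
std::vector<int> compactNonZero(const std::vector<int>& h_in)
{
    int n = static_cast<int>(h_in.size());
    assert(n <= kBlockSize);   // this sketch handles a single block only
    int *d_in = nullptr, *d_out = nullptr, *d_count = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, kBlockSize * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&d_out, kBlockSize * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&d_count, sizeof(int)));
    if (n > 0) CUDA_CHECK(cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice));

    compactBlock<<<1, kBlockSize>>>(d_in, n, d_out, d_count);
    CUDA_CHECK(cudaGetLastError());

    int count = 0;
    CUDA_CHECK(cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost));
    std::vector<int> h_out(count);
    if (count > 0) CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, count * sizeof(int), cudaMemcpyDeviceToHost));
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_count);
    return h_out;
}
```

Because each warp's start comes from an exclusive scan of the per-warp counts, the output keeps the input order. Extending this to many blocks needs a second level, for example a prefix sum over per-block totals.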

I already found a related post that uses the Thrust library's copy_if function, but how can I implement this in plain CUDA?
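For comparison, the Thrust route mentioned above can be sketched with thrust::copy_if and a non-zero predicate. IsNonZero and compactWithThrust are illustrative names. Thrust documents copy_if as stable, so the kept elements stay in input order.

```cuda
#include <cassert>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>

// Predicate: keep an element when it is non-zero.
struct IsNonZero {
    __host__ __device__ bool operator()(int x) const { return x != 0; }
};

// Hypothetical wrapper: compacts the non-zero values on the GPU.
std::vector<int> compactWithThrust(const std::vector<int>& h_in)
{
    thrust::device_vector<int> d_in(h_in.begin(), h_in.end());
    thrust::device_vector<int> d_out(h_in.size());
    // copy_if returns an iterator one past the last element written.
    auto outEnd = thrust::copy_if(d_in.begin(), d_in.end(), d_out.begin(), IsNonZero());
    std::vector<int> h_out(static_cast<size_t>(outEnd - d_out.begin()));
    thrust::copy(d_out.begin(), outEnd, h_out.begin());
    return h_out;
}
```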

I also found the code below, but it works at block level, and I think having every thread do its own atomicAdd on the index might not be the best choice. This blog shows that it can be done with just one thread per warp adding n, instead of n threads each adding one:

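// Writes the indices (not the values) of the non-zero entries of pHist into pXn.
// Assumes a single block of exactly 256 threads and that pHist has at least 256 entries.
__global__ void gpu_Xn(int *pHist, int pnN, int* pXn)
{
    int Tid = threadIdx.x;

    __shared__ int tmpXn[256];   // compacted indices, -1 in unused slots
    __shared__ int idx;          // running count of non-zero entries

    tmpXn[Tid] = -1;
    if (Tid == 0) idx = 0;

    __syncthreads();

    if (pHist[Tid] != 0)
    {
        // One shared-memory atomic per non-zero element. The order in which
        // threads win the atomic is unspecified, so the indices are not sorted.
        int x = atomicAdd(&idx, 1);
        tmpXn[x] = Tid;
    }

    __syncthreads();
    // Copy the first pnN slots to global memory (assumes pnN <= 256).
    if (Tid < pnN)
        pXn[Tid] = tmpXn[Tid];
}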
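The one-atomic-per-warp idea from the blog can be sketched as a grid-stride kernel over any number of blocks, assuming blockDim.x is a multiple of 32. Lane 0 of each warp reserves __popc(mask) slots with a single atomicAdd on a global counter, broadcasts the base with __shfl_sync, and every lane writes at base plus its __popc(mask & lanesBelow) offset. The names compactWarpAggregated, compactNonZeroUnordered and CUDA_CHECK are hypothetical. This copies values; storing i instead of v would record indices as gpu_Xn does.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                        \
    do {                                                                        \
        cudaError_t err_ = (call);                                              \
        if (err_ != cudaSuccess) {                                              \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::abort();                                                       \
        }                                                                       \
    } while (0)

// Appends the non-zero values of in[0..n) to out; *counter must start at 0.
// Assumes blockDim.x is a multiple of 32 so every warp is full.
__global__ void compactWarpAggregated(const int* in, int n, int* out, int* counter)
{
    int lane = threadIdx.x % 32;
    int stride = gridDim.x * blockDim.x;
    // The loop condition depends only on the block, so all lanes of a warp
    // iterate together and the full-mask warp intrinsics stay legal.
    for (int base = blockIdx.x * blockDim.x; base < n; base += stride) {
        int i = base + threadIdx.x;
        int v = (i < n) ? in[i] : 0;
        unsigned mask = __ballot_sync(0xffffffffu, v != 0);
        int warpTotal = __popc(mask);
        int warpBase = 0;
        if (lane == 0 && warpTotal > 0)
            warpBase = atomicAdd(counter, warpTotal);        // one atomic per warp
        warpBase = __shfl_sync(0xffffffffu, warpBase, 0);     // broadcast lane 0's slot
        if (v != 0)
            out[warpBase + __popc(mask & ((1u << lane) - 1u))] = v;
    }
}

// Host wrapper (hypothetical). Result order across warps is unspecified.
std::vector<int> compactNonZeroUnordered(const std::vector<int>& h_in)
{
    int n = static_cast<int>(h_in.size());
    size_t bytes = std::max<size_t>(h_in.size(), 1) * sizeof(int);  // avoid 0-byte allocations
    int *d_in = nullptr, *d_out = nullptr, *d_counter = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, bytes));
    CUDA_CHECK(cudaMalloc(&d_out, bytes));
    CUDA_CHECK(cudaMalloc(&d_counter, sizeof(int)));
    CUDA_CHECK(cudaMemset(d_counter, 0, sizeof(int)));
    if (n > 0) CUDA_CHECK(cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice));

    const int threads = 256;                                      // multiple of 32
    int blocks = std::min(std::max((n + threads - 1) / threads, 1), 1024);
    compactWarpAggregated<<<blocks, threads>>>(d_in, n, d_out, d_counter);
    CUDA_CHECK(cudaGetLastError());

    int count = 0;
    CUDA_CHECK(cudaMemcpy(&count, d_counter, sizeof(int), cudaMemcpyDeviceToHost));
    std::vector<int> h_out(count);
    if (count > 0) CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, count * sizeof(int), cudaMemcpyDeviceToHost));
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_counter);
    return h_out;
}
```

Within a warp the kept values stay in lane order, but warps reach the atomic in an unspecified order, so the full result is a permutation of the non-zero values. Sort it, or use a scan-based scheme, if order matters.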

Robert_Crovella · December 28, 2023, 2:51pm
