I have a large array A with size_A rows and 6 columns. I want to check the 4th element of each row and, if it is not zero, copy that row into another array B. Can I get the write index into B without using a for loop? Please see the code below.
I would probably need to define b_ptr so that it is static (similar to what we have in C), but I think that is not allowed in CUDA.
__global__ void filtering_kernel(float* A, int size_A, float* B, float* size_B)
{
    /*B and size_B are the outputs*/
    int b_ptr = 0;
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x > size_A) return;
    for (int i = 0; i < size_A; i++)
    {
        if (A[x + 3] != 0)
        {
            B[b_ptr] = A[x + 0];
            B[b_ptr + 1] = A[x + 1];
            B[b_ptr + 2] = A[x + 2];
            B[b_ptr + 3] = A[x + 3];
            B[b_ptr + 4] = A[x + 4];
            B[b_ptr + 5] = A[x + 5];
            b_ptr += 6;
            *size_B = *size_B + 1;
        }
    }
}
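
For reference, one common way to get a per-row write index without looping over rows is to let each thread reserve a slot in B with atomicAdd on a counter in device memory. What follows is only a minimal sketch under several assumptions that are not in the code above: A is stored row-major, so row x starts at A[6 * x]; size_B is changed from float* to int* and zeroed before the launch; B has room for size_A rows; and the order of rows in B does not need to match their order in A. The names filtering_kernel_atomic and filter_rows are hypothetical, and error checking is left out for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One thread per row. Assumes A is row-major with 6 floats per row.
__global__ void filtering_kernel_atomic(const float* A, int size_A,
                                        float* B, int* size_B)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= size_A) return;           // guard threads past the last row

    const float* src = A + 6 * row;      // start of this thread's row
    if (src[3] != 0.0f)                  // test the 4th element
    {
        // atomicAdd returns the value held before the increment, so each
        // passing thread receives a unique slot index into B.
        int slot = atomicAdd(size_B, 1);
        float* dst = B + 6 * slot;
        for (int c = 0; c < 6; ++c)      // copy the 6 columns of one row
            dst[c] = src[c];
    }
}

// Hypothetical host helper: runs the kernel and returns the kept rows.
std::vector<float> filter_rows(const std::vector<float>& A, int size_A)
{
    assert(size_A > 0);                  // the sketch assumes a non-empty input
    size_t bytes = sizeof(float) * 6 * size_A;

    float* d_A = nullptr;
    float* d_B = nullptr;
    int* d_count = nullptr;
    cudaMalloc(&d_A, bytes);
    cudaMalloc(&d_B, bytes);             // worst case: every row is kept
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_A, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int)); // counter must start at zero

    int threads = 256;
    int blocks = (size_A + threads - 1) / threads;
    filtering_kernel_atomic<<<blocks, threads>>>(d_A, size_A, d_B, d_count);

    // A blocking cudaMemcpy on the default stream waits for the kernel.
    int count = 0;
    cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost);

    std::vector<float> B(6 * static_cast<size_t>(count));
    cudaMemcpy(B.data(), d_B, sizeof(float) * B.size(), cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_count);
    return B;
}
```

With this approach the counter in global memory plays the role that a static b_ptr would play in C, and its final value is the number of rows written to B. Because threads reach atomicAdd in no fixed order, the rows in B can come out in a different order on each run. If the original order matters, the usual alternative is a stream compaction based on a prefix sum, for example thrust::copy_if.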