I have two 3-dimensional boolean arrays: the segmentation output from an algorithm (array A) and the gold-standard mask (array G). I pass both to a kernel and would like to compare them; in theory, logic gates could be the fastest solution.

I need to count true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). So, for some coordinate x’:

TP = A[x’]==1 && G[x’]==1 – hence AND gate

TN = A[x’]==0 && G[x’]==0 – hence NOR gate (NAND would also be true when only one of the two is 1)

FP = A[x’]==1 && G[x’]==0 – hence A[x’] AND (NOT G[x’])

FN = A[x’]==0 && G[x’]==1 – hence (NOT A[x’]) AND G[x’]
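To make the four cases concrete, here is a minimal host-side sketch in plain C++ (not a kernel), assuming both volumes are flattened into equally sized arrays of 0/1 bytes. The names `Counts` and `count_confusion` are made up for illustration. Per voxel: TP is A AND G, TN is A NOR G, FP is A AND (NOT G), FN is (NOT A) AND G.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the four counts, in the order [TP, TN, FP, FN].
using Counts = std::array<std::uint64_t, 4>;

// Counts TP, TN, FP and FN over two flattened boolean volumes of equal length.
// Any non-zero byte is treated as 1 (voxel belongs to the segmentation).
Counts count_confusion(const std::vector<std::uint8_t>& A,
                       const std::vector<std::uint8_t>& G) {
    assert(A.size() == G.size());
    Counts c{0, 0, 0, 0};
    for (std::size_t i = 0; i < A.size(); ++i) {
        const bool a = A[i] != 0;
        const bool g = G[i] != 0;
        c[0] += (a && g);    // TP: A AND G
        c[1] += !(a || g);   // TN: A NOR G
        c[2] += (a && !g);   // FP: A AND (NOT G)
        c[3] += (!a && g);   // FN: (NOT A) AND G
    }
    return c;
}
```

Exactly one of the four expressions is true for each voxel, so the four counts always add up to the number of voxels, which is a cheap sanity check.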

Hence the output from such a warp should be a vector with 4 entries: [count(TP), count(TN), count(FP), count(FN)]

Is it possible, and does it make sense, to do this with warp-level primitives, or maybe in the kernel itself? There is very little data to exchange between threads, so warp-level synchronization seems better than block-level synchronization.
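A rough sketch of what I mean by the warp-level version, emulated on the host in plain C++ so it can be run without a GPU. On the device, `__ballot_sync` would produce the 32-bit masks and `__popc` would count the set bits; here `pack_mask` and `popcount32` are hypothetical stand-ins for them, and `warp_counts` is likewise an illustrative name. The `active` mask is an assumption about how partially filled warps would be handled.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side stand-in for a warp ballot: bit i is set when pred[i] is true.
std::uint32_t pack_mask(const bool* pred, std::size_t n) {
    assert(n <= 32);
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (pred[i]) mask |= (std::uint32_t{1} << i);
    }
    return mask;
}

// Portable population count (number of set bits in a 32-bit word).
unsigned popcount32(std::uint32_t x) {
    return static_cast<unsigned>(std::bitset<32>(x).count());
}

// Returns [TP, TN, FP, FN] for up to 32 voxels packed one per bit.
// `active` marks the lanes that hold real voxels; without it the NOR
// term would count unused lanes as true negatives.
std::array<unsigned, 4> warp_counts(std::uint32_t a, std::uint32_t g,
                                    std::uint32_t active) {
    return {popcount32(a & g & active),      // TP: A AND G
            popcount32(~(a | g) & active),   // TN: A NOR G
            popcount32(a & ~g & active),     // FP: A AND (NOT G)
            popcount32(~a & g & active)};    // FN: (NOT A) AND G
}
```

With this layout each warp would contribute four small integers, which could then go into whatever sum reduction is used afterwards. TN could also be derived as the number of active lanes minus the other three counts instead of computing the NOR.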

Later I would use sum-reduction kernels, maybe cooperative groups, depending on the experimental results.

Thanks for the help!