First of all, let me describe what this function is meant to do:
__device__ void check(int *dTarget, int *dToCheck)
dTarget = [ [x0,x1,x2], [x3,x4,x5] ]
dToCheck = [ [x0,x1,x2], [x3,x4,x5] ]
The goal is to check, for each sub-array of dToCheck, whether its max value sits at the same index as the max value of the corresponding sub-array of dTarget. For example:
dTarget = [ [0,0,1], [1,0,0] ]
dToCheck = [ [0.25,1,5], [7,7,15] ]
maxIndex([0,0,1]) = 2
maxIndex([0.25,1,5]) = 2 -> Correct!
maxIndex([1,0,0]) = 0
maxIndex([7,7,15]) = 2 -> Wrong!
Of course, each sub-array can hold many more values, and there can be a lot of sub-arrays (1,000 to 10,000).
My first thought was to give each thread one sub-array from each matrix and have it check whether the two max indexes match. But if I do that with this representation, I think global memory accesses won't be coalesced, right? The layout should instead be:
dTarget = [ [x0,x3], [x1,x4], [x2,x5] ]
dToCheck = [ [x0,x3], [x1,x4], [x2,x5] ]
so that the threads in a block get coalesced memory access, right?
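To make the question concrete, here is a rough, untested sketch of the one-thread-per-sub-array idea with the transposed layout. The names `checkKernel`, `runCheck`, `numArrays`, `len` and `dCorrect` are placeholders I made up, I assume `float` data (the example contains 0.25) rather than the `int *` of the signature above, and error checking is left out:

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per sub-array, transposed layout.
// Element j of sub-array i is stored at index j * numArrays + i, so at each
// step j neighbouring threads read neighbouring addresses.
__global__ void checkKernel(const float *dTarget, const float *dToCheck,
                            int *dCorrect, int numArrays, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numArrays) return;

    int idxT = 0, idxC = 0;
    float maxT = dTarget[i], maxC = dToCheck[i];
    for (int j = 1; j < len; ++j) {
        float t = dTarget[j * numArrays + i];
        float c = dToCheck[j * numArrays + i];
        if (t > maxT) { maxT = t; idxT = j; }  // ties keep the first max
        if (c > maxC) { maxC = c; idxC = j; }
    }
    dCorrect[i] = (idxT == idxC) ? 1 : 0;
}

// Hypothetical host helper: copies the inputs to the device, runs the kernel
// and returns one flag per sub-array (1 = same max index, 0 = different).
// CUDA error checking is omitted to keep the sketch short.
std::vector<int> runCheck(const std::vector<float> &target,
                          const std::vector<float> &toCheck,
                          int numArrays, int len)
{
    size_t bytes = size_t(numArrays) * len * sizeof(float);
    float *dTarget = nullptr, *dToCheck = nullptr;
    int *dCorrect = nullptr;
    cudaMalloc(&dTarget, bytes);
    cudaMalloc(&dToCheck, bytes);
    cudaMalloc(&dCorrect, numArrays * sizeof(int));
    cudaMemcpy(dTarget, target.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dToCheck, toCheck.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (numArrays + threads - 1) / threads;
    checkKernel<<<blocks, threads>>>(dTarget, dToCheck, dCorrect, numArrays, len);

    std::vector<int> correct(numArrays);
    cudaMemcpy(correct.data(), dCorrect, numArrays * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dTarget);
    cudaFree(dToCheck);
    cudaFree(dCorrect);
    return correct;
}
```

With the example data stored transposed, i.e. target = {0, 1, 0, 0, 1, 0} and toCheck = {0.25, 7, 1, 7, 5, 15} with numArrays = 2 and len = 3, I would expect the first sub-array to be flagged as matching and the second as not matching.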
Or is there a better strategy I could adopt?
PS: I have no real implementation yet; I'm still thinking through the approach.
Thanks a lot,