I ran some experiments on shared memory and there is something I can't wrap my head around.
When I access 128 bits of data spread across the first four banks, I get 4 accesses. That part I understand: a thread can only handle 32 bits at a time.
When I create a conflict with another thread (for example thread 0 reading B0,0 B1,0 B2,0 B3,0 and thread 2 reading B0,1 B1,1 B2,1 B3,1, where the notation is Bbank,row), I get 5 accesses. I presume the hardware reads B0,0 then B0,1, B1,0 then B1,1, B2,0 then B2,1, and B3,0 then finally B3,1, in a kind of pipelined access, which gives n+3 transactions where n is the number of conflicts. But when I reach 9 conflicts I only get 11 transactions, and this n+2 pattern holds until 17 conflicts, where it becomes n+1, and finally it becomes n starting from 25 conflicts. With 64-bit accesses the change happens only once, at 17, where it goes from n+1 to n.
My question is: why are there these jumps at 9, 17 and 25 (128-bit) and at 17 (64-bit)?
Here is my code.
To run it the way I do:
./a.out <size_data: either 32, 64 or 128> <according to size_data: 32, 16 or 8 so that conflicts always occur> 0 FFFFFFFF
main.cu (4.6 KB)
res.md (1.1 KB)