Structure of Arrays and Array of Structures show the same performance on RTX 2080 Ti

Following the demo code from the book Professional CUDA C Programming, I am trying to measure the performance difference between structure of arrays (SoA) and array of structures (AoS). It is easy to understand that SoA gives a better memory access pattern, since adjacent threads read adjacent values of the same field, so it should perform better than AoS. The following two links point to the related demo code:

Array of structure
Structure of array
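
For readers without the book at hand, here is a minimal host-side sketch in plain C of the two layouts being compared. It is not the book's exact code: the names `InnerStruct`, `InnerArray`, `LEN` and the helper functions are assumptions chosen for illustration. It only computes the byte distance between the `x` fields that two adjacent threads (indices i and i + 1) would read.

```c
#include <stddef.h>

#define LEN 8  /* hypothetical small length, for illustration only */

/* AoS: one struct per element, so x and y are interleaved in memory. */
struct InnerStruct {
    float x;
    float y;
};

/* SoA: a single struct holding one contiguous array per field. */
struct InnerArray {
    float x[LEN];
    float y[LEN];
};

/* Byte offset of element i's x field from the start of an AoS array. */
size_t aos_x_offset(size_t i) {
    return i * sizeof(struct InnerStruct) + offsetof(struct InnerStruct, x);
}

/* Byte offset of element i's x field from the start of the SoA struct. */
size_t soa_x_offset(size_t i) {
    return offsetof(struct InnerArray, x) + i * sizeof(float);
}

/* Distance in bytes between the x fields read by threads i and i + 1. */
size_t aos_x_stride(void) { return aos_x_offset(1) - aos_x_offset(0); }
size_t soa_x_stride(void) { return soa_x_offset(1) - soa_x_offset(0); }
```

With 4-byte floats and no padding, `aos_x_stride()` is 8 bytes and `soa_x_stride()` is 4 bytes, so in the AoS layout a load of `x` alone touches only every other float in memory.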

I ran these programs on my RTX 2080 Ti and found no difference between the two. I also tried adjusting the array length to see whether the results would change, but there was still no significant difference.

./AOS test struct of array at device 0: GeForce RTX 2080 Ti
warmup <<< 2097152, 128 >>> elapsed 0.007955 sec
innerstruct <<< 2097152, 128 >>> elapsed 0.007946 sec

./SOA test struct of array at device 0: GeForce RTX 2080 Ti
warmup2 <<< 2097152, 128 >>> elapsed 0.007954 sec
innerarray <<< 2097152, 128 >>> elapsed 0.007898 sec

I also ran the same programs on a Tesla V100, which gave different results.

./AOS test struct of array at device 0: Tesla V100-SXM2-16GB
warmup <<< 2097152, 128 >>> elapsed 0.007168 sec
innerstruct <<< 2097152, 128 >>> elapsed 0.007096 sec

./SOA test struct of array at device 0: Tesla V100-SXM2-16GB
warmup2 <<< 2097152, 128 >>> elapsed 0.005401 sec
innerarray <<< 2097152, 128 >>> elapsed 0.005348 sec
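
One way to put the four kernel timings above on a common scale is to convert them to effective memory bandwidth. The sketch below is only a rough estimate under a stated assumption: each thread reads one `x` and one `y` and writes both back, which is 16 bytes of traffic per thread. That figure is my assumption about the kernels, not something taken from the output above, and `effective_gbps` is a hypothetical helper name.

```c
/* Effective bandwidth in GB/s (1 GB = 1e9 bytes) for a launch of
   `blocks` x `threads_per_block` threads, where each thread is assumed
   to move `bytes_per_thread` bytes and the kernel took `seconds`. */
double effective_gbps(double blocks, double threads_per_block,
                      double bytes_per_thread, double seconds) {
    double total_bytes = blocks * threads_per_block * bytes_per_thread;
    return total_bytes / seconds / 1e9;
}
```

For example, `effective_gbps(2097152, 128, 16, 0.007946)` gives roughly 540 GB/s for `innerstruct` on the RTX 2080 Ti, and the same call with 0.007898 gives roughly 544 GB/s for `innerarray`. On the V100 the two kernels come out at roughly 605 GB/s and 803 GB/s. These numbers are only as good as the 16-bytes-per-thread assumption.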

My question is why there is no performance difference on the RTX 2080 Ti. Is it because the hardware performs some optimization? Thank you!