Question: Tiling on GPUs

I have often seen tiling described as a very good technique for achieving better performance.
However, from the documentation I have read, it is not clear to me what exactly tiling on GPUs is.

I have read that tiles refer to CUDA blocks, or to parts of a block that are stored in shared memory.
Additionally, I have seen that there is tiling at the thread level and at the block level.

Could you please explain what the tiling technique in CUDA refers to? And what is the difference between thread-level tiling and block-level tiling?

The programming guide provides one example of shared memory (block-level) tiling:
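
A minimal sketch of that kind of kernel, written here for illustration rather than copied from the guide. It assumes square N x N row-major matrices; the names `TILE`, `matmul_tiled`, and `matmul_host` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // tile edge length, also the thread-block edge (assumed value)

// C = A * B for square N x N row-major matrices.
// Each thread block computes one TILE x TILE tile of C.
__global__ void matmul_tiled(const float* A, const float* B, float* C, int N)
{
    // One tile of A and one tile of B, shared by all threads of the block.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C this thread owns
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C this thread owns
    float acc = 0.0f;

    // Walk along the shared dimension one tile at a time.
    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;

        // Each thread loads one element of each tile; pad with zeros at the edges.
        As[threadIdx.y][threadIdx.x] =
            (row < N && a_col < N) ? A[row * N + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (b_row < N && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();  // tiles must be fully loaded before anyone reads them

        // Partial dot product using only shared memory.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reading before the next iteration overwrites
    }

    if (row < N && col < N)
        C[row * N + col] = acc;
}

// Hypothetical host wrapper: copies inputs to the device, launches the
// kernel, and copies the result back.
std::vector<float> matmul_host(const std::vector<float>& A,
                               const std::vector<float>& B, int N)
{
    size_t bytes = size_t(N) * N * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    matmul_tiled<<<grid, block>>>(dA, dB, dC, N);

    std::vector<float> C(size_t(N) * N);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Here the "tile" is the TILE x TILE sub-matrix that one block stages in shared memory. Every element loaded from global memory is then reused TILE times by the threads of the block, which is where the speedup over the naive kernel comes from.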

Thank you for the explanation of block-level tiling.

Could you please explain thread-level tiling as well?