Many reads from the same global memory (GMEM) in a k-means algorithm

I'm building a k-means application in which one of the steps is to compare a set of vectors (the data) to another set of vectors (the centers). My first implementation uses float vectors of length 16 and simply reads them into shared memory (each block reads 32 floats, i.e. 2 vectors, from the data and from the centers).
The reads should coalesce, but won't the reads from the centers have problems, since all the blocks are trying to access the same memory? If the 16-float center vectors are denoted C1, C2, C3… and the blocks are denoted B1, B2, B3…, the accesses to the centers look like this:

(Vectors are in global memory)

B1: C1C2 C3C4 C5C6…
B2: C1C2 C3C4 C5C6…
B3: C1C2 C3C4 C5C6…

B128… (or higher)

Won't there be a problem when all blocks want to access C1C2 at the same time, or does the hardware account for this? I'm mostly wondering whether B2 must wait for B1 to finish; if so, it might take a while before all blocks have done their read. Is there any better way of doing this than coalescing into shared memory?
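For reference, here is a minimal sketch of the scheme I am describing, not my actual code. The names `assignNearest` and `runAssign` are made up for this post, the numbers of data vectors and centers are assumed to be even, error checking is left out, and the half-warp reduction assumes a GPU and toolkit that support `__shfl_down_sync`.

```cuda
#include <cstdio>
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

constexpr int DIM = 16;                        // floats per vector
constexpr int VECS_PER_BLOCK = 2;              // 2 vectors = 32 floats per block
constexpr int THREADS = DIM * VECS_PER_BLOCK;  // 32 threads, i.e. one warp

// Each block owns 2 data vectors and streams the centers through shared
// memory two at a time. Assumes numCenters is a multiple of VECS_PER_BLOCK.
__global__ void assignNearest(const float* data, const float* centers,
                              int numCenters, int* assignment)
{
    __shared__ float sCenters[THREADS];

    const int t = threadIdx.x;
    const int lane = t % DIM;  // component index within a vector
    const int slot = t / DIM;  // which of the block's 2 data vectors

    // Coalesced read: 32 consecutive threads read 32 consecutive floats.
    const float x = data[blockIdx.x * THREADS + t];

    float best = FLT_MAX;
    int bestIdx = -1;

    for (int tile = 0; tile < numCenters / VECS_PER_BLOCK; ++tile) {
        // Every block reads the same 32 floats here (the access in question).
        sCenters[t] = centers[tile * THREADS + t];
        __syncthreads();

        for (int c = 0; c < VECS_PER_BLOCK; ++c) {
            const float d = x - sCenters[c * DIM + lane];
            float sum = d * d;
            // Add up the 16 squared differences within each half-warp.
            for (int off = DIM / 2; off > 0; off /= 2)
                sum += __shfl_down_sync(0xffffffffu, sum, off, DIM);
            if (lane == 0 && sum < best) {
                best = sum;
                bestIdx = tile * VECS_PER_BLOCK + c;
            }
        }
        __syncthreads();  // finish with this tile before it is overwritten
    }
    if (lane == 0)
        assignment[blockIdx.x * VECS_PER_BLOCK + slot] = bestIdx;
}

// Hypothetical host wrapper: copies the inputs over, launches one block per
// pair of data vectors, and returns the index of the nearest center for each.
std::vector<int> runAssign(const std::vector<float>& data,
                           const std::vector<float>& centers)
{
    const int numData = static_cast<int>(data.size()) / DIM;
    const int numCenters = static_cast<int>(centers.size()) / DIM;
    float* dData = nullptr;
    float* dCenters = nullptr;
    int* dAssign = nullptr;
    cudaMalloc(&dData, data.size() * sizeof(float));
    cudaMalloc(&dCenters, centers.size() * sizeof(float));
    cudaMalloc(&dAssign, numData * sizeof(int));
    cudaMemcpy(dData, data.data(), data.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dCenters, centers.data(), centers.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    assignNearest<<<numData / VECS_PER_BLOCK, THREADS>>>(dData, dCenters,
                                                         numCenters, dAssign);

    std::vector<int> out(numData);
    cudaMemcpy(out.data(), dAssign, numData * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dCenters);
    cudaFree(dAssign);
    return out;
}

int main()
{
    // Toy input: 4 centers whose components are all 0, 10, 20 or 30, and
    // 4 data vectors whose components are all 21, 1, 29 or 9.
    const float centerVals[4] = {0.0f, 10.0f, 20.0f, 30.0f};
    const float dataVals[4] = {21.0f, 1.0f, 29.0f, 9.0f};
    std::vector<float> centers, data;
    for (int v = 0; v < 4; ++v)
        for (int k = 0; k < DIM; ++k) {
            centers.push_back(centerVals[v]);
            data.push_back(dataVals[v]);
        }
    const std::vector<int> nearest = runAssign(data, centers);
    for (int idx : nearest) std::printf("%d ", idx);
    std::printf("\n");
    return 0;
}
```

The line that fills `sCenters` is the one I am worried about: for a given `tile`, every block in the grid issues exactly the same 32-float read.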

Kind Regards

… I have considered using texture memory, but I can have up to 100 centers (100*16 = 1600 floats), so I'm not sure the cache will be of much use. Any ideas? Would it help to scramble the order, so that each block reads the center vectors in a different order instead of 0, 1, 2, 3, 4…?
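To make the reordering idea concrete, here is a rough sketch in which each block walks the center tiles in a rotated order rather than a truly random one. `rotatedTile`, `sumCentersRotated` and `runRotated` are names invented for this post, and the per-thread sum is only a stand-in for the real distance computation so that the result is easy to check. I have not measured whether this makes any difference.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 32;  // floats per tile: 2 centers of 16 floats each

// Hypothetical helper: the i-th tile visited by a given block.
// Block b starts at tile b % numTiles and wraps around, so every block
// still visits every tile exactly once.
__host__ __device__ inline int rotatedTile(int block, int i, int numTiles)
{
    return (block + i) % numTiles;
}

// Each block streams all center tiles through shared memory, starting at a
// block-dependent tile. Launch with TILE threads per block.
__global__ void sumCentersRotated(const float* centers, int numTiles,
                                  float* out)
{
    __shared__ float sTile[TILE];
    float acc = 0.0f;
    for (int i = 0; i < numTiles; ++i) {
        const int tile = rotatedTile(blockIdx.x, i, numTiles);
        sTile[threadIdx.x] = centers[tile * TILE + threadIdx.x];  // coalesced
        __syncthreads();
        // Read a value loaded by another thread, as the real kernel would.
        acc += sTile[TILE - 1 - threadIdx.x];
        __syncthreads();  // finish with this tile before it is overwritten
    }
    out[blockIdx.x * TILE + threadIdx.x] = acc;
}

// Hypothetical host wrapper; error checking omitted for brevity.
std::vector<float> runRotated(const std::vector<float>& centers, int numBlocks)
{
    const int numTiles = static_cast<int>(centers.size()) / TILE;
    float* dCenters = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dCenters, centers.size() * sizeof(float));
    cudaMalloc(&dOut, static_cast<size_t>(numBlocks) * TILE * sizeof(float));
    cudaMemcpy(dCenters, centers.data(), centers.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    sumCentersRotated<<<numBlocks, TILE>>>(dCenters, numTiles, dOut);

    std::vector<float> out(static_cast<size_t>(numBlocks) * TILE);
    cudaMemcpy(out.data(), dOut, out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dCenters);
    cudaFree(dOut);
    return out;
}

int main()
{
    // 100 centers = 50 tiles; every float in tile k is set to k so that the
    // sum over all tiles is the same whatever order they are visited in.
    const int numTiles = 50;
    std::vector<float> centers(numTiles * TILE);
    for (int k = 0; k < numTiles; ++k)
        for (int j = 0; j < TILE; ++j)
            centers[k * TILE + j] = static_cast<float>(k);

    const std::vector<float> out = runRotated(centers, 128);
    std::printf("first = %.1f, last = %.1f\n", out.front(), out.back());
    return 0;
}
```

With this layout, B1 starts at C1C2, B2 at C3C4, and so on, so at the start of the kernel neighbouring blocks ask for different tiles. Blocks do not run in lockstep anyway, so the effect on real hardware is something I would have to time.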