Alright men (and women), prove me wrong!

I have been testing with 2D textures bound to cudaMallocPitch arrays. I’ve found the following:

I can read from the 2D texture and write to another array (linear or pitched, it doesn't matter) with defined results only when the cudaMallocPitch array has 16N rows (where N is an integer) and a width that aligns with the pitch (i.e. widths that work for me are 64, 128, 256… etc). 2D texture fetches from textures bound to 2D arrays that do not follow this rule give undefined results.

The rule-
(rows,columns) or (height,width) whatever you call it… must be the following:
(16N,64M)

where N=1,2,3,4… and M=1,2,4,8…
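
To make the rule unambiguous for anyone testing against it, here is a minimal sketch of it as a host-side check. The name follows_rule is made up for illustration and is not part of CUDA; it only encodes the (16N,64M) pattern stated above, with columns counted in elements.

```cpp
#include <cstddef>

// Hypothetical helper (not a CUDA API): returns true when (rows, columns)
// has the form (16N, 64M) with N = 1,2,3,4... and M = 1,2,4,8...
bool follows_rule(std::size_t rows, std::size_t columns) {
    // Rows must be a positive multiple of 16.
    if (rows == 0 || rows % 16 != 0) return false;
    // Columns must be a positive multiple of 64...
    if (columns == 0 || columns % 64 != 0) return false;
    // ...and the multiplier M must be a power of two.
    std::size_t m = columns / 64;
    return (m & (m - 1)) == 0;
}
```

By this check, 32x128 follows the rule, while 32x192 (M=3) and 20x64 do not, so those last two are the kind of sizes a counterexample would need to get right.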

Specifically, I am fetching from a 2D texture bound to a 2D array of floats using normalized texture coordinates, with wrapping enabled and point-mode filtering.
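
For checking whether a fetch result is "defined", a CPU reference to compare against may help. The sketch below is my own illustration, not code from my test. It assumes the usual meaning of this setup: wrapping keeps only the fractional part of each normalized coordinate, point mode takes the floor after scaling by the texture size, and rows sit pitch bytes apart as returned by cudaMallocPitch. The name reference_fetch and its parameters are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical CPU reference (not a CUDA API): emulates a point-sampled
// fetch with normalized coordinates and wrap addressing from a pitched
// buffer of floats. pitch_bytes is the row stride in bytes.
float reference_fetch(const float* base, std::size_t pitch_bytes,
                      std::size_t width, std::size_t height,
                      float u, float v) {
    // Wrap: keep only the fractional part, so each coordinate lands in [0, 1).
    float fu = u - std::floor(u);
    float fv = v - std::floor(v);
    // Point sampling: scale to texel space and truncate to an index.
    std::size_t x = static_cast<std::size_t>(fu * static_cast<float>(width));
    std::size_t y = static_cast<std::size_t>(fv * static_cast<float>(height));
    // Guard against float rounding pushing an index onto the far edge.
    if (x >= width) x = width - 1;
    if (y >= height) y = height - 1;
    // Rows are pitch_bytes apart, which may be more than width * sizeof(float).
    const char* row = reinterpret_cast<const char*>(base) + y * pitch_bytes;
    return reinterpret_cast<const float*>(row)[x];
}
```

Copy the pitched array back to the host, call this with the same (u, v) the kernel used, and compare. Note that the row stride here is the pitch, not the width, so the two only coincide when the width fills the whole pitch.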

Prove the rule wrong (and post code that works), ladies or gents, and you will be the proud recipient of one internets (overnight delivery guaranteed). If you find you must obey the rule, then NVIDIA should put this in their CUDA programming guide, as it is integral to understanding textures.

This is the game! Let's see who can win : )