I’m trying to use tensor maps with swizzling, but any swizzle size that is smaller than the innermost tile size (e.g. a 32B swizzle with a 64B tile) returns CUDA_ERROR_INVALID_VALUE. I can’t find any requirement for this in either the PTX ISA or the cuTensorMapEncodeTiled documentation.
The cuTensorMapEncodeTiled documentation does state this requirement: "When interleave is CU_TENSOR_MAP_INTERLEAVE_NONE and swizzle is not CU_TENSOR_MAP_SWIZZLE_NONE, the bounding box inner dimension (computed as boxDim[0] multiplied by element size derived from tensorDataType) must be less than or equal to the swizzle size."
It seems like your tensorDataType specifies 2 bytes per element and your boxDim[0] is 128, so the inner dimension of the bounding box is 128 × 2 = 256 bytes, which is larger than your swizzle size.
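If it helps, here is a rough host-side sketch of that check, so you can catch the problem before calling cuTensorMapEncodeTiled. The helper names (`inner_box_bytes`, `box_fits_swizzle`, `require_box_fits`) are made up for illustration and are not part of the CUDA API. I'm also assuming you pass the swizzle size as a plain byte count (32, 64 or 128) rather than the CUtensorMapSwizzle enum, and that this only applies to the non-interleaved, swizzled case described in the quote.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: size in bytes of the innermost box dimension,
// i.e. boxDim[0] multiplied by the element size of tensorDataType.
constexpr std::size_t inner_box_bytes(std::size_t box_dim0,
                                      std::size_t elem_bytes) {
    return box_dim0 * elem_bytes;
}

// Hypothetical helper: true if the inner box dimension satisfies the
// documented rule (inner bytes <= swizzle size). swizzle_bytes is an
// assumed plain byte count, e.g. 32 for CU_TENSOR_MAP_SWIZZLE_32B.
constexpr bool box_fits_swizzle(std::size_t box_dim0,
                                std::size_t elem_bytes,
                                std::size_t swizzle_bytes) {
    return inner_box_bytes(box_dim0, elem_bytes) <= swizzle_bytes;
}

// Debug-build guard you could call right before encoding the tensor map.
inline void require_box_fits(std::size_t box_dim0,
                             std::size_t elem_bytes,
                             std::size_t swizzle_bytes) {
    assert(box_fits_swizzle(box_dim0, elem_bytes, swizzle_bytes));
}

// The case from this thread: 128 elements * 2 bytes = 256 bytes,
// which does not fit even a 128-byte swizzle.
static_assert(inner_box_bytes(128, 2) == 256, "128 * 2 bytes");
static_assert(!box_fits_swizzle(128, 2, 128), "256 B box > 128 B swizzle");
// Halving the box to 64 elements gives 128 bytes, which does fit.
static_assert(box_fits_swizzle(64, 2, 128), "128 B box <= 128 B swizzle");
```

With 2-byte elements, this rule means boxDim[0] can be at most 16, 32 or 64 elements for the 32B, 64B and 128B swizzle modes respectively.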