For a float4 array, do the different ways of allocating the array, binding a texture to it, and fetching from that texture perform differently?
At the moment I allocate the float4 array with cudaMalloc,
declare the associated texture as texture<float4, 1, cudaReadModeElementType>,
and fetch from it with tex1Dfetch.
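For reference, the setup described above can be sketched roughly as follows. This is only an illustration, assuming the legacy texture reference API (which was removed in CUDA 12, so it needs an older toolkit); the names tex, copyKernel, d_in, d_out and the size n are placeholders, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Legacy texture reference for float4 elements in linear memory.
// "tex" is a placeholder name.
texture<float4, 1, cudaReadModeElementType> tex;

// Hypothetical kernel: each thread fetches one float4 through the texture
// and writes it to an output buffer.
__global__ void copyKernel(float4 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(tex, i);
}

int main()
{
    const int n = 1024;                      // placeholder element count
    const size_t bytes = n * sizeof(float4);

    float4 *d_in = NULL;
    float4 *d_out = NULL;
    cudaMalloc(&d_in, bytes);                // plain linear allocation
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 0, bytes);

    // Bind the linear buffer to the texture reference.
    cudaBindTexture(NULL, tex, d_in, bytes);

    copyKernel<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaUnbindTexture(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```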
What benefit would I gain from allocating with cuArray3DCreate or cudaMallocPitch instead, or from using any other array allocation or access function?