Do different texture types and fetch functions perform differently?

For an array of float4, does performance depend on how the array is allocated, how the texture bound to it is declared, and which function is used to fetch from that texture?

At the moment I allocate the float4 array with cudaMalloc,
declare the corresponding texture as texture<float4, 1, cudaReadModeElementType>,
and fetch from it with tex1Dfetch.
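
For reference, here is a minimal sketch of that setup. The names (tex, copyKernel, d_in, d_out), the array size, and the launch configuration are placeholders for illustration, not my real code. It uses the legacy texture reference API, which is deprecated in recent toolkits and removed in CUDA 12, so it assumes an older toolkit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// File-scope texture reference (legacy API; assumes a pre-CUDA-12 toolkit).
texture<float4, 1, cudaReadModeElementType> tex;

// Illustrative kernel: each thread reads one float4 through the texture
// and writes it to a plain global-memory output array.
__global__ void copyKernel(float4 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(tex, i);  // integer index, no filtering
}

int main()
{
    const int n = 1024;  // placeholder size
    const size_t bytes = n * sizeof(float4);

    // Linear device memory allocated with cudaMalloc.
    float4 *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemset(d_in, 0, bytes);

    // Bind the linear memory to the texture reference.
    cudaBindTexture(NULL, tex, d_in, bytes);

    copyKernel<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaUnbindTexture(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```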

What would I gain by allocating with cuArray3DCreate or cudaMallocPitch instead, or by using any other array allocation or texture access function?