(1) Compactly stored 2D arrays allocated with cudaMalloc() work fine, including with CUBLAS.
(2) Storing 2D arrays in pitched allocations made with cudaMallocPitch() may improve performance. To use such allocations with CUBLAS, pass the pitch, converted from bytes to elements, as the lda, ldb, ldc arguments.
I find compact storage for 2D matrices easier to deal with, and the performance gains from pitched storage are likely minor on modern GPUs. Pitched storage improves data alignment, which helps minimize the number of wide loads the GPU has to perform. I have used pitched 2D matrices when working with textures.
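To make the leading-dimension adjustment from (2) concrete, here is a minimal host-only sketch of the index arithmetic. It makes no CUDA calls. The helper names (leading_dim_from_pitch, col_major_index, padded_pitch) are hypothetical and exist only for illustration, and padded_pitch is merely a stand-in for whatever pitch cudaMallocPitch() would actually return. It assumes the pitch is a whole multiple of the element size. Since CUBLAS uses column-major storage, each padded "row" of the pitched allocation holds one matrix column, so the width passed to cudaMallocPitch() would be rows * sizeof(element) and the height would be the number of columns.

```cpp
#include <cassert>
#include <cstddef>

// Leading dimension, in elements, of a column-major matrix whose columns
// start pitch_bytes apart. cudaMallocPitch() reports the pitch in bytes,
// whereas the lda/ldb/ldc arguments of CUBLAS are counted in elements.
// Assumption: the pitch is a whole multiple of the element size.
inline int leading_dim_from_pitch(std::size_t pitch_bytes, std::size_t elem_size) {
    assert(elem_size > 0 && pitch_bytes % elem_size == 0);
    return static_cast<int>(pitch_bytes / elem_size);
}

// Linear element offset of entry (row, col) in column-major storage with
// leading dimension ld. With compact storage, ld equals the row count.
inline std::size_t col_major_index(int row, int col, int ld) {
    return static_cast<std::size_t>(col) * static_cast<std::size_t>(ld)
         + static_cast<std::size_t>(row);
}

// Hypothetical stand-in for the pitch a device allocation might report:
// rounds one column's size in bytes up to a multiple of `alignment`.
// The real value is chosen by the CUDA runtime and should be queried.
inline std::size_t padded_pitch(std::size_t col_bytes, std::size_t alignment) {
    return (col_bytes + alignment - 1) / alignment * alignment;
}
```

For a float matrix with 5 rows, compact storage gives lda = 5. If the allocation reported a pitch of 512 bytes, lda would become 128, and that value would be passed to the CUBLAS call in place of 5. The matrix dimensions themselves (m, n, k) stay unchanged.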