Hi,

Is there any extra performance gain for the cuBLAS library from allocating memory with cudaMallocPitch(), so that accesses are coalesced?

Thanks,
Bill