I am using the level-3 BLAS function ssyrk(), which multiplies a matrix by its transpose. My matrix is n x 3. I do not see any performance gain from this operation; in fact, my CPU multiplication is faster. Is there a reason for this? I would greatly appreciate any help.