Linear algebra inside individual GPU threads?

Does CUBLAS support performing linear algebra operations inside individual threads, i.e. called from within a kernel?

From what I can tell, it acts as a replacement for BLAS that runs the operations on the GPU (which is an awesome capability in itself), but I’m looking into having each thread perform its own linear algebra operations on small matrices. Is this supported?
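
To make the use case concrete, here is a rough sketch of the kind of per-thread work I have in mind. It is hand-written and does not use CUBLAS at all; the names `matmul3` and `matmul3_per_thread`, the 3x3 size, and the row-major layout are illustrative assumptions only.

```cpp
// Compiles as plain C++; under nvcc the same function is also callable from device code.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: computes C = A * B for row-major 3x3 matrices.
HOST_DEVICE inline void matmul3(const float* a, const float* b, float* c) {
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 3; ++k) {
                sum += a[3 * i + k] * b[3 * k + j];
            }
            c[3 * i + j] = sum;
        }
    }
}

#ifdef __CUDACC__
// Hypothetical kernel: each thread multiplies its own pair of 3x3 matrices.
// a, b, c each hold n matrices stored back to back (9 floats per matrix).
__global__ void matmul3_per_thread(const float* a, const float* b, float* c, int n) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < n) {
        matmul3(a + 9 * t, b + 9 * t, c + 9 * t);
    }
}
#endif
```

What I would like to know is whether CUBLAS offers something that could take the place of a hand-rolled routine like `matmul3` inside the kernel.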