cuBLAS called from a kernel is slower than from the host

Hello, I tried running my program with cuBLAS called from inside a kernel, and it takes more time and uses more GPU memory than calling cuBLAS from the host.

Calling cuBLAS from the host:

Used GPU time :  171.639557 msec 
GPU memory usage: used = 135.488281 MB

Calling cuBLAS from a kernel:

Used GPU time :  1294.184204 msec 
GPU memory usage: used = 374.246094 MB
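For scale, here is how the two runs compare. These are simple ratios and differences computed only from the numbers reported above:

```latex
\frac{1294.184204\ \text{ms}}{171.639557\ \text{ms}} \approx 7.54
\qquad
374.246094\ \text{MB} - 135.488281\ \text{MB} = 238.757813\ \text{MB}\ (\approx 2.76\times)
```

So the kernel-side version is roughly 7.5 times slower and uses about 239 MB more GPU memory.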