Hi, I am new to CUDA. I would like to know whether a kernel is launched and terminated each time we call one of the library routines in CUBLAS or CUSPARSE, given that these routines can only be called from host code. Consider an application that needs to make many such calls, for example the conjugate gradient sample provided in the SDK. Is there any way to get a speedup when using the library routines in that case?
Also, how do we know the number of threads used by code that calls these libraries?