I’m porting some code from Java to CUDA. The Java code calls the BLAS library, so I’m using the corresponding cuBLAS calls in the CUDA code. Is there any way to make cuBLAS calls from the device (that is, from inside a kernel) rather than from the host, so that different threads or blocks can call different functions depending on their thread ID or block ID?