Call cuBLAS from device function

It seems that in CUDA 10.0 the “simpleDevLibCublas” sample has been removed, and a cuBLAS library function can no longer be called from device code. I wonder if there is any way around this (or an alternative solution) other than reverting to an older CUDA version.

In particular, I would like to call cublasSdot() inside my __global__ function, instead of having each CUDA thread compute the dot product with its own for loop.
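Because device-side cuBLAS calls are not available in CUDA 10.0 and later, one common workaround is to hand-write the dot product so that a whole warp cooperates on each dot product instead of a single thread looping over it. The sketch below assumes CUDA 9 or newer (for `__shfl_down_sync`), a block size that is a multiple of 32, and a GPU that supports managed memory. The names `warpDot`, `batchedDot` and `runCheck` are hypothetical and are not part of any library.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical helper: one warp computes dot(a, b) of length n.
// Each lane accumulates a strided partial sum, then the 32 partial
// sums are combined with warp shuffles. The full sum ends up in lane 0.
__device__ float warpDot(const float* a, const float* b, int n)
{
    const int lane = threadIdx.x % warpSize;
    float sum = 0.0f;
    for (int i = lane; i < n; i += warpSize)
        sum += a[i] * b[i];
    // Tree reduction across the lanes of the warp.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        sum += __shfl_down_sync(0xffffffffu, sum, offset);
    return sum;  // only lane 0 holds the complete result
}

// Hypothetical kernel: warp v computes out[v] = dot(row v of A, x).
// blockDim.x must be a multiple of 32 so every warp exits together.
__global__ void batchedDot(const float* A, const float* x, float* out,
                           int numVecs, int n)
{
    const int warpId = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
    if (warpId >= numVecs) return;
    const float d = warpDot(A + (size_t)warpId * n, x, n);
    if (threadIdx.x % warpSize == 0)
        out[warpId] = d;
}

// Fills test data, runs the kernel and returns the number of mismatches
// against the analytically known result (-1 on a CUDA error).
int runCheck()
{
    const int numVecs = 100, n = 1000;
    float *A, *x, *out;
    cudaMallocManaged(&A, sizeof(float) * numVecs * n);
    cudaMallocManaged(&x, sizeof(float) * n);
    cudaMallocManaged(&out, sizeof(float) * numVecs);
    for (int i = 0; i < n; ++i) x[i] = 1.0f;
    for (int v = 0; v < numVecs; ++v)
        for (int i = 0; i < n; ++i)
            A[(size_t)v * n + i] = (v % 7) * 0.5f;

    const int threads = 256;  // multiple of 32
    const int blocks = (numVecs * 32 + threads - 1) / threads;
    batchedDot<<<blocks, threads>>>(A, x, out, numVecs, n);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    int bad = 0;
    for (int v = 0; v < numVecs; ++v) {
        const float expected = (v % 7) * 0.5f * n;
        if (fabsf(out[v] - expected) > 1e-3f * expected + 1e-3f) ++bad;
    }
    cudaFree(A); cudaFree(x); cudaFree(out);
    return bad;
}

int main()
{
    const int bad = runCheck();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

If the dot products do not depend on other work inside the kernel, another option is to batch them and call cuBLAS from the host instead: a set of dot products of the rows of a matrix with one vector is exactly a matrix-vector product, which cublasSgemv computes in a single call.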

Thanks!

@shaojieb did you find an alternative solution?