Hi NVIDIA team!
I have working code that uses OpenACC and cuBLAS functions (dgemm) on the GPU. Now I would like to make a multicore CPU version of that code.
As far as I understand, I only need to make small changes to the Makefile and maybe remove some of the data movement (since there is no separate device memory anymore).
So I've added the -ta=multicore flag to the compiler options and removed the OpenACC copyin, copy and copyout clauses at the beginning of the parallel region.
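To illustrate the kind of change I mean, here is a simplified sketch. It is not my real code: scale_vector and its arguments are made up for this post. The loop keeps its OpenACC directive but has no data clauses, on the assumption that a multicore build works directly on host memory.

```c
#include <stddef.h>

/* Hypothetical helper, not from my actual code: scales a vector in place.
   No copyin/copy/copyout clauses are used here, assuming the loop runs on
   the host CPU cores and reads and writes host memory directly. */
void scale_vector(double *x, size_t n, double alpha)
{
    #pragma acc parallel loop
    for (size_t i = 0; i < n; ++i) {
        x[i] *= alpha;
    }
}
```

A compiler without OpenACC support ignores the pragma, so the same function also builds and runs serially.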
When compiling, I get the “generating multicore code” message wherever the compiler finds a parallel loop. But when I run the program, it fails with error 700 (illegal memory address), and dgemm returns cublasStatus_t = 13 (CUBLAS_STATUS_EXECUTION_FAILED).
Is this my fault? Have I forgotten something? Or is it just not possible to use cuBLAS in multicore, for whatever reason?
Thank you!