[Urgent] Can I use cuBLAS functions in multicore CPU parallelism with OpenACC?

Hi NVIDIA team!

I have working code that uses OpenACC and cuBLAS functions (dgemm) on the GPU. Now I would like to make a multicore CPU version of that code.

As far as I understand, I only need to make small changes to the Makefile and perhaps remove some of the data movement (since there is no separate device memory anymore).

So I’ve added the -ta=multicore flag to the compiler options and removed the OpenACC copyin, copy, and copyout clauses at the beginning of the parallel region.

When compiling, I get the “generating multicore code” message wherever the compiler finds a parallel loop. But when I run the program, it fails with error 700 (illegal memory address), and dgemm returns cublasStatus_t = 13.

Is this my fault? Have I forgotten something? Or is it just not possible to use cuBLAS in multicore, for whatever reason?

Thank you!

Are you trying to execute cuBLAS code on the CPU?

That is not possible. cuBLAS strictly executes on the GPU.
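
For a CPU build, the cuBLAS call would have to be replaced by host code. Below is a minimal sketch of what that could look like, assuming no transposes and tightly packed column-major matrices (the layout cuBLAS uses). The name `host_dgemm` is hypothetical and is not part of cuBLAS or OpenACC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for a non-transposed cublasDgemm call:
 *   C = alpha * A * B + beta * C
 * Column-major storage. A is m x k, B is k x n, C is m x n, and the
 * leading dimensions are assumed to be m, k and m respectively. */
void host_dgemm(int m, int n, int k, double alpha,
                const double *A, const double *B,
                double beta, double *C)
{
    assert(m > 0 && n > 0 && k > 0);

    /* When built for the multicore target, the two outer loops are
     * distributed across CPU cores. A compiler without OpenACC support
     * ignores the pragmas and runs the loops serially. */
    #pragma acc parallel loop collapse(2)
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            #pragma acc loop seq
            for (int p = 0; p < k; ++p) {
                sum += A[i + (size_t)p * m] * B[p + (size_t)j * k];
            }
            C[i + (size_t)j * m] = alpha * sum + beta * C[i + (size_t)j * m];
        }
    }
}
```

This is only meant to show the shape of the change. For real workloads, a tuned host BLAS dgemm would most likely be much faster than a hand-written loop nest.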
