I am new to GPU programming. I have a Fortran code for an element-by-element matrix-vector multiplication in a finite-element model:
Is there any CUDA or OpenACC code for this purpose?
I would be grateful for any help or suggestion!
I don’t know of any examples that do a similar thing, but you should be able to port this to the GPU fairly easily using OpenACC. With the “kernels” directive, the compiler will be able to parallelize your array syntax, but you’ll gain greater control over the loop scheduling if you rewrite the array operations as explicit loops. Matmul is a problem since we don’t support it on the device, but it’s straightforward to implement as loops instead, or you can call the device version of cuBLAS DGEMM/SGEMM.
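To make the "implement it as loops" suggestion concrete, here is a minimal sketch, not your actual code. Every name in it (`nel`, `nen`, `ke`, `conn`, `u`, `v`) and the tiny 1D mesh are assumptions made up for illustration. It replaces a per-element matmul with explicit loops, parallelizes over elements, and uses an atomic update for the scatter-add because neighboring elements share global degrees of freedom.

```fortran
program ebe_matvec
  implicit none
  ! All names and sizes below are hypothetical, chosen only for illustration.
  integer, parameter :: dp   = kind(1.0d0)
  integer, parameter :: nel  = 4        ! number of elements
  integer, parameter :: nen  = 2        ! dofs per element (1D two-node elements)
  integer, parameter :: ndof = nel + 1  ! global dofs on a chain of nodes
  real(dp) :: ke(nen, nen, nel)         ! element matrices
  integer  :: conn(nen, nel)            ! element-to-global dof map
  real(dp) :: u(ndof), v(ndof)
  real(dp) :: tmp
  integer  :: e, i, j

  ! Build a toy problem: each element matrix is [1 -1; -1 1].
  do e = 1, nel
     conn(1, e) = e
     conn(2, e) = e + 1
     ke(:, :, e) = reshape([1.0_dp, -1.0_dp, -1.0_dp, 1.0_dp], [nen, nen])
  end do
  do i = 1, ndof
     u(i) = real(i*i, dp)
  end do
  v = 0.0_dp

  ! One parallel iteration per element; the small local product runs sequentially.
  !$acc parallel loop copyin(ke, conn, u) copy(v) private(tmp)
  do e = 1, nel
     !$acc loop seq
     do i = 1, nen
        tmp = 0.0_dp
        !$acc loop seq
        do j = 1, nen
           tmp = tmp + ke(i, j, e) * u(conn(j, e))
        end do
        ! Elements share nodes, so the scatter-add must be atomic.
        !$acc atomic update
        v(conn(i, e)) = v(conn(i, e)) + tmp
     end do
  end do

  print '(5f8.2)', v   ! expected values: -3 -2 -2 -2 9
end program ebe_matvec
```

Built without OpenACC enabled, the directives are treated as comments and the program runs serially, which is a handy way to check results before offloading (with nvfortran, the `-acc` flag turns them on). For larger element matrices, coloring the elements so that no two in the same color share a node is a common alternative to atomics, but whether it pays off depends on your mesh.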
That said, I’d suggest you familiarize yourself with OpenACC before beginning. You can find lots of resources on openacc.org, including an online course. See: https://www.openacc.org/resources