 # Vector transpose times a vector

I am trying to perform a vector operation where the transpose of a vector is multiplied by another vector (that is, a dot product).

I have tried with CUF kernels:

``````
Subroutine ScalarDivVecDotVec(alpha,P,Q,eCG,nTot)

  Implicit None

  ! Local Vars
  Integer:: i, j
  Real(fp_kind), Device:: PdotQ

  ! Passed Vars
  Integer, Value:: nTot
  Real(fp_kind), Device, Intent(IN):: P(3*nTot), Q(3*nTot), eCG
  Real(fp_kind), Device, Intent(INOUT):: alpha

  !$cuf kernel do(1) <<<*,*>>>
  Do i = 1, nTot
    PdotQ = PdotQ + P(i)*Q(i)
  End Do
  alpha = eCG/PdotQ

End Subroutine
``````

This gave me an error saying “more than one resident device variable”.

Then I tried to use an atomicadd:

``````
Attributes(Global) Subroutine ScalarDivVecDotVecGPU(alpha,P,Q,eCG,nTot)

  Implicit None

  ! Local Vars
  Integer:: i, j, istat
  Real(fp_kind), Device:: PdotQ

  ! Passed Vars
  Integer, Value:: nTot
  Real(fp_kind), Device, Intent(IN):: P(3*nTot), Q(3*nTot), eCG
  Real(fp_kind), Device, Intent(INOUT):: alpha

  If (i >= 1 .and. i <= nTot) Then
    istat = atomicadd(PdotQ,P(i)*Q(i))   !PdotQ = PdotQ + P(i)*Q(i)
    alpha = eCG/PdotQ
  End If

End Subroutine
``````

This would not compile (some undefined error).

I would have thought that the CUF kernel could do this, since I can successfully do:

``````
Subroutine VecdotVec(V,eCG,nTot)

  Implicit None

  ! Local Vars
  Integer:: i, j

  ! Passed Vars
  Integer, Value:: nTot
  Real(fp_kind), Device, Intent(IN):: V(3*nTot)
  Real(fp_kind), Device, Intent(INOUT):: eCG

  !$cuf kernel do(1) <<<*,*>>>
  Do i = 1, nTot
    eCG = eCG + V(i)*V(i)
  End Do

End Subroutine
``````

Is there a way to do this easily on the GPU? I would prefer to stay away from cuBLAS and so on for now…

Any help is greatly appreciated,

Kirk

Well…

Simple enough to fix. The problem was that I was trying to perform

``````
alpha = eCG/PdotQ
``````

on the host, even though all three are device variables. It had nothing to do with the CUF kernel after all.

Easy enough to fix.
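For anyone who hits the same error, here is a minimal sketch of one way the fix could look. It is not necessarily what I ended up with, and it rests on a few assumptions: `fp_kind` is taken to be double precision (its definition is not shown above), the module name `dot_sketch`, the program `demo`, and the host temporaries `eCG_h` and `alpha_h` are made up for illustration, and `n` is the full vector length rather than `nTot`. The idea is to accumulate the dot product into a host scalar, which the compiler should be able to treat as a reduction in the CUF kernel, and to keep the division as plain host arithmetic with explicit copies on either side.

```fortran
module dot_sketch
  use cudafor
  implicit none
  ! Assumption: double precision; the original definition of fp_kind is not shown.
  integer, parameter :: fp_kind = kind(0.0d0)
contains

  ! Host subroutine: alpha = eCG / (P . Q), with alpha and eCG living on the device.
  subroutine ScalarDivVecDotVec(alpha, P, Q, eCG, n)
    integer, intent(in) :: n
    real(fp_kind), device, intent(in)  :: P(n), Q(n), eCG
    real(fp_kind), device, intent(out) :: alpha
    real(fp_kind) :: PdotQ, eCG_h, alpha_h   ! host scalars (hypothetical names)
    integer :: i

    PdotQ = 0.0_fp_kind
    ! PdotQ is a host scalar, so this loop is a candidate for a CUF kernel reduction.
    !$cuf kernel do(1) <<<*,*>>>
    do i = 1, n
      PdotQ = PdotQ + P(i)*Q(i)
    end do

    eCG_h   = eCG             ! device-to-host copy by assignment
    alpha_h = eCG_h / PdotQ   ! host-only arithmetic, no device variables involved
    alpha   = alpha_h         ! host-to-device copy by assignment
  end subroutine ScalarDivVecDotVec

end module dot_sketch

program demo
  use cudafor
  use dot_sketch
  implicit none
  integer, parameter :: n = 6
  real(fp_kind), device :: P_d(n), Q_d(n), eCG_d, alpha_d
  real(fp_kind) :: alpha_h

  P_d   = 1.0_fp_kind
  Q_d   = 2.0_fp_kind
  eCG_d = 24.0_fp_kind        ! P . Q = 12 here, so alpha = 24/12

  call ScalarDivVecDotVec(alpha_d, P_d, Q_d, eCG_d, n)

  alpha_h = alpha_d           ! copy the result back before printing
  print *, alpha_h
end program demo
```

The key point is that no single host statement mixes more than one device variable: each assignment is either a plain copy between host and device or arithmetic on host values only. If `alpha` and `eCG` do not need to stay on the device, making them ordinary host scalars would remove the two extra copies.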