Vector transpose times a vector

SPHriction-3D · July 4, 2015, 12:22am

I am trying to perform a vector operation where the transpose of a vector is multiplied by another vector (that is, a dot product).

I first tried a CUF kernel:

Subroutine ScalarDivVecDotVec(alpha,P,Q,eCG,nTot)

	Implicit None

	! Local Vars
	Integer:: i, j
	Real(fp_kind), Device:: PdotQ

	! Passed Vars
	Integer, Value:: nTot
	Real(fp_kind), Device, Intent(IN):: P(3*nTot), Q(3*nTot), eCG
	Real(fp_kind), Device, Intent(INOUT):: alpha

	!$cuf kernel do(1) <<<*,*>>>
	Do i = 1, nTot
		PdotQ = PdotQ + P(i)*Q(i)
	End Do
	alpha = eCG/PdotQ
		
End Subroutine

This gave me an error saying “more than one resident device variable”.

Then I tried to use atomicadd:

Attributes(Global) Subroutine ScalarDivVecDotVecGPU(alpha,P,Q,eCG,nTot)

	Implicit None

	! Local Vars
	Integer:: i, j, istat
	Real(fp_kind), Device:: PdotQ

	! Passed Vars
	Integer, Value:: nTot
	Real(fp_kind), Device, Intent(IN):: P(3*nTot), Q(3*nTot), eCG
	Real(fp_kind), Device, Intent(INOUT):: alpha

	i = (blockIdx%x-1)*blockDim%x + threadIdx%x

	If (i >= 1 .and. i <= nTot) Then
		istat = atomicadd(PdotQ,P(i)*Q(i))   !PdotQ = PdotQ + P(i)*Q(i)
		alpha = eCG/PdotQ
	End If
		
End Subroutine

This would not compile either (it failed with an unspecified error).
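The atomic version has a few likely problems, although which one triggers the compile error is a guess. A local variable inside an `Attributes(Global)` routine is private to each thread, so declaring `PdotQ` there with the `Device` attribute does not create one shared accumulator. `atomicadd` returns the old value of its target (a real here), not a status code. And every thread computes `alpha = eCG/PdotQ` before the other threads have finished adding. The following is a minimal sketch that avoids these issues by splitting the work into two kernels. It assumes double precision (`fp_kind` is defined locally as an assumption, since the original takes it from a module that is not shown) and a GPU and compiler that support double-precision `atomicAdd` (compute capability 6.0 or newer has it natively). The module, routine, and variable names are hypothetical.

```fortran
! Minimal sketch. fp_kind is an ASSUMPTION (double precision); the original
! program takes it from a module that is not shown.
module dot_atomic_m
  use cudafor
  implicit none
  integer, parameter :: fp_kind = kind(1.0d0)
  ! Accumulator kept at module scope so it outlives the asynchronous launches.
  real(fp_kind), device :: PdotQ_d
contains

  ! Kernel 1: each thread atomically adds one product P(i)*Q(i) into the
  ! single device scalar PdotQ. The caller must zero PdotQ first.
  attributes(global) subroutine partialDot(P, Q, PdotQ, n)
    integer, value :: n
    real(fp_kind), intent(in) :: P(n), Q(n)  ! kernel dummies are device data
    real(fp_kind) :: PdotQ                    ! one scalar shared by all threads
    real(fp_kind) :: old                      ! atomicAdd returns the old value
    integer :: i

    i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
    if (i <= n) old = atomicAdd(PdotQ, P(i)*Q(i))
  end subroutine partialDot

  ! Kernel 2: launched with a single thread after kernel 1, so it divides
  ! by the complete sum.
  attributes(global) subroutine divide(alpha, eCG, PdotQ)
    real(fp_kind) :: alpha
    real(fp_kind), intent(in) :: eCG, PdotQ
    alpha = eCG / PdotQ
  end subroutine divide

  ! Host driver (hypothetical name): alpha = eCG / (P . Q)
  subroutine ScalarDivVecDotVecAtomic(alpha, P, Q, eCG, n)
    integer, intent(in) :: n
    real(fp_kind), device, intent(out) :: alpha
    real(fp_kind), device, intent(in) :: P(n), Q(n), eCG
    integer, parameter :: nThreads = 256

    PdotQ_d = 0.0_fp_kind   ! host-to-device copy of a constant
    call partialDot<<<(n + nThreads - 1)/nThreads, nThreads>>>(P, Q, PdotQ_d, n)
    ! Same (default) stream, so divide starts only after partialDot finishes.
    call divide<<<1, 1>>>(alpha, eCG, PdotQ_d)
  end subroutine ScalarDivVecDotVecAtomic
end module dot_atomic_m
```

Because both kernels go to the default stream, `divide` always sees the finished sum. All threads hit the same address, though, so the atomics serialize; for large vectors a block-level reduction or a CUF kernel reduction is likely to be faster.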

I would have thought the CUF kernel could handle this, since I can successfully do:

Subroutine VecdotVec(V,eCG,nTot)

	Implicit None

	! Local Vars
	Integer:: i, j

	! Passed Vars
	Integer, Value:: nTot
	Real(fp_kind), Device, Intent(IN):: V(3*nTot)
	Real(fp_kind), Device, Intent(INOUT):: eCG

	!$cuf kernel do(1) <<<*,*>>>
	Do i = 1, nTot
		eCG = eCG + V(i)*V(i)
	End Do
		
End Subroutine

Is there a way to do this easily on the GPU? I would prefer to stay away from cuBLAS and similar libraries for now…

Any help is greatly appreciated,

Kirk

SPHriction-3D · July 5, 2015, 2:26am

Well…

It was simple to fix. The problem was that I was trying to perform

alpha = eCG/PdotQ

on the host, where all three are device variables. It had nothing to do with the CUF kernel after all.
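A minimal sketch of one way to apply this fix with a CUF kernel follows. It assumes double precision (`fp_kind` is defined locally as an assumption), and the module and routine names are hypothetical. The reduction goes into a host scalar, which the CUF kernel directive recognizes as a sum reduction. The single device scalar `eCG` is then copied to the host before the division, so no host statement mixes several device variables. It also zeroes the accumulator first, which the original loop did not do.

```fortran
! Minimal sketch. fp_kind is an ASSUMPTION (double precision); the original
! program takes it from a module that is not shown.
module dot_cuf_m
  use cudafor
  implicit none
  integer, parameter :: fp_kind = kind(1.0d0)
contains

  ! alpha = eCG / (P . Q), with the dot product reduced by a CUF kernel.
  subroutine ScalarDivVecDotVecCUF(alpha, P, Q, eCG, n)
    integer, intent(in) :: n
    real(fp_kind), device, intent(out) :: alpha
    real(fp_kind), device, intent(in) :: P(n), Q(n), eCG
    real(fp_kind) :: PdotQ, eCG_h, alpha_h   ! host scalars
    integer :: i

    PdotQ = 0.0_fp_kind                 ! the accumulator must start at zero
    !$cuf kernel do(1) <<<*,*>>>
    do i = 1, n
      PdotQ = PdotQ + P(i)*Q(i)         ! compiler generates a sum reduction
    end do

    eCG_h = eCG                         ! one device variable: a plain copy
    alpha_h = eCG_h / PdotQ             ! ordinary host arithmetic
    alpha = alpha_h                     ! copy the result back to the device
  end subroutine ScalarDivVecDotVecCUF
end module dot_cuf_m
```

The original loops stop at `nTot` even though the arrays hold `3*nTot` elements, so pass whichever length is actually intended as `n`. Each scalar copy between host and device implies a synchronization. If that matters inside an iterative solver, the division can instead be done in a one-thread kernel so that `alpha` never leaves the GPU.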
