Hello everyone,
The program I'm trying to edit contains several classes. My problem is that in class A I have this loop, which I would like to run on the GPU:
```cpp
for (iPoint = 0; iPoint < nPoint; iPoint++) {
  Density = nodes->GetDensity(iPoint);
  Velocity2 = 0.0;
  for (iDim = 0; iDim < nDim; iDim++)
    Velocity2 += pow(nodes->GetSolution(iPoint,iDim+1)/Density,2);
  StaticEnergy = nodes->GetEnergy(iPoint) - 0.5*Velocity2;
  GetFluidModel()->SetTDState_rhoe(Density, StaticEnergy);
  Pressure = GetFluidModel()->GetPressure();
  Temperature = GetFluidModel()->GetTemperature();

  /*--- Use the values at the infinity ---*/
  su2double Solution[MAXNVAR] = {0.0};
  if ((Pressure < 0.0) || (Density < 0.0) || (Temperature < 0.0)) {
    Solution[0] = Density_Inf;
    for (iDim = 0; iDim < nDim; iDim++)
      Solution[iDim+1] = Velocity_Inf[iDim]*Density_Inf;
    Solution[nDim+1] = Energy_Inf*Density_Inf;
    nodes->SetSolution(iPoint,Solution);
    nodes->SetSolution_Old(iPoint,Solution);
    counter_local++;
  }
}
```
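To check my understanding of the second half of this loop, here is a stripped-down sketch of the reset branch on plain arrays. It is not SU2 code: `ResetNonPhysicalPoints` is a hypothetical name, and I am assuming the solution is stored row-major as nPoint x (nDim+2) with density first, and that pressure and temperature have already been computed into their own arrays. Each iteration only touches its own row, and `counter_local++` becomes a `reduction(+:...)`, so as far as I can tell nothing in this part should prevent parallelization:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical flat-array version of the "use the values at the infinity"
// branch. Layout assumption: row iPoint of solution/solutionOld holds
// nVar = nDim + 2 values: rho, rho*u_1 .. rho*u_nDim, rho*E.
// Returns the number of points that were reset (counter_local in the original).
unsigned long ResetNonPhysicalPoints(double* solution, double* solutionOld,
                                     const double* pressure,
                                     const double* temperature,
                                     std::size_t nPoint, std::size_t nDim,
                                     double densityInf, const double* velocityInf,
                                     double energyInf) {
  assert(nDim >= 1 && nDim <= 3);
  const std::size_t nVar = nDim + 2;
  unsigned long counter = 0;

  // Each iteration reads and writes only its own row; the counter is a sum
  // reduction. Without an OpenACC compiler the pragma is ignored and the
  // loop simply runs on the host.
  #pragma acc parallel loop reduction(+:counter) \
      copy(solution[0:nPoint*nVar], solutionOld[0:nPoint*nVar]) \
      copyin(pressure[0:nPoint], temperature[0:nPoint], velocityInf[0:nDim])
  for (std::size_t iPoint = 0; iPoint < nPoint; iPoint++) {
    double* row = solution + iPoint * nVar;
    if (pressure[iPoint] < 0.0 || row[0] < 0.0 || temperature[iPoint] < 0.0) {
      row[0] = densityInf;
      for (std::size_t iDim = 0; iDim < nDim; iDim++)
        row[iDim + 1] = velocityInf[iDim] * densityInf;
      row[nDim + 1] = energyInf * densityInf;
      // Equivalent of SetSolution_Old: only reset points are overwritten.
      double* oldRow = solutionOld + iPoint * nVar;
      for (std::size_t iVar = 0; iVar < nVar; iVar++) oldRow[iVar] = row[iVar];
      counter++;
    }
  }
  return counter;
}
```

`solutionOld` uses `copy` rather than `copyout` so that rows which are not reset keep their host values.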
In the code, `nodes` is a pointer to an object of class C, but it is declared as a member of class B:

In B.hpp:
```cpp
#include "C.hpp"

C* nodes = nullptr;
```
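My understanding is that with a pointer member like this, OpenACC needs a manual deep copy: first the object itself, then every array it owns, so that the device copy of the pointer is attached to device memory. Below is a small self-contained sketch of that pattern, modeled on the usual `enter data copyin(this)` idiom for C++ classes. `NodesC` (standing in for C) and `SolverB` (standing in for B) are hypothetical classes, not SU2 code, and I have not checked this against the real variable classes. In the loop, the member pointer is first copied into a local, and `present(...)` tells the compiler the object is already on the device:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for class C: owns one value per point.
class NodesC {
 public:
  explicit NodesC(std::size_t nPoint) : nPoint_(nPoint), density_(new double[nPoint]) {
    for (std::size_t i = 0; i < nPoint_; i++) density_[i] = 1.0 + i;
    // Deep copy: the object first, then the array it owns. Because the object
    // is already present, the device copy of density_ gets attached to the
    // device array.
    #pragma acc enter data copyin(this[0:1])
    #pragma acc enter data copyin(density_[0:nPoint_])
  }
  ~NodesC() {
    #pragma acc exit data delete(density_[0:nPoint_])
    #pragma acc exit data delete(this[0:1])
    delete[] density_;
  }
  NodesC(const NodesC&) = delete;             // owning raw pointer: no copies
  NodesC& operator=(const NodesC&) = delete;

  // Non-virtual accessor that can also be compiled for the device.
  #pragma acc routine seq
  double GetDensity(std::size_t iPoint) const { return density_[iPoint]; }

 private:
  std::size_t nPoint_;
  double* density_;
};

// Hypothetical stand-in for class B: holds the pointer, as in B.hpp.
class SolverB {
 public:
  explicit SolverB(std::size_t nPoint) : nPoint_(nPoint), nodes(new NodesC(nPoint)) {
    assert(nPoint_ > 0);
  }
  ~SolverB() { delete nodes; }
  SolverB(const SolverB&) = delete;
  SolverB& operator=(const SolverB&) = delete;

  // Sums the density over all points inside a compute region.
  double SumDensity() const {
    // Copy members into locals so the region never dereferences this->.
    const NodesC* localNodes = nodes;
    const std::size_t n = nPoint_;
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum) present(localNodes[0:1])
    for (std::size_t i = 0; i < n; i++) sum += localNodes->GetDensity(i);
    return sum;
  }

 private:
  std::size_t nPoint_;
  NodesC* nodes = nullptr;
};
```

The key design choice is that the object owning the data (C) manages its own device copy in its constructor and destructor, so class B only has to pass the pointer into the region.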
When I try to add OpenACC directives like this:
```cpp
#pragma acc kernels
{
  for (iPoint = 0; iPoint < nPoint; iPoint++) {
    su2double TDVariables[2] = {0.0};
    Density = nodes->GetDensity(iPoint);
    TDVariables[0] = Density;
    Velocity2 = 0.0;
    for (iDim = 0; iDim < nDim; iDim++)
      Velocity2 += pow(nodes->GetSolution(iPoint,iDim+1)/Density,2);
    StaticEnergy = nodes->GetEnergy(iPoint) - 0.5*Velocity2;
    TDVariables[1] = StaticEnergy;
    GetFluidModel()->SetTDVariables(iPoint,nPoint,TDVariables);
  }
}
```
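For comparison, this is the same static-energy computation written the way I would expect the compiler to accept it, on plain arrays instead of `nodes->`. `ComputeStaticEnergy` is my own hypothetical name, the layout is assumed (row-major nPoint x nVar with density first and rho*E in column nDim+1), and I am assuming `nodes->GetEnergy()` returns rho*E divided by rho. The differences from my attempt: `parallel loop` instead of `kernels` (inside a `parallel` construct the loop is treated as independent, while `kernels` only parallelizes what the compiler can prove independent), no `this->` or `nodes->` accesses inside the region, `pow(x,2)` replaced by `x*x`, and each result written to its own slot of a per-point array:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical flat-array version of the first part of the loop.
// solution: nPoint rows of nVar values (rho, rho*u_1..rho*u_nDim, rho*E).
// staticEnergy: output, one value per point.
void ComputeStaticEnergy(const double* solution, double* staticEnergy,
                         std::size_t nPoint, std::size_t nDim, std::size_t nVar) {
  assert(nVar >= nDim + 2);
  // parallel loop: iterations are treated as independent, unlike kernels,
  // where the compiler has to prove it. The inner loop stays sequential.
  #pragma acc parallel loop copyin(solution[0:nPoint*nVar]) copyout(staticEnergy[0:nPoint])
  for (std::size_t iPoint = 0; iPoint < nPoint; iPoint++) {
    const double* row = solution + iPoint * nVar;
    const double density = row[0];
    double velocity2 = 0.0;  // declared inside the loop, so private per iteration
    #pragma acc loop seq
    for (std::size_t iDim = 0; iDim < nDim; iDim++) {
      const double u = row[iDim + 1] / density;
      velocity2 += u * u;
    }
    // Specific total energy minus kinetic energy (assumes GetEnergy() = rho*E / rho).
    staticEnergy[iPoint] = row[nDim + 1] / density - 0.5 * velocity2;
  }
}
```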
The compiler reports:
```
CEulerSolver::CEulerSolver(CGeometry *, CConfig *, unsigned short, bool):
74, Generating enter data copyin(this[:1])
Generating enter data create(Velocity2,StaticEnergy,Density,Temperature,Pressure)
288, Generating implicit private(iPoint)
Generating implicit copyin(this[:]) [if not already present]
290, Complex loop carried dependence of TDVariables,__b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes,__b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes-> prevents parallelization
Loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes-> prevents parallelization
Loop carried backward dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes-> prevents vectorization
Scalar last value needed after loop for Density at line 318
Complex loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.__b_7CSolver.nDim prevents parallelization
Loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes prevents parallelization
Loop carried backward dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes prevents vectorization
Complex loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes-> prevents parallelization
Loop carried dependence of this-> prevents parallelization
Loop carried backward dependence of this-> prevents vectorization
Complex loop carried dependence of this-> prevents parallelization
Loop carried dependence of ->__vptr prevents parallelization
Loop carried backward dependence of ->__vptr prevents vectorization
Loop carried dependence of ->__vptr-> prevents parallelization
Loop carried backward dependence of ->__vptr-> prevents vectorization
Loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.__b_7CSolver.nPoint prevents parallelization
Loop carried backward dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.__b_7CSolver.nPoint prevents vectorization
Complex loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes->__b_13CFlowVariable.__b_9CVariable prevents parallelization
Loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes->__b_13CFlowVariable.__b_9CVariable prevents parallelization
Loop carried backward dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes->__b_13CFlowVariable.__b_9CVariable prevents vectorization
Conditional loop will be executed in scalar mode
Accelerator serial kernel generated
CUDA shared memory used for TDVariables
Generating implicit private(iPoint,StaticEnergy,iDim,Velocity2)
Generating NVIDIA GPU code
290, #pragma acc loop seq
299, #pragma acc loop seq
290, CUDA shared memory used for TDVariables
Complex loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes,__b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes->,__b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes->__b_13CFlowVariable.__b_9CVariable prevents parallelization
Loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.__b_7CSolver.nPoint prevents parallelization
Complex loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes->__b_13CFlowVariable.__b_9CVariable prevents parallelization
Conditional loop will be executed in scalar mode
299, Complex loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes->__b_13CFlowVariable.__b_9CVariable,__b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes prevents parallelization
Loop carried scalar dependence for iDim at line 300
Generating implicit private(iDim)
Complex loop carried dependence of __b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes->__b_13CFlowVariable.__b_9CVariable,__b_18CFVMFlowSolverBaseI14CEulerVariableL11ENUM_REGIME0EE.nodes prevents parallelization
Loop carried scalar dependence for iDim at line 300
```
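As far as I can tell, the `->__vptr` entries mean the compiler sees virtual function calls in the loop (for example through `GetFluidModel()`), and as far as I know virtual dispatch is not supported in OpenACC device code. One workaround I am considering is a small non-virtual function marked `#pragma acc routine seq` that reproduces what the fluid model computes. The sketch below assumes an ideal gas with constant gamma and gas constant R, `p = (gamma - 1) * rho * e` and `T = (gamma - 1) * e / R`; `ThermoState`, `IdealGasFromRhoE` and `ComputeThermo` are hypothetical names, not SU2 API:

```cpp
#include <cassert>
#include <cstddef>

// Result of the thermodynamic update for one point.
struct ThermoState {
  double pressure;
  double temperature;
};

// Hypothetical non-virtual replacement for SetTDState_rhoe + GetPressure/
// GetTemperature, assuming an ideal gas with constant gamma and gas constant R.
#pragma acc routine seq
inline ThermoState IdealGasFromRhoE(double density, double staticEnergy,
                                    double gamma, double gasConstant) {
  ThermoState s;
  s.pressure = (gamma - 1.0) * density * staticEnergy;
  s.temperature = (gamma - 1.0) * staticEnergy / gasConstant;
  return s;
}

// Evaluates the model for every point; gamma and gasConstant are passed by
// value, so the region never touches a fluid-model object or its vtable.
void ComputeThermo(const double* density, const double* staticEnergy,
                   double* pressure, double* temperature, std::size_t nPoint,
                   double gamma, double gasConstant) {
  assert(gasConstant > 0.0);
  #pragma acc parallel loop copyin(density[0:nPoint], staticEnergy[0:nPoint]) \
      copyout(pressure[0:nPoint], temperature[0:nPoint])
  for (std::size_t i = 0; i < nPoint; i++) {
    const ThermoState s = IdealGasFromRhoE(density[i], staticEnergy[i], gamma, gasConstant);
    pressure[i] = s.pressure;
    temperature[i] = s.temperature;
  }
}
```

This would only cover a single, fixed fluid model, so it trades the flexibility of the virtual interface for something the device can call.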
I've tried several approaches, but none of them work, so my question is: how do I correctly handle the `nodes` pointer so that it can be used on the GPU through OpenACC?