I have a question regarding the nvc++ compiler that I'm using to compile my Monte Carlo pricing engine. I'm running the following native loop:
```cpp
for (size_t PathIdx = 0; PathIdx < NumberOfPaths; PathIdx++)
{
    // size_t rather than int: PathIdx * NumberOfSteps can overflow int for large runs
    size_t NormalIdx = PathIdx * NumberOfSteps;
    float Spot = S0;
    float Vol = v0;
    float zero = 0.0f;
    for (size_t StepIdx = 0; StepIdx < NumberOfSteps; StepIdx++)
    {
        Spot *= exp((r - 0.5f * Vol) * dt + sqrt(std::max(Vol, zero)) * sqrdt * SpotRandoms.at(NormalIdx + StepIdx));
        Vol += kappa * (vbar - Vol) * dt + zeta * sqrt(std::max(Vol, zero)) * sqrdt * VolRandoms.at(NormalIdx + StepIdx);
    }
}
```
I.e., nothing special: just simulating Heston price paths. `SpotRandoms` and `VolRandoms` are `std::vector`s that I generated beforehand. I have compiled and run this program with the following compilers and settings:
- `g++` compiler: ran in 5464 ms.
- `nvc++` compiler, no `-stdpar` flag: ran in 1586 ms.
- `nvc++` compiler, `-stdpar=gpu` flag: ran in 26 ms.
- `nvc++` compiler, `-stdpar=multicore` flag: ran in 34 ms.
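For concreteness, the invocations looked roughly like the following (`main.cpp` and `-O3` are placeholders for my actual source file and optimization flags):

```shell
g++ -O3 main.cpp -o heston_cpu                     # 5464 ms
nvc++ -O3 main.cpp -o heston_nvcpp                 # no -stdpar: 1586 ms
nvc++ -O3 -stdpar=gpu main.cpp -o heston_gpu       # 26 ms
nvc++ -O3 -stdpar=multicore main.cpp -o heston_mc  # 34 ms
```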
I was unaware that just using the `-stdpar` compiler flag would yield a speedup for native loops; I was only aware of the speedup for STL algorithms used with execution policies. Does anybody have a clear view of what's going on under the hood of the `nvc++` compiler with regard to native loops? For example, are they indeed offloaded to the GPU? I have profiled the application in Nsight Systems, and the CUDA trace seems to indicate not, but I could not find a definitive answer in the compiler documentation.