How can we improve the performance of the simpleFoam solver (integrated with paralution_PCG) in OpenFOAM?

Hello,

I am trying to accelerate the simpleFoam solver in OpenFOAM on a GPU, and I am using the paralution library to connect OpenFOAM to the GPU. I have integrated the simpleFoam solver of OpenFOAM-2.4.0 with paralution_PCG, and the solver compiles successfully. However, when I run a test case with the compiled solver, the execution time is longer than for the same test case run with the standard simpleFoam solver (without paralution_PCG). I used a grid size of 40×40×40 and ran the test case with different numbers of GPU cores: 4, 8, 16, 32, and 64. I also tried increasing the grid size to 80×80×80, with the same result. The paralution library is meant to accelerate the solver, but that is not what I see in my case.
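For reference, here is a rough sketch of the problem sizes involved. It assumes the grid sizes quoted above mean cubic meshes of 40×40×40 and 80×80×80 cells, and that the counts 4 to 64 are the number of parallel processes the mesh is split across. Both are assumptions about my setup as written here, and the helper name `cells_per_process` is only illustrative.

```python
def cells_per_process(n_per_side, process_counts):
    """Return the total cell count of a cubic grid and a
    {process count: cells per process} map (integer division)."""
    total = n_per_side ** 3
    return total, {p: total // p for p in process_counts}


# Process counts taken from the runs described in the post (assumed meaning).
counts = [4, 8, 16, 32, 64]

for n in (40, 80):
    total, per_proc = cells_per_process(n, counts)
    print(f"{n}^3 grid: {total} cells")
    for p, c in per_proc.items():
        print(f"  {p:>2} processes -> {c} cells each")
```

Under these assumptions the smaller mesh has 64,000 cells in total and the larger one 512,000, so each process handles only a few thousand cells at the higher counts. That may be relevant when judging whether the linear solve is large enough to benefit from the GPU.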
What is the problem, and why is this happening?
Can anyone please help me?