PhysX 3.3 fluid simulation much slower in my own program than in the samples

Hi,
My configuration is PhysX 3.3, MSVC 2013 x64, CUDA 7.5.
I’m currently trying to use the SPH fluid simulation.
I successfully built the samples, as well as my own program that uses the SPH fluid simulation. However, my program is much slower than the samples, so I’m wondering whether I missed something.
Before showing the code, I should point out that I enabled GPU usage (it works, but my initialization may be missing something needed for optimal performance). I also allocated a scratch block for the simulate function (though I haven’t seen any improvement from it). I use a time step of 1/60 s (the same as in the samples). Finally, rendering is not the source of the problem (I even removed it from my program to make sure).
To test the performance, I simply create 20,000 particles that gather in a container (made of 5 planes). For the fluid parameters I used the ones from the samples.
My program reaches about 180 frames/s, whereas the samples reach around 450 frames/s with the same parameters (unless I missed something).
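
Frame rates like the ones above are easiest to compare when they are measured around the simulation step alone, with rendering excluded. The following is a minimal sketch of one way to do that in standard C++. It is not the measurement code behind the numbers above, and `measureStepsPerSecond` is a hypothetical helper, not a PhysX function.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not part of PhysX): calls `step` exactly `steps` times
// and returns the average number of calls per second of wall-clock time.
double measureStepsPerSecond(const std::function<void()>& step, std::size_t steps)
{
    assert(steps > 0);
    typedef std::chrono::steady_clock Clock;

    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < steps; ++i) {
        step();
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;

    // Return 0 if the clock was too coarse to register any elapsed time.
    return elapsed.count() > 0.0
        ? static_cast<double>(steps) / elapsed.count()
        : 0.0;
}
```

Under these assumptions, the callable would wrap one call to `advanceInTime(1.0 / 60.0)`, and the same helper would be placed around the equivalent step in the sample for a like-for-like comparison.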

If anyone has an idea of what I missed, I’d be happy to try it.

The following code is my configuration. If needed, I can post the whole program.
Here is my initialization of the PhysX context:

//allocate the scratch block for later use
    m_scratch_block=SCRATCH_BLOCK_SIZE ? PHYSX_ALLOC(SCRATCH_BLOCK_SIZE) : 0;

//init physX
    m_foundation = PxCreateFoundation(PX_PHYSICS_VERSION, m_allocator, m_errorCallback);
    m_physics = PxCreatePhysics(PX_PHYSICS_VERSION, *m_foundation, physx::PxTolerancesScale());


    physx::PxSceneDesc sceneDesc(m_physics->getTolerancesScale());
    sceneDesc.gravity = physx::PxVec3(0.0f, -9.81f, 0.0f);

    sceneDesc.filterShader	= physx::PxDefaultSimulationFilterShader;
    m_dispatcher = physx::PxDefaultCpuDispatcherCreate(4);
    sceneDesc.cpuDispatcher	= m_dispatcher;


    // testing the GPU dispatcher
    physx::PxProfileZoneManager* profileZoneManager = &physx::PxProfileZoneManager::createProfileZoneManager(m_foundation);
    physx::PxCudaContextManagerDesc cudaContextManagerDesc;
    physx::PxCudaContextManager* cudaContextManager =
            physx::PxCreateCudaContextManager(*m_foundation,cudaContextManagerDesc,profileZoneManager);

    if(cudaContextManager)
    {
        if(!sceneDesc.gpuDispatcher)
        {
            sceneDesc.gpuDispatcher = cudaContextManager->getGpuDispatcher();
        }
    }
    m_scene = m_physics->createScene(sceneDesc);

And here is the call to the simulate function (with delta_t being 1/60 s):

void PhysXWorld::advanceInTime(double delta_t){
    m_scene->simulate(delta_t,0,m_scratch_block,SCRATCH_BLOCK_SIZE);
    m_scene->fetchResults(true);
}
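
As far as the PhysX 3.3 documentation for `PxScene::simulate` states, the scratch block has to be 16-byte aligned and its size has to be a multiple of 16 KB. The sketch below shows one way to check both conditions before the call. `roundUpScratchSize` and `isAligned16` are hypothetical helper names, and since the definitions of `SCRATCH_BLOCK_SIZE` and `PHYSX_ALLOC` are not shown above, nothing is claimed here about whether they already satisfy these conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Granularity (16 KB) assumed from the PxScene::simulate documentation.
const std::size_t kScratchGranularity = 16 * 1024;

// Hypothetical helper: rounds a requested size up to the next multiple of 16 KB.
std::size_t roundUpScratchSize(std::size_t requested)
{
    // Guard against overflow in the addition below.
    assert(requested <= std::numeric_limits<std::size_t>::max() - (kScratchGranularity - 1));
    return ((requested + kScratchGranularity - 1) / kScratchGranularity) * kScratchGranularity;
}

// Hypothetical helper: true if the pointer lies on a 16-byte boundary.
bool isAligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}
```

A block that passes `isAligned16` and whose size equals `roundUpScratchSize(size)` meets both documented requirements; whether it changes the timing is a separate question.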

Here is the initialization code for the fluid and the container:

physx::PxParticleFluid* ps;
    // create a container in which the particles will gather
    m_material = m_physics->createMaterial(0.5f, 0.5f, 0.6f);
    physx::PxRigidStatic* groundPlane = physx::PxCreatePlane(*m_physics, physx::PxPlane(0,1,0,0), *m_material);
    m_scene->addActor(*groundPlane);

    physx::PxRigidStatic* plane;
    plane= physx::PxCreatePlane(*m_physics, physx::PxPlane(1,0,0,1), *m_material);
    m_scene->addActor(*plane);

    plane= physx::PxCreatePlane(*m_physics, physx::PxPlane(-1,0,0,1), *m_material);
    m_scene->addActor(*plane);

    plane= physx::PxCreatePlane(*m_physics, physx::PxPlane(0,0,1,1), *m_material);
    m_scene->addActor(*plane);

    plane= physx::PxCreatePlane(*m_physics, physx::PxPlane(0,0,-1,1), *m_material);
    m_scene->addActor(*plane);

    // test particle system
    // set immutable properties
    physx::PxU32 maxParticles = 20000;
    bool perParticleRestOffset = false;

    // create particle system in PhysX SDK
    ps = m_physics->createParticleFluid(maxParticles, perParticleRestOffset);
    physx::PxReal particleDistance=0.05f;
    ps->setGridSize(5.0f);
    ps->setMaxMotionDistance(0.3f);
    ps->setRestOffset(particleDistance*0.3f);
    ps->setContactOffset(particleDistance*0.3f*2);
    ps->setDamping(0.0f);
    ps->setRestitution(0.3f);
    ps->setDynamicFriction(0.001f);
    ps->setRestParticleDistance(particleDistance);
    ps->setViscosity(60.0f);
    ps->setStiffness(45.0f);


    // add the particle system to the scene if creation was successful
    // for simplicity, the particles are created as tall vertical columns
    if (ps){
        m_scene->addActor(*ps);

        unsigned int numNewAppParticles=4000;

        ParticleData pdata(maxParticles);

        pdata.numParticles=numNewAppParticles;
        for (int k=0;k<3;++k){
            for (int j=0;j<3;++j){
                for (unsigned int i=0;i<numNewAppParticles;++i){
                    pdata.positions[i]=physx::PxVec3(-0.5f+k*0.5f,0.05f*i,-0.5f+j*0.5f);
                    pdata.velocities[i]=physx::PxVec3(0.0f);
                }
                createParticles(pdata);
            }
        }


    }else{
        std::cout<<"failed to create particle system";
    }
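
Taken literally, the loops above request 3 x 3 columns of 4,000 particles each, that is 36,000 particles, while `maxParticles` is 20,000. Each column is also about 200 units tall (0.05 x 4,000). Since `createParticles` and `ParticleData` are not shown, what happens to the excess particles is unknown. The following sketch generates the same layout but stops explicitly at the particle budget. It assumes a plain `Vec3` struct in place of `physx::PxVec3`, and `buildColumns` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain stand-in for physx::PxVec3 so the sketch compiles without the SDK.
struct Vec3 { float x, y, z; };

// Hypothetical helper: builds the 3 x 3 grid of vertical columns used above,
// but stops as soon as `maxParticles` positions have been generated.
std::vector<Vec3> buildColumns(std::size_t maxParticles, std::size_t perColumn, float spacing)
{
    std::vector<Vec3> positions;
    positions.reserve(maxParticles);
    for (int k = 0; k < 3; ++k) {
        for (int j = 0; j < 3; ++j) {
            for (std::size_t i = 0; i < perColumn; ++i) {
                if (positions.size() == maxParticles) {
                    return positions; // particle budget exhausted
                }
                const Vec3 p = { -0.5f + k * 0.5f,
                                 spacing * static_cast<float>(i),
                                 -0.5f + j * 0.5f };
                positions.push_back(p);
            }
        }
    }
    return positions;
}
```

With the values from the code above, `buildColumns(20000, 4000, 0.05f)` fills exactly five of the nine columns before the budget runs out.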

Finally, to enable GPU usage for the fluid, I use the following method:

void PhysXWorld::toggle_gpu()
{
    m_use_gpu=!m_use_gpu;
    physx::PxSceneWriteLock scopedLock(*m_scene);
    if (ps){
        ps->setParticleBaseFlag(physx::PxParticleBaseFlag::eGPU, m_use_gpu);
    }
}