Flex - Low simulation speed in custom engine

I have Flex working in a custom C++ engine, but the simulation is very slow. What could be causing this?

Here are some timings for 20,000 particles. In the demo build, the simulation takes only 2 ms for the same number of particles.
Running the sim in my engine with just 1 particle takes 7 ms.

Timings: http://imgur.com/xyxPHIo

I am using NvFlexVectors to store the particle data.