Optimal vertex and index layout order on modern GPUs?

( NOTE: I initially posted this on the OpenGL forum, but later thought it would fit better here. Sorry! )

Hi there.

I’m coding for modern GPUs. ( OpenGL 4.x etc… )

The mesh data I’m sending to the graphics card is pretty much the direct output from Maya, so I’m guessing it has poor vertex-cache ordering.
I’m hoping to use a vertex-cache-optimization pre-pass on the meshes to gain some performance, as suggested here:
http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html

The meshes have 500,000+ triangles in them. ( Rendered as indexed triangles via an index buffer. )
We also do a number of shadow passes, which puts further demands on vertex throughput.

The thing is…
I have tried TomF’s algorithm (as described in the link) and it actually made things slower! :( And I definitely performed both steps,
i.e.

  • index buffer re-ordering
  • rebuild vertex buffers using the new index ordering to achieve near-linear access
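The second step in the list above can be sketched roughly as follows. This is only a minimal illustration, not the code from either implementation mentioned in this post: the `Vertex` struct and the function name `remapVerticesByFirstUse` are hypothetical, and it assumes 32-bit indices that are all within range of the vertex array.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical vertex layout, for illustration only.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Reorders `vertices` into the order in which the (already cache-optimized)
// index buffer first references them, and rewrites `indices` to match.
// Vertices that no index references are dropped.
// Assumption: every value in `indices` is < vertices.size().
void remapVerticesByFirstUse(std::vector<Vertex>& vertices,
                             std::vector<std::uint32_t>& indices)
{
    const std::uint32_t kUnassigned = std::numeric_limits<std::uint32_t>::max();

    // oldToNew[i] is the new position of old vertex i, once assigned.
    std::vector<std::uint32_t> oldToNew(vertices.size(), kUnassigned);
    std::vector<Vertex> reordered;
    reordered.reserve(vertices.size());

    for (std::uint32_t& index : indices) {
        std::uint32_t& slot = oldToNew[index];
        if (slot == kUnassigned) {
            // First time this vertex is referenced: append it to the new buffer.
            slot = static_cast<std::uint32_t>(reordered.size());
            reordered.push_back(vertices[index]);
        }
        index = slot;  // Point the index at the vertex's new location.
    }

    vertices.swap(reordered);
}
```

After this pass the index buffer walks the vertex buffer in a mostly increasing order, which is the "near-linear access" referred to above. The triangle order produced by the first step is left unchanged.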

Any idea why this would be? I’m guessing the assumptions that Tom made back in 2006 do not hold for modern GPUs?

NOTE:
I used both of these implementations

Both produced the same slowdown. ( from 31 fps to 29 fps )
So I’m guessing the two implementations are consistent ( and therefore hopefully correct ).
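A check that does not depend on frame rate is to measure the average cache miss ratio (ACMR, simulated vertex cache misses per triangle) of the index buffer before and after re-ordering. The sketch below is only an assumption-laden model: it simulates a plain FIFO cache, the function name `averageCacheMissRatio` and the cache size passed in are hypothetical, and real hardware does not necessarily behave like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simulates a FIFO post-transform vertex cache of `cacheSize` entries over a
// triangle-list index buffer and returns misses per triangle.
// Assumption: a simple FIFO model; actual GPU behaviour may differ.
double averageCacheMissRatio(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize)
{
    if (indices.size() < 3) {
        return 0.0;  // No complete triangle to measure.
    }

    std::deque<std::uint32_t> cache;
    std::size_t misses = 0;

    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end()) {
            continue;  // Hit: a FIFO cache does not reorder entries on a hit.
        }
        ++misses;
        cache.push_back(index);
        if (cache.size() > cacheSize) {
            cache.pop_front();  // Evict the oldest entry.
        }
    }

    const std::size_t triangleCount = indices.size() / 3;
    return static_cast<double>(misses) / static_cast<double>(triangleCount);
}
```

The worst possible value is 3.0 (every index misses). If the re-ordered index buffer does not score lower than the original under this model, the re-ordering step itself is suspect; if it does score lower but the frame rate still drops, the bottleneck is probably not where the model assumes it is.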

My questions:
How should I approach this problem given modern-day architectures?
How should I be ordering the data to achieve the best performance on the card?
Is it worth doing anything at all?

Thanks a lot! :)
Brian

NOTE:
We’ve now sorted this. Details here:
=> https://devtalk.nvidia.com/default/topic/522925/opengl/optimal-vertex-and-index-layout-order-on-modern-gpu-39-s-/