How is the triangle vertex data of BVH arranged in memory?

I have found that arranging the triangles more closely together reduces BVH traversal time. Why is this?

Please explain what you mean by “arranging the triangles more closely together” in absolute coordinates.
Maybe draw some schematic picture.
About how many built-in triangles are we talking?
What absolute performance difference do you see, and on what system configuration?

Because of the increased spatial locality?

The BVH traversal time depends on how many AABBs the traversal needs to check for intersection.
The more AABBs that can be trivially rejected, the faster the actual intersections are found.
This depends on your scene structure, the BVH builders, and the rays you shoot.
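To make the "trivially rejected" part concrete, here is a minimal CPU-side sketch of the standard ray/AABB slab test, plus a helper that counts how many boxes a ray fails to reject. This is not how OptiX or the hardware implements traversal (that is not public); the names `Aabb`, `Ray`, `rayHitsAabb` and `countSurvivingBoxes` are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };                       // axis-aligned bounding box
struct Ray  { Vec3 origin, dir; float tmin, tmax; };

// Clip the ray interval [t0, t1] against one pair of axis-aligned planes.
// Returns false as soon as the interval becomes empty.
inline bool clipToSlab(float o, float d, float lo, float hi, float& t0, float& t1)
{
    if (d == 0.0f)                                  // ray parallel to this slab
        return o >= lo && o <= hi;
    const float inv = 1.0f / d;
    float tNear = (lo - o) * inv;
    float tFar  = (hi - o) * inv;
    if (tNear > tFar) std::swap(tNear, tFar);
    t0 = std::max(t0, tNear);
    t1 = std::min(t1, tFar);
    return t0 <= t1;
}

// Slab test: the ray hits the box only if the interval survives all three axes.
inline bool rayHitsAabb(const Ray& r, const Aabb& b)
{
    float t0 = r.tmin, t1 = r.tmax;
    return clipToSlab(r.origin.x, r.dir.x, b.lo.x, b.hi.x, t0, t1)
        && clipToSlab(r.origin.y, r.dir.y, b.lo.y, b.hi.y, t0, t1)
        && clipToSlab(r.origin.z, r.dir.z, b.lo.z, b.hi.z, t0, t1);
}

// Boxes that are NOT rejected are the ones a traversal would have to descend into.
inline std::size_t countSurvivingBoxes(const Ray& r, const std::vector<Aabb>& boxes)
{
    std::size_t n = 0;
    for (const Aabb& b : boxes)
        if (rayHitsAabb(r, b)) ++n;
    return n;
}
```

The fewer boxes survive this test per ray, the less work remains. Tight, non-overlapping boxes help with that, which is why scene structure and ray distribution matter.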

And I want to know how the triangle vertex data of the BVH is arranged in memory.

How the BVH builders work exactly is confidential information and changes with GPU architectures and even driver releases.

OptiX has some flags which influence how the acceleration structure is built.
Described here:

Then there is acceleration structure compaction, which will improve memory locality of the final AS data.

Also there is the OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS flag, which allows reading vertex positions from the AS and can affect performance negatively.

Is it arranged according to spatial location or just according to the BVH structure?

The BVH structure depends on the spatial location of the primitives and their sizes.
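If "arranged according to spatial location" refers to the order of the triangles in your own input buffer, one common application-side technique is to sort them along a Morton (Z-order) curve by centroid before handing them to the builder. The sketch below shows that idea only. It says nothing about how OptiX lays out data internally, and whether it helps in your case is something you would have to measure. The names `Triangle`, `mortonCode` and `sortByMorton` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <utility>
#include <vector>

struct Float3 { float x, y, z; };
using Triangle = std::array<Float3, 3>;

// Spread the low 10 bits of v so that two zero bits separate each original bit.
inline std::uint32_t expandBits(std::uint32_t v)
{
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point with coordinates normalized to [0, 1].
inline std::uint32_t mortonCode(float x, float y, float z)
{
    auto quantize = [](float v) {
        const float s = std::min(std::max(v * 1024.0f, 0.0f), 1023.0f);
        return static_cast<std::uint32_t>(s);
    };
    return (expandBits(quantize(x)) << 2) | (expandBits(quantize(y)) << 1) | expandBits(quantize(z));
}

inline Float3 centroid(const Triangle& t)
{
    return { (t[0].x + t[1].x + t[2].x) / 3.0f,
             (t[0].y + t[1].y + t[2].y) / 3.0f,
             (t[0].z + t[1].z + t[2].z) / 3.0f };
}

// Reorder triangles so that neighbors in space tend to be neighbors in memory.
inline void sortByMorton(std::vector<Triangle>& tris)
{
    if (tris.empty()) return;
    Float3 lo = centroid(tris[0]), hi = lo;
    for (const Triangle& t : tris) {               // bounds of all centroids
        const Float3 c = centroid(t);
        lo = { std::min(lo.x, c.x), std::min(lo.y, c.y), std::min(lo.z, c.z) };
        hi = { std::max(hi.x, c.x), std::max(hi.y, c.y), std::max(hi.z, c.z) };
    }
    auto norm = [](float v, float a, float b) { return b > a ? (v - a) / (b - a) : 0.0f; };

    std::vector<std::pair<std::uint32_t, Triangle>> keyed;
    keyed.reserve(tris.size());
    for (const Triangle& t : tris) {
        const Float3 c = centroid(t);
        keyed.emplace_back(mortonCode(norm(c.x, lo.x, hi.x),
                                      norm(c.y, lo.y, hi.y),
                                      norm(c.z, lo.z, hi.z)), t);
    }
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    for (std::size_t i = 0; i < tris.size(); ++i) tris[i] = keyed[i].second;
}
```

Note that this only changes the order of the input data. The positions and sizes of the primitives, which are what the BVH structure depends on, stay the same.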

Slightly related topics of how bad spatial scene structures can affect performance:
Bad grouping of GAS contents under IAS:
AS refit instead of rebuild:

Also, this isn’t the first time you have asked about this, and the answers won’t change.