With mmap, loading a huge model can be slower because of the large number of page faults and the related overhead, as ~60GB is brought lazily, page by page, into the page cache.
Could you run the command below and check whether load time improves?
$ sudo bash -c "echo 8192 > /sys/block/nvme0n1/queue/read_ahead_kb"
Increasing ‘read_ahead_kb’ for the NVMe device can help the kernel prefetch bigger chunks, which reduces page faults.
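To compare before and after, it can help to record the current value first. Below is a minimal sketch, assuming the device is nvme0n1 as in the command above; `show_read_ahead` and the `SYSFS_ROOT` override are hypothetical names used only for illustration, not part of any tool.

```shell
# Hypothetical helper: print the current read-ahead setting of a block device.
# SYSFS_ROOT is an illustrative override (defaults to /sys), not a kernel feature.
show_read_ahead() {
    dev="${1:-nvme0n1}"
    path="${SYSFS_ROOT:-/sys}/block/$dev/queue/read_ahead_kb"
    if [ -r "$path" ]; then
        echo "$dev read_ahead_kb=$(cat "$path")"
    else
        echo "$dev: cannot read $path" >&2
        return 1
    fi
}
```

Running `show_read_ahead nvme0n1` before and after the echo command confirms the change took effect. A value written through sysfs is not kept across reboots, so it would have to be reapplied after a restart.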
With read_ahead_kb set to 8192, I observed that on kernel v6.17, mmap time reduced by ~50% and no-mmap time reduced by ~35%.
On kernel v6.14, mmap time didn’t reduce significantly, while no-mmap time reduced by ~50%.
This may be due to improvements in the read-ahead related kernel code between v6.14 and v6.17.