You can get a lot more verbosity with sparkrun -vvv run <recipe> <options> (each v increases the verbosity level). Three is the maximum and it’s REALLY verbose; one or two v’s is more reasonable in general. The default output is fairly sparse because, if everything works according to plan, the extra detail is just noise.
You can also use sparkrun run <recipe> <options> --collect-diagnostics "diag.log" to collect all of the debug logs plus other data. It writes the extra-verbose output to a file instead of stdout, along with additional data that is typically useful for diagnosis. If you want, you can attach the diagnostics log to a GitHub issue (Issues · spark-arena/sparkrun · GitHub) and I can try to review it and guide you specifically, rather than through general forum chat.
My guess is that you need to configure the SSH user, the cache directory, or both as part of the cluster configuration.
sparkrun cluster update --help will give you more details on the options.
sparkrun cluster inspect <clusterName> will show you the effective configuration of a cluster.
As long as the cluster effectively uses the same user and cache directory, it should behave the same way. However, sparkrun currently might not auto-detect the cache directory correctly for a cluster, so you may need to specify it explicitly in the cluster configuration. Once that’s configured properly, it should just work from then on.
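To confirm that the SSH user on each node can actually use the cache directory, a quick check like the one below can help. It’s a minimal sketch, not part of sparkrun: check_cache_dir is a hypothetical helper, and the default path is made up. Pass it whatever cache directory sparkrun cluster inspect reports for your cluster, and run it as the SSH user on each node.

```python
import os
import sys


def check_cache_dir(path: str) -> dict:
    """Report whether `path` exists and is usable by the current user.

    Hypothetical helper for illustration; sparkrun does not ship this.
    """
    info = {
        "path": path,
        "exists": os.path.isdir(path),
        "readable": False,
        "writable": False,
    }
    if info["exists"]:
        # os.access checks against the real UID/GID of this process,
        # i.e. the user you logged in as over SSH.
        info["readable"] = os.access(path, os.R_OK | os.X_OK)
        info["writable"] = os.access(path, os.W_OK | os.X_OK)
    return info


if __name__ == "__main__":
    # Assumed example path; use the cache dir from `sparkrun cluster inspect`.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/nas/sparkrun-cache"
    for key, value in check_cache_dir(target).items():
        print(f"{key}: {value}")
```

If "exists" is False on one node but True on another, the cache directory in the cluster configuration probably doesn’t match where the cache actually lives on that node.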
Note: you can also configure swappiness (e.g., see SwapFaq - Community Help Wiki) and set it to 1, which should make the kernel less eager to swap. Swapping can still occur under heavy memory pressure, but the default swappiness of 60 makes the system much more likely to swap even without excessive pressure. I’ve been considering baking swappiness configuration into sparkrun but haven’t decided whether I should.
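If you want to see where a node currently stands, the kernel exposes the value at /proc/sys/vm/swappiness. The sketch below (hypothetical helper names, not part of sparkrun) reads it and flags anything above 1. Changing it uses the standard sysctl mechanism as root, e.g. sudo sysctl -w vm.swappiness=1, and adding vm.swappiness=1 to a file under /etc/sysctl.d/ should persist it across reboots; the SwapFaq page covers the details.

```python
from pathlib import Path

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path: Path = SWAPPINESS_PATH) -> int:
    """Return the current vm.swappiness value (Linux only)."""
    return int(path.read_text().strip())


def swappiness_note(value: int, target: int = 1) -> str:
    """Describe how `value` compares with the suggested target of 1."""
    if value <= target:
        return f"vm.swappiness={value}: already at or below {target}"
    return (
        f"vm.swappiness={value}: higher than {target}; "
        f"consider 'sudo sysctl -w vm.swappiness={target}'"
    )


if __name__ == "__main__":
    print(swappiness_note(read_swappiness()))
```

Run it on each node; a stock install will typically report the default of 60.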
I also recommend giving sparkrun sudo rights to clear the page cache, which can sometimes help (sparkrun setup clear-cache --save-sudo). The page cache can grow, especially on the node doing most of the work, such as the head node or the local sparkrun node that handles models, containers, etc. If sparkrun has permission, it’ll clear the page cache for you. If you configured via the wizard, it should have already prompted you to do this, but I figured I’d mention it just in case.
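For context on what clearing the page cache involves: on Linux the usual mechanism is to flush dirty pages with sync and then, as root, write 1 to /proc/sys/vm/drop_caches (2 frees reclaimable slab objects such as dentries and inodes, 3 frees both). This is not necessarily exactly what sparkrun does internally; the sketch below only illustrates the kernel interface. It reports the Cached: figure from /proc/meminfo so you can see how large the page cache is, and drop_page_cache (a hypothetical name) only works when run as root.

```python
import os
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: numeric value} (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def page_cache_gib(meminfo_path: str = "/proc/meminfo") -> float:
    """Return the size of the page cache ("Cached:", reported in kB) in GiB."""
    fields = parse_meminfo(Path(meminfo_path).read_text())
    return fields["Cached"] / (1024 * 1024)


def drop_page_cache(drop_caches_path: str = "/proc/sys/vm/drop_caches") -> None:
    """Flush dirty pages, then ask the kernel to drop the page cache.

    Requires root for the real /proc path; writing "1" frees the page cache only.
    """
    os.sync()  # write dirty pages to disk first so they become droppable
    Path(drop_caches_path).write_text("1\n")


if __name__ == "__main__":
    print(f"page cache before: {page_cache_gib():.1f} GiB")
    if os.geteuid() == 0:
        drop_page_cache()
        print(f"page cache after:  {page_cache_gib():.1f} GiB")
    else:
        print("not root; skipping drop_caches")
```

Dropping the cache is safe but not free: the next reads of models or container layers will hit the disk or NAS again.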
I also forgot to mention sparkrun setup fix-permissions, which can be used specifically to reset the owner of cache files. It has been less necessary since the v0.2.x line of sparkrun, which switched to not using root as the user inside containers by default. The key thing to check there is which user/UID owns the files on the NAS, and whether that matches the SSH user that is trying to access the cache files.
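To compare file ownership against the SSH user, something like the sketch below can help; find_foreign_owned and owner_name are hypothetical helpers, not part of sparkrun, and the default path is an assumption. Run it as the SSH user on a node, pointing it at the cache directory on the NAS. It lists entries whose owner UID differs from the current user’s, which is the kind of mismatch sparkrun setup fix-permissions is meant to address. Over NFS it’s the numeric UID that matters, so a name that looks right on one machine can still map to a different UID on another.

```python
import os
import pwd
import sys
from typing import List, Optional, Tuple


def owner_name(uid: int) -> str:
    """Map a UID to a local username, falling back to the raw number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def find_foreign_owned(root: str, uid: Optional[int] = None) -> List[Tuple[str, int]]:
    """Return (path, owner_uid) for `root` and every entry under it not owned by `uid`.

    `uid` defaults to the current user's UID (the SSH user when run over SSH).
    Subdirectories that can't be listed are silently skipped by os.walk.
    """
    expected = os.getuid() if uid is None else uid
    mismatches = []
    root_uid = os.lstat(root).st_uid
    if root_uid != expected:
        mismatches.append((root, root_uid))
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect symlinks themselves, don't follow
            if st.st_uid != expected:
                mismatches.append((path, st.st_uid))
    return mismatches


if __name__ == "__main__":
    cache_dir = sys.argv[1] if len(sys.argv) > 1 else "/mnt/nas/sparkrun-cache"
    me = os.getuid()
    print(f"current user: {owner_name(me)} (uid {me})")
    for path, uid in find_foreign_owned(cache_dir):
        print(f"{path}: owned by {owner_name(uid)} (uid {uid})")
```

An empty result means everything under the cache directory is owned by the user you ran it as; a list of root-owned (uid 0) files would point at leftovers from the pre-v0.2.x behavior.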