How to run these sample multi-GPU programs

Each node will certainly require a proper GPU driver install; that is unavoidable if you want to use the GPUs on each node.

Installing the full CUDA toolkit on each node would certainly be one way to solve this problem.

Otherwise you would need to make libcudart available on each node somehow. You could simply copy it there and make sure it is on the library search path. You could also set up e.g. an NFS share that has everything needed to run the code, and make that share available on every node in your cluster. Note that the libcudart from your build on the CUDA 11.1 machine is usable on the CUDA 12 machine, but not vice versa, so build the code on the machine with the lower CUDA toolkit version.
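If it helps to verify things, here is a minimal sketch (not part of the sample programs; the file name check_versions.cu is just my choice) that you could build on the CUDA 11.1 machine and run on each node, to confirm which libcudart the executable actually finds and what the installed driver supports:

```
// check_versions.cu - report the CUDA runtime and driver versions on this node
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int runtimeVersion = 0, driverVersion = 0;

    // Version of the libcudart this executable loaded at run time
    cudaError_t err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        printf("cudaRuntimeGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Highest CUDA version supported by the installed GPU driver
    err = cudaDriverGetVersion(&driverVersion);
    if (err != cudaSuccess) {
        printf("cudaDriverGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Versions are encoded as 1000*major + 10*minor, e.g. 11010 means 11.1
    printf("runtime (libcudart): %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    printf("driver supports up to: %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
    return 0;
}
```

Compile it with nvcc (e.g. `nvcc -o check_versions check_versions.cu`) on the build machine. If it reports an 11.1 runtime on every node, the copied (or NFS-shared) libcudart is being found correctly.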

Running and managing clusters (e.g. setting up an NFS share) is documented in many places on the web, and not something I would try to cover on this forum.

I think the usual advice for any Beowulf-style cluster is that the cluster administrator should make sure the software install on each node is identical. Having some nodes with CUDA 11 and some nodes with CUDA 12 is going to lead to problems like this one. Even if you sort this one out by, say, distributing the needed CUDA 11 libcudart manually, someday you might try to run a code that requires some other library (a cuFFT code, for example), and you will be revisiting this. So as a matter of sanity, or efficient use of your time, you might want to make sure all your nodes have the same software install.