Thanks for the clarification! I must have misread a previous post about it fitting on a single node :P
I was able to get the server to load the model (INT4 variant). On execution, however, it failed with the following error:
(EngineCore pid=204) ERROR 03-24 19:30:50 [core.py:1110] raise RuntimeError("Kernel requires a runtime memory allocation, but no allocator was set. " +
(EngineCore pid=204) ERROR 03-24 19:30:50 [core.py:1110] RuntimeError: Kernel requires a runtime memory allocation, but no allocator was set. Use triton.set_allocator to specify an allocator.
(EngineCore pid=204) INFO 03-24 19:30:50 [ray_executor.py:119] Shutting down Ray distributed executor. If you see error log from logging.cc regarding SIGTERM received, please ignore because this is the expected termination process in Ray.