Hi, we bought two GH200s for on-prem LLM inference, given their strong MLPerf scores on Llama-2-70B and larger memory than the H100 SXM. But I notice GH200 isn't in the NIM support matrix as of now.
Will NIM offer an optimized TensorRT-LLM profile for GH200 that leverages the coherent 576 GB of memory?
Will NIM still impose a minimum tensor-parallelism requirement when running on GH200? For example, the Llama-3.1-70B NIM currently requires TP4, but a single GH200 should be able to host it.
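For context, here is how I've been checking which profiles a NIM container exposes, to see whether any single-GPU (TP1) profile is listed. This is a sketch based on the documented `list-model-profiles` utility and the `NIM_MODEL_PROFILE` environment variable; the exact image tag is an example, not a recommendation:

```shell
# List the TensorRT-LLM / vLLM profiles bundled in a NIM container
# (image tag is illustrative; substitute the one you actually pull)
docker run --rm --gpus all \
  -e NGC_API_KEY \
  nvcr.io/nim/meta/llama-3.1-70b-instruct:latest \
  list-model-profiles

# Then pin a specific profile at launch, e.g. a hypothetical TP1 profile ID:
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<profile-id-from-listing> \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-70b-instruct:latest
```

On our GH200s, the listing only shows compatible profiles for the detected GPUs, which is why I'm asking whether a GH200-specific (or TP1) profile is planned.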
This is something we're currently evaluating! Thanks for the feedback, @ziling.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.