NIM llama3 deploy fp16

Hi

Regarding NIM_MODEL_PROFILE: I am trying to find a way to select fp16 for llama3-70B, referring to the following page:
Optimization - NVIDIA Docs

The list of available profiles shows the following option:
“tensorrt_llm-h100-fp16-tp4-throughput”
However, when I try to select it, it is not available.
The NIM container tag is 1.0.1.

When I tried llama3-8B in the same way, I was able to select fp16.
Are my variables wrong?
Or is fp16 still not an option for llama3-70b?

Please let me know if there is a way to select fp16 for llama3-70B.
Best regards

@kawabe-k you can find out what profiles are available by running

 docker run --gpus all nvcr.io/nim/meta/llama3-70b-instruct:1.0.0 list-model-profiles

In this case it looks like we do have two fp16 profiles available:

  - abcff5042bfc3fa9f4d1e715b2e016c11c6735855edfe2093e9c24e83733788e (tensorrt_llm-h100-fp16-tp4-throughput)
  - a90b2c0217492b1020bead4e7453199c9f965ea53e9506c007f4ff7b58ee01ff (tensorrt_llm-h100-fp16-tp8-latency)
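
If a profile appears in that compatible list, you can pin it at startup with the NIM_MODEL_PROFILE environment variable. A minimal sketch of a launch command, assuming four H100s are visible to the container (the tp4 in the profile name means tensor parallelism of 4, so it requires four GPUs; the NGC_API_KEY and port mapping are the usual NIM launch settings):

 # explicitly select the fp16 tp4 throughput profile
 docker run --gpus all \
   -e NGC_API_KEY=$NGC_API_KEY \
   -e NIM_MODEL_PROFILE=abcff5042bfc3fa9f4d1e715b2e016c11c6735855edfe2093e9c24e83733788e \
   -p 8000:8000 \
   nvcr.io/nim/meta/llama3-70b-instruct:1.0.0

If fewer than four GPUs are allocated to the container, list-model-profiles should report the fp16 profiles as incompatible with the system, which would explain why they do not appear as selectable.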

Can you share the command you are using and the error message you are seeing?

Thank you for the information.

We are using Run:ai and do not know whether the cause is on the NIM side or the Run:ai side.
There do not appear to be any errors in the logs.
nim-fp-16.txt (6.0 KB)
nim-fp8.txt (6.4 KB)
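
For reference, we set the profile as an environment variable on the Run:ai workload, roughly along these lines (a hypothetical submission for illustration; the job name and GPU count are placeholders, and flag names may differ by CLI version):

 # hypothetical Run:ai submission; adjust flags to your environment
 runai submit llama3-70b-fp16 \
   --image nvcr.io/nim/meta/llama3-70b-instruct:1.0.1 \
   --gpu 4 \
   -e NIM_MODEL_PROFILE=abcff5042bfc3fa9f4d1e715b2e016c11c6735855edfe2093e9c24e83733788e \
   -e NGC_API_KEY=$NGC_API_KEY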

Just to be sure:
have you confirmed that llama3-70B can be deployed with fp16 in an NVIDIA environment?

Best regards