NIM - Llama 3 8B Instruct - Results were very weird

Just speculation, but the default KV cache setting in TensorRT-LLM reserves 90% of the remaining GPU memory:
https://nvidia.github.io/TensorRT-LLM/reference/memory.html#id1

We may need to cap the KV cache (e.g. by max tokens or memory fraction), similar to the vLLM Python override, but I'm not sure how to set that with TensorRT-LLM.
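
If it helps, here's a minimal sketch of what I'd expect with the TensorRT-LLM Python LLM API. The KvCacheConfig fields (free_gpu_memory_fraction, max_tokens) and the model id are assumptions on my part and I haven't verified them against the NIM container, so treat this as a starting point rather than a known-good config:

```python
# Sketch only: assumes the TensorRT-LLM Python LLM API exposes KvCacheConfig
# with free_gpu_memory_fraction / max_tokens; not verified against NIM.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

# Limit the KV cache instead of letting it take ~90% of free GPU memory.
kv_cache_config = KvCacheConfig(
    free_gpu_memory_fraction=0.5,  # cap KV cache at 50% of free memory (assumed field)
    # max_tokens=8192,             # or cap by token count, if supported
)

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    kv_cache_config=kv_cache_config,
)

output = llm.generate("Hello, how are you?")
print(output)
```

For comparison, the vLLM override I had in mind is the gpu_memory_utilization argument on vllm.LLM; I don't know whether NIM exposes an equivalent knob directly.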