Failure to evaluate an LLM using NeMo Evaluator

The NeMo Evaluator service fails to evaluate the model meta/llama-3.1-8b-instruct.

bigbench_meta-llama-3.1-8b-instruct_intent_recognition-run.log

The completion endpoint is legacy. Set use_chat_endpoint=True to use the chat completion endpoint.

evaluating nvidia-eval-tool-model…
evaluating intent_recognition for 0 shots…
Traceback (most recent call last):
  File "/app/src/external/evaltool/evaltool/evaluations/llm/automatic/bigbench/bigbench/bigbench/evaluate_task.py", line 532, in <module>
    absl.app.run(main)
  File "/usr/local/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/app/src/external/evaltool/evaltool/evaluations/llm/automatic/bigbench/bigbench/bigbench/evaluate_task.py", line 465, in main
    results = task.evaluate_model(model, max_examples=FLAGS.max_examples, random_seed=FLAGS.random_seed)
  File "/workspace/big-bench-megatron-lm/bigbench/api/json_task.py", line 870, in evaluate_model
    results = self.evaluate_fixed_shot(
  File "/workspace/big-bench-megatron-lm/bigbench/api/json_task.py", line 701, in evaluate_fixed_shot
    absolute_log_probs = model.cond_log_prob(
  File "/workspace/big-bench-megatron-lm/bigbench/models/query_logging_model.py", line 123, in cond_log_prob
    absolute_scores = self.model.cond_log_prob(
  File "/workspace/big-bench-megatron-lm/bigbench/models/evaltool_model.py", line 149, in cond_log_prob
    batch_scores = self._model.score(
  File "/app/src/external/evaltool/evaltool/models/llm/nvidia_nemo_nim_model.py", line 381, in score
    raise NotImplementedError("Not support by NIM")
NotImplementedError: Not support by NIM

export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p “$LOCAL_NIM_CACHE”

The model was deployed using the NIM container:


docker run -it --rm \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.2

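Once the container is up, the deployment can be sanity-checked via the OpenAI-compatible models listing (assuming the container above is serving on localhost:8000):

```shell
# List the models served by the NIM container (OpenAI-compatible API).
curl -s http://localhost:8000/v1/models
```

The response should include meta/llama-3.1-8b-instruct in the model list.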

NeMo Evaluator was deployed on Kubernetes following this NVIDIA doc:
NeMo Evaluation
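Before submitting the job, it may help to confirm the evaluator pods are healthy and the API service is exposed on NodePort 30091 (the namespace below is illustrative; it depends on how the Helm chart was installed):

```shell
# Check that the evaluator pods are running (namespace is illustrative).
kubectl get pods -n nemo-evaluator
# Confirm the evaluator API service exposes NodePort 30091.
kubectl get svc -n nemo-evaluator
```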

The evaluation job was submitted as follows:

curl -X POST \
  "http://localhost:30091/v1/evaluations" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": {
    "llm_name": "meta/llama-3.1-8b-instruct",
    "inference_url": "http://0.0.0.0:8000/v1",
    "use_chat_endpoint": false
  },
  "evaluations": [
    {
      "eval_type": "automatic",
      "eval_subtype": "bigbench",
      "standard_tasks": ["intent_recognition"],
      "tydiqa_tasks": [],
      "standard_tasks_args": "--max_length=64 --json_shots='0'",
      "tydiqa_tasks_args": "",
      "few_shot_example_separator_override": {
        "standard_tasks": {
          "default": null
        }
      },
      "example_input_prefix_override": {
        "standard_tasks": {
          "default": null
        }
      },
      "example_output_prefix_override": {
        "standard_tasks": {
          "default": null
        }
      },
      "stop_string_override": {
        "standard_tasks": {
          "default": null
        }
      }
    }
  ],
  "tag": "llm-experiment"
}'
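The log warns that the completions endpoint is legacy and suggests use_chat_endpoint=True. One thing to try (a sketch, not verified — the failure comes from the evaltool NIM model class raising NotImplementedError in score(), and BIG-bench log-probability scoring may simply not be supported against NIM regardless of endpoint) is resubmitting the same request with the chat endpoint enabled; only the "model" block changes:

```json
"model": {
  "llm_name": "meta/llama-3.1-8b-instruct",
  "inference_url": "http://0.0.0.0:8000/v1",
  "use_chat_endpoint": true
}
```

Note also that "inference_url": "http://0.0.0.0:8000/v1" is only reachable if the evaluator pod and the NIM container share a network namespace; from inside a Kubernetes pod, the Docker-hosted NIM endpoint typically needs the host's address instead.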