Failure to evaluate an LLM using NeMo Evaluator

The NeMo Evaluator service fails to evaluate the model meta/llama-3.1-8b-instruct.

bigbench_meta-llama-3.1-8b-instruct_intent_recognition-run.log

The completion endpoint is legacy. Set use_chat_endpoint=True to use the chat completion endpoint.

evaluating nvidia-eval-tool-model…
evaluating intent_recognition for 0 shots…
Traceback (most recent call last):
  File "/app/src/external/evaltool/evaltool/evaluations/llm/automatic/bigbench/bigbench/bigbench/evaluate_task.py", line 532, in <module>
    absl.app.run(main)
  File "/usr/local/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/app/src/external/evaltool/evaltool/evaluations/llm/automatic/bigbench/bigbench/bigbench/evaluate_task.py", line 465, in main
    results = task.evaluate_model(model, max_examples=FLAGS.max_examples, random_seed=FLAGS.random_seed)
  File "/workspace/big-bench-megatron-lm/bigbench/api/json_task.py", line 870, in evaluate_model
    results = self.evaluate_fixed_shot(
  File "/workspace/big-bench-megatron-lm/bigbench/api/json_task.py", line 701, in evaluate_fixed_shot
    absolute_log_probs = model.cond_log_prob(
  File "/workspace/big-bench-megatron-lm/bigbench/models/query_logging_model.py", line 123, in cond_log_prob
    absolute_scores = self.model.cond_log_prob(
  File "/workspace/big-bench-megatron-lm/bigbench/models/evaltool_model.py", line 149, in cond_log_prob
    batch_scores = self._model.score(
  File "/app/src/external/evaltool/evaltool/models/llm/nvidia_nemo_nim_model.py", line 381, in score
    raise NotImplementedError("Not support by NIM")
NotImplementedError: Not support by NIM

export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p “$LOCAL_NIM_CACHE”

The model was deployed using the NIM container:


docker run -it --rm \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.2

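Once the container is up, the deployment can be sanity-checked via the OpenAI-compatible models listing (assuming the container above is serving on localhost:8000):

```shell
# List the models served by the NIM container (OpenAI-compatible API).
curl -s http://localhost:8000/v1/models
```

The response should include meta/llama-3.1-8b-instruct in the model list.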

NeMo Evaluator was deployed on Kubernetes following this NVIDIA doc:
NeMo Evaluation
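Before submitting the job, it may help to confirm the evaluator pods are healthy and the API service is exposed on NodePort 30091 (the namespace below is illustrative; it depends on how the Helm chart was installed):

```shell
# Check that the evaluator pods are running (namespace is illustrative).
kubectl get pods -n nemo-evaluator
# Confirm the evaluator API service exposes NodePort 30091.
kubectl get svc -n nemo-evaluator
```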

The evaluation job was submitted as follows:

curl -X POST \
  "http://localhost:30091/v1/evaluations" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": {
    "llm_name": "meta/llama-3.1-8b-instruct",
    "inference_url": "http://0.0.0.0:8000/v1",
    "use_chat_endpoint": false
  },
  "evaluations": [
    {
      "eval_type": "automatic",
      "eval_subtype": "bigbench",
      "standard_tasks": ["intent_recognition"],
      "tydiqa_tasks": [],
      "standard_tasks_args": "--max_length=64 --json_shots='0'",
      "tydiqa_tasks_args": "",
      "few_shot_example_separator_override": {
        "standard_tasks": {
          "default": null
        }
      },
      "example_input_prefix_override": {
        "standard_tasks": {
          "default": null
        }
      },
      "example_output_prefix_override": {
        "standard_tasks": {
          "default": null
        }
      },
      "stop_string_override": {
        "standard_tasks": {
          "default": null
        }
      }
    }
  ],
  "tag": "llm-experiment"
}'
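The log warns that the completions endpoint is legacy and suggests use_chat_endpoint=True. One thing to try (a sketch, not verified — the failure comes from the evaltool NIM model class raising NotImplementedError in score(), and BIG-bench log-probability scoring may simply not be supported against NIM regardless of endpoint) is resubmitting the same request with the chat endpoint enabled; only the "model" block changes:

```json
"model": {
  "llm_name": "meta/llama-3.1-8b-instruct",
  "inference_url": "http://0.0.0.0:8000/v1",
  "use_chat_endpoint": true
}
```

Note also that "inference_url": "http://0.0.0.0:8000/v1" is only reachable if the evaluator pod and the NIM container share a network namespace; from inside a Kubernetes pod, the Docker-hosted NIM endpoint typically needs the host's address instead.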