NeMo Evaluator service fails to evaluate the LLM model meta/llama-3.1-8b-instruct
bigbench_meta-llama-3.1-8b-instruct_intent_recognition-run.log
The completion endpoint is legacy. Set use_chat_endpoint=True to use the chat completion endpoint.
evaluating nvidia-eval-tool-model…
evaluating intent_recognition for 0 shots…
Traceback (most recent call last):
  File "/app/src/external/evaltool/evaltool/evaluations/llm/automatic/bigbench/bigbench/bigbench/evaluate_task.py", line 532, in <module>
    absl.app.run(main)
  File "/usr/local/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/app/src/external/evaltool/evaltool/evaluations/llm/automatic/bigbench/bigbench/bigbench/evaluate_task.py", line 465, in main
    results = task.evaluate_model(model, max_examples=FLAGS.max_examples, random_seed=FLAGS.random_seed)
  File "/workspace/big-bench-megatron-lm/bigbench/api/json_task.py", line 870, in evaluate_model
    results = self.evaluate_fixed_shot(
  File "/workspace/big-bench-megatron-lm/bigbench/api/json_task.py", line 701, in evaluate_fixed_shot
    absolute_log_probs = model.cond_log_prob(
  File "/workspace/big-bench-megatron-lm/bigbench/models/query_logging_model.py", line 123, in cond_log_prob
    absolute_scores = self.model.cond_log_prob(
  File "/workspace/big-bench-megatron-lm/bigbench/models/evaltool_model.py", line 149, in cond_log_prob
    batch_scores = self._model.score(
  File "/app/src/external/evaltool/evaltool/models/llm/nvidia_nemo_nim_model.py", line 381, in score
    raise NotImplementedError("Not support by NIM")
NotImplementedError: Not support by NIM
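For context on the failure: BIG-bench multiple-choice tasks like intent_recognition score each answer choice by its conditional log-probability (`cond_log_prob` → `score`), which requires the inference backend to return per-token logprobs for an echoed prompt. The `nvidia_nemo_nim_model.py` wrapper raises instead of implementing this. As a minimal sketch (hypothetical helper name; whether a given NIM build honors `echo`, `logprobs`, and `max_tokens=0` on /v1/completions is an assumption, and the error above suggests this one does not), the request such a `score()` would need looks like:

```python
import json

def build_score_request(model: str, context: str, continuation: str) -> dict:
    """Build the /v1/completions payload a cond_log_prob-style score()
    would need: echo the prompt back with per-token logprobs so the
    continuation's tokens can be located and their logprobs summed."""
    return {
        "model": model,
        "prompt": context + continuation,
        "max_tokens": 0,    # generate nothing; only score the prompt
        "echo": True,       # return the prompt tokens in the response
        "logprobs": 1,      # include per-token log-probabilities
        "temperature": 0.0,
    }

req = build_score_request(
    "meta/llama-3.1-8b-instruct",
    "Intent: book a flight\nAnswer: ",
    "travel",
)
print(json.dumps(req, indent=2))
```

If the serving endpoint rejects `echo`/`logprobs`, there is no way to compute these scores client-side, which matches the NotImplementedError above.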
export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p “$LOCAL_NIM_CACHE”
The model is deployed using the NIM container:
docker run -it --rm \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.2
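Before submitting the evaluation, it is worth confirming the container is actually serving the expected model id at the OpenAI-compatible `/v1/models` endpoint. A self-contained sketch (the stub HTTP server below stands in for the running NIM container so the snippet runs anywhere; against the real deployment you would pass `http://0.0.0.0:8000`):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def list_models(base_url: str) -> list:
    """Query the OpenAI-compatible /v1/models endpoint and return the
    served model ids."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]

# Stub standing in for the NIM container, so this sketch is runnable.
class _Stub(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(
            {"data": [{"id": "meta/llama-3.1-8b-instruct"}]}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), _Stub)
threading.Thread(target=server.serve_forever, daemon=True).start()
ids = list_models(f"http://127.0.0.1:{server.server_port}")
server.shutdown()
print(ids)
```

If the model id printed here does not match the `llm_name` in the evaluation request, the job fails for an unrelated reason, so this check helps isolate the NIM scoring issue.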
NeMo Evaluator is deployed on Kubernetes following this NVIDIA doc:
nemo evaluation
The evaluation job was submitted as follows:
curl -X POST \
"http://localhost:30091/v1/evaluations" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": {
"llm_name": "meta/llama-3.1-8b-instruct",
"inference_url": "http://0.0.0.0:8000/v1",
"use_chat_endpoint": false
},
"evaluations": [
{
"eval_type": "automatic",
"eval_subtype": "bigbench",
"standard_tasks": ["intent_recognition"],
"tydiqa_tasks": [],
"standard_tasks_args": "--max_length=64 --json_shots='0'",
"tydiqa_tasks_args": "",
"few_shot_example_separator_override": {
"standard_tasks": {
"default": null
}
},
"example_input_prefix_override": {
"standard_tasks": {
"default": null
}
},
"example_output_prefix_override": {
"standard_tasks": {
"default": null
}
},
"stop_string_override": {
"standard_tasks": {
"default": null
}
}
}
],
"tag": "llm-experiment"
}'
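Since the startup log warns that the completions endpoint is legacy and suggests `use_chat_endpoint=True`, one thing to try is resubmitting the same job in chat mode. A minimal sketch that rebuilds the request body from the curl above with only that flag toggled (field names copied from the request; whether chat mode sidesteps the unimplemented `score()` path is an assumption, not something the docs confirm):

```python
import json

def build_eval_request(use_chat_endpoint: bool) -> dict:
    """Rebuild the evaluation request body from the curl command,
    toggling only the endpoint mode."""
    override = {"standard_tasks": {"default": None}}
    return {
        "model": {
            "llm_name": "meta/llama-3.1-8b-instruct",
            "inference_url": "http://0.0.0.0:8000/v1",
            "use_chat_endpoint": use_chat_endpoint,
        },
        "evaluations": [{
            "eval_type": "automatic",
            "eval_subtype": "bigbench",
            "standard_tasks": ["intent_recognition"],
            "tydiqa_tasks": [],
            "standard_tasks_args": "--max_length=64 --json_shots='0'",
            "tydiqa_tasks_args": "",
            "few_shot_example_separator_override": override,
            "example_input_prefix_override": override,
            "example_output_prefix_override": override,
            "stop_string_override": override,
        }],
        "tag": "llm-experiment",
    }

print(json.dumps(build_eval_request(True), indent=2))
```

The JSON printed here can be POSTed to the same `/v1/evaluations` endpoint as in the curl command above.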