Originally published at: Announcing ComputeEval, an Open-Source Framework for Evaluating LLMs on CUDA | NVIDIA Technical Blog
Large language models (LLMs) are revolutionizing how developers code and how they learn to code. For seasoned and junior developers alike, today’s state-of-the-art models can generate Python scripts, React-based websites, and more. In the future, powerful AI models will assist developers in writing high-performance GPU code. This raises an important question: How can it be…
When using custom models, why does it generate 500 API requests even though I have set the problem file to example_test.jsonl, which has only 4 problems?
I have also set the number of samples to generate to 1. From my reading of the code, it should send only 4 requests.
Edit: If you don't set num_of_samples in the YAML, it defaults to 100.
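For anyone hitting the same thing, here is a rough sketch of the arithmetic, assuming the request count is simply the number of problems times the samples per problem. The function name, the `task_id` field, and the placeholder problems are hypothetical and not taken from the ComputeEval code; the default of 100 comes only from the edit note above.

```python
import io

# Assumed default, taken from the edit note above; not verified against the ComputeEval source.
ASSUMED_DEFAULT_SAMPLES = 100


def expected_requests(jsonl_text, samples_per_problem=None):
    """Return problems x samples, counting one problem per non-empty JSONL line."""
    if samples_per_problem is None:
        samples_per_problem = ASSUMED_DEFAULT_SAMPLES
    num_problems = sum(1 for line in io.StringIO(jsonl_text) if line.strip())
    return num_problems * samples_per_problem


# Four placeholder problems standing in for example_test.jsonl (hypothetical contents).
sample_file = "\n".join('{"task_id": "demo/%d"}' % i for i in range(4))

print(expected_requests(sample_file, samples_per_problem=1))  # 4
print(expected_requests(sample_file))  # 400
```

Under these assumptions, four problems with the default sample count would give 400 requests rather than 500, so this sketch does not fully account for the number reported above. It only shows why leaving num_of_samples unset can multiply the request count.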
This is a fantastic announcement — thanks for sharing! ComputeEval is a powerful step toward standardizing LLM benchmarking on CUDA platforms.
A couple thoughts and questions:
- It’d be great to see more examples of custom model integrations (especially with different architectures) to test compatibility.
- Also curious: how does it handle tokenization and batching overhead across different models / GPUs?