Issue with genai-perf for multi-LoRA on NIM

Hi,
I'm following this doc to set up multi-LoRA on AWS EKS, and it's working. Now I'd like to evaluate its performance based on this doc.

The warm-up evaluation goes well, but I run into an issue in the "sweeping through all the use cases" phase.
Although genai-perf completes the evaluation across the various use cases and concurrency levels, the expected {ISL}_{OSL}_genai_perf.csv file is missing from each folder.

Here is my top-level file structure:
[screenshot: top-level directory listing]

In each folder, I get the same file structure, as follows:
[screenshot: per-run folder contents]

Here is the script I used to run the benchmark:

#!/bin/bash

declare -A useCases

# Populate the array with use case descriptions and their specified input/output lengths
useCases["Translation"]="200/200"
useCases["Text classification"]="200/5"
useCases["Text summary"]="1000/200"

# Function to execute genAI-perf with the input/output lengths as arguments
runBenchmark() {
    local description="$1"
    local lengths="${useCases[$description]}"
    IFS='/' read -r inputLength outputLength <<< "$lengths"

    echo "Running genAI-perf for $description with input length $inputLength and output length $outputLength"
    # Sweep over concurrency levels for this use case
    for concurrency in 1 2 5 10 50 100 250; do

        local INPUT_SEQUENCE_LENGTH=$inputLength
        local INPUT_SEQUENCE_STD=0
        local OUTPUT_SEQUENCE_LENGTH=$outputLength
        local CONCURRENCY=$concurrency
        local MODEL=llama3-8b-instruct-lora_vhf-squad-v1

        genai-perf \
            -m $MODEL \
            --endpoint-type chat \
            --service-kind openai \
            --streaming \
            -u 10.100.59.31:8000 \
            --synthetic-input-tokens-mean $INPUT_SEQUENCE_LENGTH \
            --synthetic-input-tokens-stddev $INPUT_SEQUENCE_STD \
            --concurrency $CONCURRENCY \
            --output-tokens-mean $OUTPUT_SEQUENCE_LENGTH \
            --extra-inputs max_tokens:$OUTPUT_SEQUENCE_LENGTH \
            --extra-inputs min_tokens:$OUTPUT_SEQUENCE_LENGTH \
            --extra-inputs ignore_eos:true \
            --tokenizer meta-llama/Meta-Llama-3-8B-Instruct \
            --measurement-interval 10000 \
            --profile-export-file ${INPUT_SEQUENCE_LENGTH}_${OUTPUT_SEQUENCE_LENGTH}.json \
            -- \
            -v \
            --max-threads=256

    done
}

# Iterate over all defined use cases and run the benchmark script for each
for description in "${!useCases[@]}"; do
    runBenchmark "$description"
done

I only have one CSV file, called profile_export_genai_perf.csv, but I expect multiple CSV files, one per use case, which should look something like this:
[screenshot: expected per-use-case CSV files]

I intend to use these CSV files for basic data analysis and plotting, but the missing files in each folder block the way. Could anyone provide some tips on how to address this issue?
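For reference, the CSVs that were produced can be listed like this (a minimal sketch; the artifacts root and folder layout are assumptions based on the screenshots above, so adjust the path to your tree):

# List every per-run CSV under the artifact root (path is an assumption)
find artifacts -name '*genai_perf.csv' | sort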

Hi Ryan,

Which genai-perf container were you using? If it is newer than the one in the guide (24.06), there could have been some changes.

Either way, could you check whether the {ISL}_{OSL}.json file for each use case contains the performance metrics (in the respective fields of the JSON object)?
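For example, a quick way to inspect one export (a minimal sketch using jq; 200_200.json is the Translation run from the script above, and the key names vary by version):

# Print the top-level keys of a profile export to see whether metrics
# were recorded (requires jq; run from the folder holding the export)
jq 'keys' 200_200.json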


I have just tested with the genAI-perf 24.06 container, and I see the separate .csv result files ({ISL}_{OSL}_genai_perf.csv) as expected.
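If your container is newer, one way to reproduce the guide's setup is to run the script inside the 24.06 SDK image (a sketch; the tag assumes genai-perf ships in the Triton 24.06 SDK container, and benchmark.sh is a hypothetical name for the script above):

# Run the benchmark script inside the 24.06 SDK image
# (image tag and script name are assumptions)
docker run -it --rm --net host \
    -v $(pwd):/workspace \
    nvcr.io/nvidia/tritonserver:24.06-py3-sdk \
    bash /workspace/benchmark.sh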

Another hypothesis to try: increase the measurement interval,

--measurement-interval 10000

(the default is 10000 ms, i.e. 10 s).

If the inference server is slow or the ISL/OSL values are large, increase this value so that at least a few requests complete within the window.
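For example, re-running the heaviest combination with a tripled window would look like this (a sketch that reuses only flags from the script above; 30000 ms is an illustrative value):

# Re-run the slowest case (ISL 1000 / OSL 200, concurrency 250) with a
# 3x measurement window; 30000 ms is an arbitrary illustration
genai-perf \
    -m llama3-8b-instruct-lora_vhf-squad-v1 \
    --endpoint-type chat \
    --service-kind openai \
    --streaming \
    -u 10.100.59.31:8000 \
    --synthetic-input-tokens-mean 1000 \
    --output-tokens-mean 200 \
    --concurrency 250 \
    --tokenizer meta-llama/Meta-Llama-3-8B-Instruct \
    --measurement-interval 30000 \
    --profile-export-file 1000_200.json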