Issue with genai-perf for multi-LoRA on NIM

Hi,
I'm following this doc to set up multi-LoRA on AWS EKS, and it's working. Now I'd like to evaluate its performance based on this doc.

The warm-up evaluation goes well, but I run into an issue in the "sweeping through all the use cases" phase.
Although genai-perf completes the evaluation across the various use cases and concurrency levels, the expected {ISL}_{OSL}_genai_perf.csv file is missing from each folder.

Here is my top-level file structure:
[screenshot: top-level directory listing]

In each folder, I get the same file structure, as follows:
[screenshot: per-run folder contents]

Here is the script I used to run the benchmark:

#!/bin/bash

declare -A useCases

# Populate the array with use case descriptions and their specified input/output lengths
useCases["Translation"]="200/200"
useCases["Text classification"]="200/5"
useCases["Text summary"]="1000/200"

# Function to execute genAI-perf with the input/output lengths as arguments
runBenchmark() {
    local description="$1"
    local lengths="${useCases[$description]}"
    IFS='/' read -r inputLength outputLength <<< "$lengths"

    echo "Running genAI-perf for $description with input length $inputLength and output length $outputLength"
    # Sweep over concurrency levels for this use case
    for concurrency in 1 2 5 10 50 100 250; do

        local INPUT_SEQUENCE_LENGTH=$inputLength
        local INPUT_SEQUENCE_STD=0
        local OUTPUT_SEQUENCE_LENGTH=$outputLength
        local CONCURRENCY=$concurrency
        local MODEL=llama3-8b-instruct-lora_vhf-squad-v1

        genai-perf \
            -m $MODEL \
            --endpoint-type chat \
            --service-kind openai \
            --streaming \
            -u 10.100.59.31:8000 \
            --synthetic-input-tokens-mean $INPUT_SEQUENCE_LENGTH \
            --synthetic-input-tokens-stddev $INPUT_SEQUENCE_STD \
            --concurrency $CONCURRENCY \
            --output-tokens-mean $OUTPUT_SEQUENCE_LENGTH \
            --extra-inputs max_tokens:$OUTPUT_SEQUENCE_LENGTH \
            --extra-inputs min_tokens:$OUTPUT_SEQUENCE_LENGTH \
            --extra-inputs ignore_eos:true \
            --tokenizer meta-llama/Meta-Llama-3-8B-Instruct \
            --measurement-interval 10000 \
            --profile-export-file ${INPUT_SEQUENCE_LENGTH}_${OUTPUT_SEQUENCE_LENGTH}.json \
            -- \
            -v \
            --max-threads=256

    done
}

# Iterate over all defined use cases and run the benchmark script for each
for description in "${!useCases[@]}"; do
    runBenchmark "$description"
done

I only have one CSV file, called profile_export_genai_perf.csv, but I expect multiple CSV files, one per use case, which should look something like this:
[screenshot: expected per-use-case CSV files]

I intend to use these CSV files for basic data analysis and plotting, but the missing files in each folder block the way. Could anyone provide some tips on how to address this issue?
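For reference, the CSVs that were produced can be listed like this (a minimal sketch; the artifacts root and folder layout are assumptions based on the screenshots above, so adjust the path to your tree):

# List every per-run CSV under the artifact root (path is an assumption)
find artifacts -name '*genai_perf.csv' | sort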

Hi Ryan,

Which genai-perf container were you using? If it is newer than the one in the guide (24.06), there could have been some changes.

Either way, could you check whether the {ISL}_{OSL}.json file for each use case contains the performance metrics (in the respective fields of the JSON object)?
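For example, a quick way to inspect one export (a minimal sketch using jq; 200_200.json is the Translation run from the script above, and the key names vary by version):

# Print the top-level keys of a profile export to see whether metrics
# were recorded (requires jq; run from the folder holding the export)
jq 'keys' 200_200.json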


I have just tested with the genAI-perf 24.06 container, and I see the separate .csv result files ({ISL}_{OSL}_genai_perf.csv) as expected.
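If your container is newer, one way to reproduce the guide's setup is to run the script inside the 24.06 SDK image (a sketch; the tag assumes genai-perf ships in the Triton 24.06 SDK container, and benchmark.sh is a hypothetical name for the script above):

# Run the benchmark script inside the 24.06 SDK image
# (image tag and script name are assumptions)
docker run -it --rm --net host \
    -v $(pwd):/workspace \
    nvcr.io/nvidia/tritonserver:24.06-py3-sdk \
    bash /workspace/benchmark.sh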

Another hypothesis to try: increase the measurement interval,

--measurement-interval 10000

(the default is 10000 ms, i.e. 10 s).

If the inference server is slow or the ISL/OSL values are large, increase this value so that at least a few requests complete within the window.
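For example, re-running the heaviest combination with a tripled window would look like this (a sketch that reuses only flags from the script above; 30000 ms is an illustrative value):

# Re-run the slowest case (ISL 1000 / OSL 200, concurrency 250) with a
# 3x measurement window; 30000 ms is an arbitrary illustration
genai-perf \
    -m llama3-8b-instruct-lora_vhf-squad-v1 \
    --endpoint-type chat \
    --service-kind openai \
    --streaming \
    -u 10.100.59.31:8000 \
    --synthetic-input-tokens-mean 1000 \
    --output-tokens-mean 200 \
    --concurrency 250 \
    --tokenizer meta-llama/Meta-Llama-3-8B-Instruct \
    --measurement-interval 30000 \
    --profile-export-file 1000_200.json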