Deployment Setup:
- Successfully deployed VSS using Docker Compose with a locally deployed 40B VLM
- According to the NVIDIA team, both VSS and the VILA API are backed by the same model
Testing Methodology:
- Testing damage detection on bag images through the VSS UI
- Testing the same image dataset through the VILA API with an identical prompt (a hedged request sketch follows this list)
- Prompt used: “The given input images are different angles of a single bag. Examine the images and determine if the bag is damaged or undamaged.”
- Each test case uses multiple angles of a single bag for the damage assessment
- Implemented a methodology of splitting the images so that minute damage can be detected
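For reference, the VILA API side of the comparison can be reproduced with a short script. This is a hedged sketch only: the endpoint URL, the inline `<img>` payload format, the response shape, and the sampling values are assumptions based on NVIDIA's hosted VLM endpoints; only the prompt text is taken verbatim from the setup above.

```python
# Hedged sketch of the VILA API call used for the comparison. The endpoint URL,
# the inline <img> payload format, and the response shape are assumptions about
# NVIDIA's hosted VLM endpoints; adjust them to match your actual VILA API usage.
import base64
import requests

INVOKE_URL = "https://ai.api.nvidia.com/v1/vlm/nvidia/vila"  # assumed endpoint
PROMPT = (
    "The given input images are different angles of a single bag. Examine the "
    "images and determine if the bag is damaged or undamaged."
)

def encode_image(path: str) -> str:
    """Embed an image as an inline base64 <img> tag in the request content."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return f'<img src="data:image/jpeg;base64,{b64}" />'

def ask_vila(image_paths: list[str], api_key: str) -> str:
    images = "".join(encode_image(p) for p in image_paths)
    payload = {
        "messages": [{"role": "user", "content": f"{PROMPT} {images}"}],
        "max_tokens": 512,
        # Keep these identical to the VSS-side --model-temperature / --model-top-p /
        # --model-top-k settings so decoding parameters are not a variable.
        "temperature": 0.2,
        "top_p": 0.7,
    }
    resp = requests.post(
        INVOKE_URL,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
        json=payload,
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Example: ask_vila(["bag_front.jpg", "bag_side.jpg"], api_key="nvapi-...")
```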
Results Analysis:
- Total test cases: 40 bag damage detection scenarios
- Matching results: 35 out of 40 test cases (87.5%) show consistent responses between VSS and the VILA API (a sketch of one way to tally the verdict agreement follows this list)
- Inconsistent results: 5 out of 40 test cases (12.5%) show different responses
- The VILA API responses were correct in all 40 test cases
- VSS returned an incorrect damage verdict in the 5 inconsistent cases
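How the agreement was tallied is not described above, so here is a hedged sketch of one way to do it: reduce each free-text response to a damaged/undamaged verdict and compare per test case. The keyword matching is an assumption about how the model phrases its answers.

```python
# Hedged sketch: reduce each free-text response to a damaged/undamaged verdict and
# compare VSS vs. VILA API per test case. The keyword list is an assumption about
# how the model phrases its answers; adjust it to the actual responses.
def verdict(text: str) -> str:
    t = text.lower()
    if "undamaged" in t or "not damaged" in t or "no damage" in t:
        return "undamaged"
    if "damage" in t:  # covers "damaged" and "damage"
        return "damaged"
    return "unclear"

def agreement(pairs):
    """pairs: list of (vss_response, vila_response) strings, one per test case."""
    matches = sum(verdict(a) == verdict(b) for a, b in pairs)
    total = len(pairs)
    print(f"{matches}/{total} consistent ({100.0 * matches / total:.1f}%)")
    return matches
```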
Core Issue:
Despite using the same underlying model, there is a deviation in damage detection accuracy between the VSS and VILA API implementations.
Question:
Why is this deviation happening when both systems are supposedly backed by the same model? What factors could cause VSS to produce different (incorrect) results compared to VILA API for the same input images and prompts?
Sample image data:
sampleImage.zip (583.8 KB)
This may be caused by parameter differences. Could you run via_client_cli.py with the --print-curl-command option to reproduce the issue and capture the request parameters? Please also share the parameters you use with the VILA API.
Comparing the two curl commands will help us inspect the problem.
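For example, the CLI can be wrapped like this to save the printed curl command. This is a hedged sketch: only --print-curl-command and the three sampling options are taken from this thread; the remaining arguments are placeholders for whatever invocation you already use.

```python
# Hedged sketch: wrap via_client_cli.py so the equivalent curl command it prints
# can be saved and diffed against the VILA API request. Only --print-curl-command
# and the --model-temperature/--model-top-p/--model-top-k options come from this
# thread; any other arguments are placeholders for your usual invocation.
import subprocess

cmd = [
    "python3", "via_client_cli.py",
    "--print-curl-command",         # print the equivalent curl request
    "--model-temperature", "0.2",   # keep these three identical to the VILA API call
    "--model-top-p", "0.7",
    "--model-top-k", "40",
    # ... your usual arguments (input images/video, prompt, etc.) go here ...
]

result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)  # diff this curl command against the one used for the VILA API
```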
Other tips:
- Check the output of vlm_pipeline and compare it with the VILA API response (a logging sketch follows the snippet below):
vss-engine/src/vlm_pipeline/vlm_pipeline.py
vlm_response_stats = ctx.ask(
    request_params[0].vlm_prompt,
    generation_config=request_params[0].vlm_generation_config,
    chunk=chunk,
)
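One way to capture that per-chunk output for the comparison is to log everything around the ctx.ask() call. This is a hedged patch sketch that reuses the names from the snippet above; the structure of vlm_response_stats varies by VSS version, so it is only logged verbatim.

```python
# Hedged debug patch around the ctx.ask() call shown above: dump the prompt, the
# generation config, and the raw per-chunk VLM response so they can be diffed
# against the VILA API answer for the same images. Reuses ctx, request_params and
# chunk from the surrounding vlm_pipeline.py code.
import logging

debug_logger = logging.getLogger("vlm_pipeline_debug")

vlm_response_stats = ctx.ask(
    request_params[0].vlm_prompt,
    generation_config=request_params[0].vlm_generation_config,
    chunk=chunk,
)

debug_logger.info(
    "chunk=%r prompt=%r gen_config=%r response=%r",
    chunk,
    request_params[0].vlm_prompt,
    request_params[0].vlm_generation_config,
    vlm_response_stats,
)
```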
- If the above output is consistent with the VILA API, please check RAG. You can also use via_client_cli.py and set enable_chat to false to turn off RAG (a hedged A/B sketch follows the snippet). Check this code snippet in via_stream_handler.py:
if req_info.summarize:
    if req_info.enable_chat:
        with TimeMeasure("Context Manager Summarize/summarize"):
            agg_response = req_info._ctx_mgr.call(
                {
                    "summarization": {
                        "start_index": (
                            2 * chunk_responses[0].chunk.chunkIdx
                            if req_info.enable_audio
                            else chunk_responses[0].chunk.chunkIdx
                        ),
                        "end_index": (
                            2 * chunk_responses[-1].chunk.chunkIdx + 1
                            if req_info.enable_audio
                            else chunk_responses[-1].chunk.chunkIdx
                        ),
                    },
                    "chat": {"post_process": True},
                }
            )
    else:
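To A/B the RAG stage from the client side, the same input can be run twice, once with chat enabled and once with it disabled, and the answers diffed. A hedged sketch follows; the --enable-chat flag spelling and the remaining arguments are assumptions — only the enable_chat setting itself comes from the snippet above.

```python
# Hedged A/B sketch: run the identical request with and without the chat/RAG path
# and diff the answers. The --enable-chat flag spelling and the other arguments are
# assumptions; substitute your usual via_client_cli.py invocation.
import subprocess

def run_vss(enable_chat: bool) -> str:
    cmd = [
        "python3", "via_client_cli.py",
        "--enable-chat", "true" if enable_chat else "false",
        # ... your usual arguments (input images, prompt, model parameters) ...
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

with_rag = run_vss(True)
without_rag = run_vss(False)
if with_rag.strip() != without_rag.strip():
    print("RAG/summarization changes the answer:")
    print("--- with chat ---\n" + with_rag)
    print("--- without chat ---\n" + without_rag)
```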
Are you running VILA 40B locally? Please ensure that the --model-temperature/--model-top-p/--model-top-k parameters are consistent between the two setups; these sampling parameters can also cause differences in output.
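As an additional hedged suggestion while debugging, sampling randomness can be removed entirely on both sides with greedy-style decoding values, so that any remaining deviation must come from the pipelines themselves rather than from sampling:

```python
# Hedged suggestion: near-deterministic decoding values to use on BOTH sides while
# debugging, so that sampling randomness cannot account for the deviation. Whether
# your deployment treats top_k=1 / temperature=0.0 as strictly greedy is an assumption.
deterministic_config = {
    "temperature": 0.0,  # disable temperature scaling / sampling
    "top_p": 1.0,
    "top_k": 1,          # always take the highest-probability token
}
# VSS side:  --model-temperature 0.0 --model-top-p 1.0 --model-top-k 1
# VILA API:  set the matching fields in the request payload
```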
Yes, I’m running the model locally. I’ve ensured the parameters are consistent with the VILA API settings:
--model-temperature
--model-top-p
--model-top-k
Results after parameter adjustment:
- Partial improvement: some previously failing responses now match the expected output
- Remaining issues: several responses still show discrepancies despite matching parameters
This suggests that while parameter consistency helps, there may be additional factors affecting the output differences between the local deployment and the API version.
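One quick sanity check for the remaining discrepancies is repeatability: if the same system gives different answers to repeated identical requests, the residual deviation is sampling noise rather than a VSS-versus-VILA implementation difference. A hedged sketch, where ask_fn is a placeholder for whichever call you use:

```python
# Hedged repeatability check: send the identical request several times to ONE system
# and count distinct answers. Many distinct answers point to sampling noise; a stable
# answer that still differs from the other system points to a pipeline difference.
# ask_fn is a placeholder for whichever call you use (VSS client or VILA API).
from collections import Counter

def repeatability(ask_fn, image_paths, runs=5):
    answers = [ask_fn(image_paths).strip() for _ in range(runs)]
    counts = Counter(answers)
    print(f"distinct answers over {runs} runs: {len(counts)}")
    for answer, n in counts.most_common():
        print(f"  {n}x {answer[:120]}")
    return counts
```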
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.