Hi NVIDIA team,
I’ve encountered an issue regarding the maximum input token length while working with the nv-embed-qa-1b-v2 model for text embeddings.
According to the model card (llama-3.2-nv-embedqa-1b-v2 Model by NVIDIA | NVIDIA NIM), the NVIDIA NeMo Retriever Llama3.2 embedding model should support “long documents (up to 8192 tokens)” with dynamic embedding size through Matryoshka Embeddings.
However, when attempting to process text with 597 tokens, I received the following error from the Triton server:
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'Input length 597 exceeds maximum allowed token size 512', 'detail': {}, 'type': 'invalid_request_error'}
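For reference, the request is made with the OpenAI Python client pointed at the deployed endpoint, roughly like the sketch below (simplified; the base URL, API key, and document text are placeholders, and the extra_body fields are the NIM-specific parameters we pass, as described in the embedding API docs):

```python
# Simplified sketch of the embedding request (URL, key, and text are placeholders).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder for our NIM/Triton endpoint
    api_key="not-used",                   # local deployment; the key is not checked
)

document_text = "..."  # the ~597-token document that triggers the error

response = client.embeddings.create(
    model="nv-embed-qa-1b-v2",
    input=[document_text],
    # NIM-specific parameters passed through the OpenAI client:
    extra_body={"input_type": "passage", "truncate": "NONE"},
)
print(len(response.data[0].embedding))
```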
This error message indicates a maximum token limit of 512, which contradicts the documented 8192 token limit. Could you please:
- Clarify the actual maximum token limit for this model
- Explain if there are any specific configuration settings needed to utilize the full 8192 token capacity
- Provide guidance on handling longer documents if the 512 token limit is indeed correct (a rough sketch of the chunking fallback we have in mind follows this list)
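For context, if 512 tokens is indeed the hard limit for now, the fallback we are considering is client-side chunking before embedding, roughly along these lines (a minimal sketch; the tokenizer name and chunking strategy are our own assumptions, not something taken from the model card):

```python
# Client-side fallback: split a long document into <=512-token pieces before
# sending each piece to the embedding endpoint.
# The tokenizer below is only an illustrative stand-in for whatever tokenizer
# actually matches the deployed embedding model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

def chunk_by_tokens(text: str, max_tokens: int = 512) -> list[str]:
    """Split `text` into consecutive pieces of at most `max_tokens` tokens."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    return [
        tokenizer.decode(token_ids[start : start + max_tokens])
        for start in range(0, len(token_ids), max_tokens)
    ]

# Each chunk would then be embedded separately and the vectors aggregated
# (e.g. averaged) on our side.
```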
Environment details:
- Model: nv-embed-qa-1b-v2
- Deployment: Triton Server
- Input: Text document (597 tokens)
Thank you for your assistance in resolving this discrepancy.
Thanks for bringing this to our attention. We have recreated the error and can confirm that the wrong tokenizer version was deployed on our side. We are working to fix this, and I will get back to you as soon as I have an ETA for the fix or we have resolved the problem. Thanks so much for your patience, and for raising this in the forum! Best, Sophie.
Thank you for the quick response and confirmation of the tokenizer issue!
While we await the fix, I'd like to raise a related query about another model we're using: ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8, which is used in VSS (NVIDIA's Video Search and Summarization agent). This is particularly relevant because VSS uses nv-embed-qa-1b-v2 to generate embeddings for the text summaries produced by VILA 1.5.
Currently, we haven’t encountered any token-limit errors with nv-embed-qa when processing video chunk summaries, as our summaries have remained under 512 tokens. However, I’d appreciate clarification on whether:
- This 512-token limit is the intended design specification for VILA-1.5’s summary outputs
- VILA-1.5 is meant to generate longer text outputs
This information would help us better plan our implementation and avoid potential issues in the future.
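In the meantime, the interim guard we have in mind on our side is to count tokens in each VILA summary before handing it to the embedder, along these lines (sketch only; the tokenizer name is an illustrative assumption and the summaries list is placeholder data):

```python
# Flag any VILA chunk summary that would exceed the embedder's current
# 512-token limit before it is sent for embedding.
# The tokenizer is an illustrative stand-in; `summaries` is placeholder data.
import logging

from transformers import AutoTokenizer

logging.basicConfig(level=logging.WARNING)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

MAX_EMBED_TOKENS = 512

summaries = ["..."]  # chunk summaries produced by VILA-1.5 inside VSS

for i, summary in enumerate(summaries):
    n_tokens = len(tokenizer.encode(summary, add_special_tokens=False))
    if n_tokens > MAX_EMBED_TOKENS:
        logging.warning(
            "Summary %d has %d tokens (> %d) and would be rejected by the "
            "embedding endpoint.", i, n_tokens, MAX_EMBED_TOKENS,
        )
```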
Thank you for your continued support!
@jhpark26 I’m reaching out to the team that works on VILA to get you an answer to this ASAP! Thanks for your patience.