OpenAI Compatible API does not work

/v1/completions works normally, but /v1/chat/completions results in 500 errors.
I need /v1/chat/completions for my function-calling test.
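
For reference, a minimal function-calling request along these lines fails with the 500 below (a sketch, not my exact test code; the base URL, model name, and tool definition are assumptions):

```python
# Minimal repro sketch: the base_url, model name, and tool schema below
# are placeholders; adjust them to match your NIM deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# This call returns the InternalServerError shown below.
response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)
```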

InternalServerError: Error code: 500 - {'object': 'error', 'message': "__init__(): incompatible constructor arguments. The following argument types are supported:\n 1. tensorrt_llm.bindings.executor.Request(input_token_ids: list[int], max_new_tokens: int, streaming: bool = False, sampling_config: tensorrt_llm.bindings.executor.SamplingConfig = SamplingConfig(), output_config: tensorrt_llm.bindings.executor.OutputConfig = OutputConfig(), end_id: Optional[int] = None, pad_id: Optional[int] = None, bad_words: Optional[list[list[int]]] = None, stop_words: Optional[list[list[int]]] = None, embedding_bias: Optional[torch.Tensor] = None, external_draft_tokens_config: Optional[tensorrt_llm.bindings.executor.ExternalDraftTokensConfig] = None, prompt_tuning_config:

Can you confirm which NIM you are using in this example?

Thanks!

NIM_MODEL_PROFILE: "tensorrt_llm-h100-bf16-tp1-throughput"
Images: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest or nvcr.io/nim/meta/llama-3.1-70b-instruct:latest

I just followed the Function Calling guide on the NVIDIA Docs Hub (Function Calling - NVIDIA Docs).

Hi @soonh.yoon – due to a bug, the max_tokens parameter is required for completion and chat completion API calls with the latest Llama 3.1 models. We'll address this in a future release, but for now please ensure that the max_tokens parameter is set.
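
For example, something like this should work (a sketch; the base URL and model name are placeholders for your deployment):

```python
# Workaround sketch: always set max_tokens explicitly on chat completion
# calls. The base_url and model name are assumptions; match them to your
# own NIM deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    max_tokens=1024,  # required for now due to the bug described above
)
print(response.choices[0].message.content)
```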

Could the problem be that the OpenAI API doesn't have a parameter called max_new_tokens?
InternalServerError: Error code: 500 - {'object': 'error', 'message': "__init__(): incompatible constructor arguments. The following argument types are supported:\n 1. tensorrt_llm.bindings.executor.Request(input_token_ids: list[int], max_new_tokens: int, streaming: bool = False, sampling_config: tensorrt_llm.bindings.executor.SamplingConfig = SamplingConfig(), output_config: tensorrt_llm.bindings.executor.OutputConfig = OutputConfig(), end_id: Optional[int] = None, pad_id: Optional[int] = None, bad_words: Optional[list[list[int]]] = None, stop_words: Optional[list[list[int]]] = None, embedding_bias: Optional[torch.Tensor] = None, external_draft_tokens_config: … , max_new_tokens=None, streaming=True, output_config=<tensorrt_llm.bindings.executor.OutputConfig object at 0x7f41bc987670>, sampling_config=<tensorrt_llm.bindings.executor.SamplingConfig object at 0x7f45d5b32370>, end_id=128009, lora_config=None, logits_post_processor_name='batched'", 'type': 'InternalServerError', 'param': None, 'code': 500}

After modifying the OpenAI API source code to pass max_new_tokens, I get: BadRequestError: Error code: 400 - {'object': 'error', 'message': "{'type': 'extra_forbidden', 'loc': ('body', 'max_new_tokens'), 'msg': 'no additional input allowed', 'input': 1024}", 'type': 'BadRequestError', 'param': None, 'code': 400}

Hi @soonh.yoon – in the HTTP API, this parameter is called max_tokens.
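
At the HTTP level the request body would look something like this (a sketch using requests; the URL and model name are assumptions):

```python
# Sketch of the same request at the HTTP level: the body field is
# max_tokens, not max_new_tokens (the latter only exists internally in
# the TensorRT-LLM executor API). URL and model name are assumptions.
import requests

payload = {
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "max_tokens": 1024,
}
resp = requests.post(
    "http://localhost:8000/v1/chat/completions", json=payload, timeout=60
)
print(resp.status_code)
print(resp.json())
```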