Incomplete Response from NVIDIA API when 'max_tokens' Parameter is Unspecified


I’ve encountered an issue with the NVIDIA API for generating conversations. It seems to return an incomplete message when ‘max_tokens’ is not specified in the request.

Screenshot demonstrating the problem (the first response is incomplete):

The following code reproduces the issue. Note that the ‘max_tokens’ parameter has been intentionally left out:

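# The endpoint URL is missing from the command as posted; $API_URL is a placeholder for it.
curl \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama3-70b-instruct",
    "messages": [{"role":"user","content":"longtime no see"}],
    "temperature": 0.5,
    "top_p": 1,
    "stream": false
  }' \
  "$API_URL"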

When ‘max_tokens’ is omitted, I would expect the API to apply a default limit large enough for a complete reply. Instead, the response is cut off partway through.
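
To narrow this down, it may help to check why generation stopped and to compare against a request that sets the limit explicitly. The sketch below is only an illustration. It assumes the endpoint returns an OpenAI-style body, where choices[0].finish_reason is "length" when the token limit was reached; I have not confirmed this for this API. The helper names build_payload and was_truncated are my own, and the 1024 in the usage note is an arbitrary value.

```python
def build_payload(prompt, max_tokens=None):
    """Build the request body from the post; include max_tokens only when given."""
    payload = {
        "model": "meta/llama3-70b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.5,
        "top_p": 1,
        "stream": False,
    }
    if max_tokens is not None:
        # Explicit cap, used here as a possible workaround for the truncation.
        payload["max_tokens"] = max_tokens
    return payload


def was_truncated(response):
    """Return True if the first choice stopped because the token limit was hit.

    Assumption: an OpenAI-style body of the form
    {"choices": [{"finish_reason": "length" | "stop" | ...}]}.
    """
    choices = response.get("choices") or []
    return bool(choices) and choices[0].get("finish_reason") == "length"
```

Sending build_payload("longtime no see") and build_payload("longtime no see", max_tokens=1024) with any HTTP client and passing each parsed JSON response to was_truncated should show whether the short reply comes from a small default limit or from something else.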

Thank you for reading.