Incomplete Response from NVIDIA API when 'max_tokens' Parameter is Unspecified

Hello,

I've encountered an issue with the NVIDIA API for chat completions. It returns an incomplete message when `max_tokens` is not specified in the request.

A screenshot demonstrating the problem (the first response is incomplete):

The following request reproduces the issue. Note that the `max_tokens` parameter has been intentionally left out:

curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC" \
  -d '{
    "model": "meta/llama3-70b-instruct",
    "messages": [{"role":"user","content":"longtime no see"}],
    "temperature": 0.5,
    "top_p": 1,
    "stream": false
  }'

Without an explicit `max_tokens`, I would expect the API to apply a reasonable default completion length. Instead, the response comes back unexpectedly truncated, as if a very low token cap were being applied.
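As a workaround, setting `max_tokens` explicitly avoids the truncation. A minimal Python sketch of the same request with the cap set (the value 1024 is an arbitrary choice on my part, and `NVIDIA_API_KEY` is a placeholder environment-variable name):

```python
import json
import os
import urllib.request

# Same request body as the curl reproduction above, but with an explicit
# max_tokens so the completion is not cut off at a low default.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"

payload = {
    "model": "meta/llama3-70b-instruct",
    "messages": [{"role": "user", "content": "longtime no see"}],
    "temperature": 0.5,
    "top_p": 1,
    "stream": False,
    "max_tokens": 1024,  # arbitrary cap; adjust to the length you need
}

def build_request() -> urllib.request.Request:
    """Build the POST request; the API key is read from the environment."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('NVIDIA_API_KEY', '')}",
        },
        method="POST",
    )

if __name__ == "__main__":
    # Sending the request requires a valid key; uncomment to run it live.
    # with urllib.request.urlopen(build_request()) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
    print(json.dumps(payload, indent=2))
```

With this change the first response comes back complete, which is why I suspect the server-side default for `max_tokens` is the problem.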

Thank you for reading.