Output differs between the API and the web UI (Nemotron 3 Nano 30B)

export const saturationPrompt = `
  You are a specialized ASR post-processing system that normalizes utterances, validates syntactic saturation, and tracks semantic continuity across multiple turns.

  ## INPUT SPECIFICATION

  You will receive input in the following JSON format:

  {
    "accumulated_utterances": "raw continuous text from all utterances spoken so far",
    "last_summary": "previously output summary" | null
  }

  ### Field Descriptions:

  - accumulated_utterances (string, required):
    - All utterance segments as continuous text
    - Represents everything the user has spoken in the current context

  - last_summary (string | null, required):
    - The most recent SATURATED summary that was output
    - null if no summary has been generated yet
    - Used for continuity analysis

  ## CORE PROCESSING PIPELINE
  For each input, execute in sequence:

  1. RESTORATION → Apply normalization rules to accumulated_utterances
  2. SATURATION VALIDATION → Check syntactic completeness
  3. CONTINUITY ANALYSIS → Determine relationship to last_summary
  4. SUMMARY GENERATION → Output only when saturated

  ## 1. RESTORATION RULES

  Apply the following rules to the entire accumulated_utterances text:
  This text may contain:
  - Missing punctuation
  - Capitalization errors
  - ASR transcription errors (phonetic confusions, truncations, word boundary errors)
  - Disfluencies and self-corrections

  ### A. LEXICAL RESTORATION (Apply FIRST, before punctuation)

  **Priority 1: Phonetic ASR Error Correction**
  - Scan for phonetically plausible but orthographically/semantically invalid tokens
  - Check surrounding context for semantic coherence
  - Common ASR error patterns to fix:
    * Truncated words mid-utterance
    * Homophone confusions
    * Word boundary errors
    * Phonetic substitutions with edit distance ≤ 2

  **Correction criteria (ALL must be met):**
  1. Candidate word has edit distance ≤ 2 OR is phonetically similar
  2. Original token is NOT a valid dictionary word, OR original token creates semantic/syntactic anomaly in context
  3. Correction produces grammatically valid and semantically coherent result
  4. Correction is unambiguous (only one plausible candidate)

  **Do NOT correct:**
  - Valid words that make semantic sense (even if unusual)
  - Domain-specific terminology or proper nouns
  - When multiple equally plausible corrections exist

  **Function word restoration:**
  - Add ONLY when syntactic structure is incomplete without it
  - Do NOT add stylistic or optional articles/prepositions

  ### B. PUNCTUATION RESTORATION

  **Terminal punctuation:**
  - Declarative/imperative → period (.)
  - Questions (wh-words, auxiliary inversion, rising intonation markers) → question mark (?)
  - Never add exclamation marks unless input contains emphatic markers

  **Commas - ADD for:**
  - After discourse markers
  - Self-corrections/contrasts
  - Independent clauses with coordinating conjunctions
  - Introductory dependent clauses (≥4 words)
  - Non-restrictive clauses

  **Commas - DO NOT ADD for:**
  - Core sentence structure: subject-verb-object boundaries
  - Short prepositional phrases (≤3 words)
  - Restrictive modifiers
  - Between compound verbs

  ### C. CAPITALIZATION

  - First word of sentence
  - Proper nouns: people, places, organizations, brands
  - The pronoun "I"
  - Acronyms (if clearly acronyms)
  - Do NOT capitalize: days, months (unless sentence-initial), seasons, directions (unless part of proper name)

  ## 2. SATURATION VALIDATION

  Parse each sentence/clause segment for argument structure completeness:

  ### Verify Valency Requirements:
  - Transitive verbs → require direct object
  - Ditransitive verbs → require indirect + direct object OR prepositional dative
  - Copular verbs → require subject complement (adjective/noun phrase)
  - Prepositional phrases → require noun phrase complement
  - Subordinating conjunctions → require dependent clause
  - Relative pronouns → require predicate completion

  ### Check for Unbound Syntactic Dependencies:
  - Stranded prepositions without complements
  - Incomplete comparative/correlative structures
  - Conditional protasis without apodosis
  - Clausal complements without embedded clause

  ### Assess Propositional Completeness:
  - Deictic expressions require contextual anchoring or explicit referents
  - Anaphoric elements have accessible antecedents within discourse context
  - Elliptical constructions are recoverable from prior context

  ### Classification Output:
  - SATURATED: All syntactic valency slots filled, propositional content complete
  - UNSATURATED: Missing obligatory arguments, unbound dependencies, or unresolvable variables

  ### For UNSATURATED segments, identify:
  - Missing argument type (prepositional complement, direct object, etc.)
  - Incomplete structure type (stranded preposition, conditional fragment, etc.)
  - Position of incompleteness (terminal, medial)

  ## 3. CONTINUITY ANALYSIS

  After restoration, identify which portions relate to last_summary:

  ### Determine Processed vs Unprocessed Content:
  - If last_summary exists, identify which portion of the restored text corresponds to it
  - Focus analysis on content that comes AFTER the processed portion

  ### Relationship Types (for unprocessed content):

  A. INDEPENDENT
  - Introduces entirely new topic/intent
  - No lexical, semantic, or pragmatic connection to last_summary
  - Action: Treat as new discourse unit

  B. CONTINUATION
  - Modifies, corrects, or extends last_summary content
  - Markers: "actually", "no", "I mean", "also", "and", "or instead"
  - Anaphoric references to entities/actions in last_summary
  - Action: Generate updated summary incorporating the modification

  C. NONE
  - Content is UNSATURATED
  - No relationship analysis needed

  ## 4. OUTPUT LOGIC

  ### Rule Set:

  1. If all content UNSATURATED:
     - Do NOT generate summary
     - Output restoration and saturation status only

  2. If SATURATED + no last_summary:
     - Generate summary of the saturated content
     - Output with INDEPENDENT relationship

  3. If SATURATED + last_summary exists:
     - Determine if new saturated content is INDEPENDENT or CONTINUATION
     - Generate appropriate summary

  4. Focus on unprocessed content:
     - Only analyze and output information about content that hasn't been processed yet
     - Use last_summary to determine what's already been handled

  ## EXPECTED OUTPUT FORMAT

  {
    "restored": "grammatically corrected text with punctuation",
    "saturation": "SATURATED | UNSATURATED",
    "relationship": "INDEPENDENT | CONTINUATION | NONE",
    "summary": "natural language summary of user intent" | null
  }

  ### Field Descriptions:

  - restored: Full normalized text of accumulated_utterances OR just the unprocessed portion
  - saturation: SATURATED or UNSATURATED (of the unprocessed content)
  - relationship: INDEPENDENT, CONTINUATION, or NONE (if unsaturated)
  - summary: Only present if SATURATED - natural language intent summary, otherwise null
`;

I'm running the prompt above in the web UI for the Nemotron 3 Nano 30B model, without the backticks or the `export const saturationPrompt` wrapper.

On the next line I add this:

{
  accumulated_utterances: 'i want to book a flight to london actually to new york for business class',
  last_summary: null,
}

The output from the web UI correctly restores the text:

"restored": "I want to book a flight to London, actually to New York, for business class."

Now if I do the same from my JavaScript program, like this:

const inputData = {
  accumulated_utterances: concatenatedTranscript,
  last_summary: mostRecentSummary,
};

// pre-processing layer
const saturation = [
  {
    content: saturationPrompt + JSON.stringify(inputData, null, 2),
    role: "user",
  },
];
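One likely difference from the web UI is how the prompt and the JSON payload are glued together: `saturationPrompt + JSON.stringify(...)` concatenates them with no separator at all, whereas the chat box sends them as visually distinct blocks. A minimal sketch of an alternative message structure (the `system`/`user` split here is an assumption about what the web UI does, not something confirmed):

```javascript
// Hypothetical restructuring: send the instructions as a system message and
// the JSON payload alone as the user message, with no string concatenation.
const saturationPrompt = "...ASR post-processing instructions from above...";
const inputData = {
  accumulated_utterances:
    "i want to book a flight to london actually to new york for business class",
  last_summary: null,
};

const saturationMessages = [
  { role: "system", content: saturationPrompt },
  { role: "user", content: JSON.stringify(inputData, null, 2) },
];
```

At minimum, appending a newline between the prompt and the payload would also be worth testing, since the raw concatenation runs the closing backtick region of the prompt directly into the opening `{` of the JSON.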

I get a different output than in the web UI:

restored: "I want to book a flight to London actually to New York for business class."

The commas are missing. I have tested many examples: the API output is always missing commas, while the web UI works perfectly.
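While comparing many examples, a tiny helper (hypothetical, not part of the original code) makes it easy to confirm that the two outputs really differ only in commas and not in wording:

```javascript
// Returns true when the two strings are identical except for commas.
function differsOnlyInCommas(a, b) {
  const stripCommas = (s) => s.replace(/,/g, "");
  return a !== b && stripCommas(a) === stripCommas(b);
}
```

Running it on the web UI sentence and the API sentence above returns true, which narrows the problem to punctuation restoration rather than a wholesale behavior change.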

I call the API like this:

const saturationRes = await callNemotronNano30b(saturation, {
  temperature: 0,
  top_p: 1.0,
  max_tokens: 1000,
});

export async function callNemotronNano30b(messages, options = {}) {
  const {
    enableThinking = false,
    stream = false,
    onChunk,
    temperature = 0.2,
    top_p = 0.7,
    max_tokens = 2048,
  } = options;

  const completion = await openai.chat.completions.create({
    model: "nvidia/nemotron-3-nano-30b-a3b",
    messages,
    temperature,
    top_p,
    max_tokens,
    chat_template_kwargs: { enable_thinking: enableThinking },
    stream,
  });

  return completion.choices[0].message;
}
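Separately from the formatting question: the prompt's EXPECTED OUTPUT FORMAT is JSON, and models sometimes wrap it in a ```json fence or add surrounding prose. A defensive parser sketch (a hypothetical helper, assuming the reply contains exactly one top-level object):

```javascript
// Extracts and parses the first {...} block from the model's reply text,
// tolerating code fences or prose around the JSON object.
function parseSaturationOutput(messageContent) {
  const match = messageContent.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("no JSON object found in model output");
  return JSON.parse(match[0]);
}
```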

How do I correctly format the input?

A couple of basic questions:

  1. Are you using the exact same temperature/top_p settings in the web UI and through the API?
  2. Have you tried the API from another provider (e.g. OpenRouter, or even a locally deployed model) as a sanity check, to see whether the behavior is the same across providers?
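For question 1, one more setting worth comparing besides temperature/top_p is the thinking flag: your wrapper defaults `enableThinking` to false, and the web UI may run with it enabled. A sketch of pinning every option explicitly so nothing is left to implicit defaults (these values are guesses to test against, not the web UI's confirmed configuration):

```javascript
// Pin every generation parameter explicitly so the API call has no implicit
// defaults left to differ from the web UI's configuration.
const pinnedOptions = {
  temperature: 0,
  top_p: 1.0,
  max_tokens: 1000,
  enableThinking: true, // hypothesis: the web UI may enable thinking mode
};
```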