After setting up a server using the NVIDIA Speculative Decoding sample, I am able to run AnythingLLM on the GB10 using the setup link below.
However, when pointing AnythingLLM's built-in NVIDIA NIM LLM provider at the Speculative Decoding server,
I can only select the main model nvidia/Llama-3.3-70B-Instruct-FP4; the draft model nvidia/Llama-3.1-8B-Instruct-FP4 is not available.
Do you have any suggestions regarding this question?
Thank you.
When using NVIDIA NIM in AnythingLLM, the model selection list only shows nvidia/Llama-3.3-70B-Instruct-FP4 and does not include nvidia/Llama-3.1-8B-Instruct-FP4.
However, in “Step 3: Test the draft–target setup,” the test command requires specifying nvidia/Llama-3.1-8B-Instruct-FP4 as the speculative model.
Although NVIDIA NIM can still chat normally in AnythingLLM, I’m wondering: Is AnythingLLM’s NVIDIA NIM integration actually using Speculative Decoding?
If not, could you please also help with the following questions? Thank you:
Is there any way for AnythingLLM’s NVIDIA NIM integration to directly use NVIDIA Speculative Decoding after setting up the server as described in NVIDIA’s documentation? Or does the UI currently require manual integration to specify the speculative model?
What methods are available to confirm whether NVIDIA Speculative Decoding is being used? Is it possible to enable server logs?
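For context on why only one model shows up in the selection list: clients such as AnythingLLM typically populate their model picker from the server's OpenAI-compatible /v1/models endpoint, which in a draft–target setup may list only the target model. The draft model is a server-side detail, so its absence from the list does not by itself mean speculative decoding is off. A minimal sketch of that check, using a hypothetical /v1/models response (the exact payload depends on your server setup):

```python
import json

# Hypothetical response body from the server's OpenAI-compatible
# /v1/models endpoint (e.g. `curl http://localhost:8000/v1/models`);
# the actual payload depends on your speculative-decoding server setup.
sample_response = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "nvidia/Llama-3.3-70B-Instruct-FP4", "object": "model"}
  ]
}
""")

# Collect the model IDs the server advertises to clients.
listed = [m["id"] for m in sample_response["data"]]
print(listed)

# If only the target model is listed, a client like AnythingLLM can only
# select that model; whether the draft model is actually applied per
# request is then only observable in the server's own logs.
print("nvidia/Llama-3.1-8B-Instruct-FP4" in listed)
```

In other words, if your server's /v1/models output looks like the sample above, the draft model being missing from AnythingLLM's dropdown is expected, and the server logs are the place to confirm whether speculative decoding is actually engaged.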
Hi, AnythingLLM is a third-party platform, and its NIM integration is being deprecated per the notice on their site. As with any third-party tooling, NVIDIA has limited visibility and support. Please contact AnythingLLM directly for further assistance.