Question Regarding Draft Model Support in AnythingLLM via NVIDIA NIM

Hi NVIDIA team:

After setting up a server with the NVIDIA Speculative Decoding sample, I am able to run AnythingLLM on the GB10 by following the setup link below.
However, when I point AnythingLLM's built-in NVIDIA NIM LLM provider at the Speculative Decoding server,
I can only select the main model nvidia/Llama-3.3-70B-Instruct-FP4; the draft model nvidia/Llama-3.1-8B-Instruct-FP4 is not available.
Do you have any suggestions regarding this question?
Thank you.
  • Speculative Decoding
  • AnythingLLM Setup:
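
For reference, a quick way to check which model IDs the endpoint itself advertises is to query the OpenAI-compatible /v1/models route directly. This is only a minimal sketch; the base URL is an assumption from my setup (default local port) and may need adjusting:

```python
# Minimal sketch: list the model IDs the speculative-decoding server advertises.
# The base URL is an assumption (default local OpenAI-compatible port); adjust
# it to match your own deployment.
import requests

BASE_URL = "http://localhost:8000/v1"

resp = requests.get(f"{BASE_URL}/models", timeout=30)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```

If the draft model does not appear in this list, then AnythingLLM simply has nothing to show for it in its model dropdown.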

Hello, what exact errors are you seeing? The exact startup commands and resulting logs would help us triage efficiently.

Hi:

When using NVIDIA NIM in AnythingLLM, the model selection list only shows nvidia/Llama-3.3-70B-Instruct-FP4 and does not include nvidia/Llama-3.1-8B-Instruct-FP4.
However, in “Step 3: Test the draft–target setup,” the test command requires specifying nvidia/Llama-3.1-8B-Instruct-FP4 as the speculative model.
Although chatting through NVIDIA NIM in AnythingLLM still works normally, I'm wondering: is AnythingLLM's NVIDIA NIM integration actually using Speculative Decoding?
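
My (possibly wrong) understanding is that with most speculative-decoding servers the draft model is wired in when the server is launched, so a client only ever names the target model in the request body, even though the Step 3 test command names the draft model explicitly. In other words, the OpenAI-compatible request AnythingLLM sends should look roughly like the sketch below; the base URL and prompt are assumptions from my setup:

```python
# Minimal sketch of the kind of OpenAI-compatible chat request AnythingLLM sends:
# only the target model is named; the draft model never appears in the body.
# Base URL and prompt are assumptions; adjust to your deployment.
import requests

BASE_URL = "http://localhost:8000/v1"

payload = {
    "model": "nvidia/Llama-3.3-70B-Instruct-FP4",  # target model only
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}
resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```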

If not, could you please also help with the following questions? Thank you.

  1. Is there any way for AnythingLLM’s NVIDIA NIM integration to use NVIDIA Speculative Decoding directly after the server has been set up as described in NVIDIA’s documentation, or does the UI currently require a manual integration step to specify the speculative model?

  2. What methods are available to confirm whether NVIDIA Speculative Decoding is actually being used? Is it possible to enable server logs? (A rough timing check I can run on my side is sketched after the links below.)

  • Test Speculative Decoding command

  • AnythingLLM NVIDIA NIM setting
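
To expand on question 2: the only rough check I can think of (a sketch, not an official method) is to time the same prompt against the server launched with and without the draft model and compare tokens per second, since speculative decoding should mainly show up as higher generation throughput. If the server runs in a container, watching its logs while a request is in flight (for example with `docker logs -f <container>`) might also help. The script below reuses the assumed endpoint and target model from my earlier posts:

```python
# Rough timing sketch: send the same prompt a few times and report tokens/sec.
# Run it against the server started with the draft model, then without it,
# and compare the numbers. Endpoint and model name are assumptions.
import time
import requests

BASE_URL = "http://localhost:8000/v1"
MODEL = "nvidia/Llama-3.3-70B-Instruct-FP4"
PROMPT = "Explain speculative decoding in three sentences."
RUNS = 3

for i in range(RUNS):
    start = time.perf_counter()
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 256,
        },
        timeout=300,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    completion_tokens = resp.json().get("usage", {}).get("completion_tokens", 0)
    rate = completion_tokens / elapsed if elapsed else 0.0
    print(f"run {i + 1}: {completion_tokens} tokens in {elapsed:.1f}s ({rate:.1f} tok/s)")
```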

Hi, no answer from NVIDIA? That’s not good.

Hi, AnythingLLM is a third-party platform, and its NIM integration is being deprecated per the notice on their site. As with any third-party tooling, NVIDIA has limited visibility and support. Please contact AnythingLLM directly for further assistance.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.