Internal Server Error, Try Again

Help us respond more quickly by giving basic info about the issue. Fill in the appropriate details in the sections below. Make sure to upload screenshots and logs.

Which Workbench location had the issue?

(local)

What is the Operating System for your local Workbench?

(Windows)

What is the Workbench Desktop App version?

(0.25.30 or 0.28.29; not sure where to check)

Was the issue with the Desktop App or the CLI?

(Desktop App)

Summary of the Issue

(Hello everyone,

I'm trying to run Mistral 7B Instruct.
My GPUs are an NVIDIA 3060 12GB and a 3080 Ti 12GB, with 96GB of RAM and a 5950X CPU.
Loading the model works fine, but when I launch the server I get the error above.

I've tried about 20 times and it only worked once. I kept restarting the environment and it eventually worked, but only once; when I tried again I got the same error. I did manage to get it working a few more times by launching it repeatedly, so I think there is a bug: I launched the server and it worked, then stopped it and launched it again and it never worked, with no changes in between.

Any help please!
)

What are the error messages?

(Internal Server Error)

What are the steps to reproduce?

(Start the server after I click Load Model)

Upload screenshots or logs

{"level":"warn","container-registry":"ghcr.io","time":"2024-03-22T18:29:44+02:00","message":"BaseEnvironmentLatestTag is unknown for images not in NGC registry"}
{"level":"warn","topic":"/home/workbench/nvidia-workbench/nvidia-workbench-example-hybrid-rag","channel":"0xc0008221e0","time":"2024-03-22T18:29:51+02:00","message":"Failed to send to subscriber. Channel full"}
{"level":"warn","topic":"/home/workbench/nvidia-workbench/nvidia-workbench-example-hybrid-rag","channel":"0xc001304600","time":"2024-03-22T18:29:51+02:00","message":"Failed to send to subscriber. Channel full"}
{"level":"warn","topic":"/home/workbench/nvidia-workbench/nvidia-workbench-example-hybrid-rag","channel":"0xc0012baf00","time":"2024-03-22T18:29:51+02:00","message":"Failed to send to subscriber. Channel full"}
{"level":"warn","topic":"/home/workbench/nvidia-workbench/nvidia-workbench-example-hybrid-rag","channel":"0xc000b64b40","time":"2024-03-22T18:29:51+02:00","message":"Failed to send to subscriber. Channel full"}
{"level":"warn","topic":"/home/workbench/nvidia-workbench/nvidia-workbench-example-hybrid-rag","channel":"0xc000a929c0","time":"2024-03-22T18:29:51+02:00","message":"Failed to send to subscriber. Channel full"}
{"level":"warn","topic":"/home/workbench/nvidia-workbench/nvidia-workbench-example-hybrid-rag","channel":"0xc001189320","time":"2024-03-22T18:29:51+02:00","message":"Failed to send to subscriber. Channel full"}
{"level":"warn","topic":"/home/workbench/nvidia-workbench/nvidia-workbench-example-hybrid-rag","channel":"0xc000a938c0","time":"2024-03-22T18:29:51+02:00","message":"Failed to send to subscriber. Channel full"}
{"level":"warn","topic":"/home/workbench/nvidia-workbench/nvidia-workbench-example-hybrid-rag","channel":"0xc0005a75c0","time":"2024-03-22T18:29:51+02:00","message":"Failed to send to subscriber. Channel full"}
{"level":"warn","topic":"/home/workbench/nvidia-workbench/nvidia-workbench-example-hybrid-rag","channel":"0xc000a923c0","time":"2024-03-22T18:29:51+02:00","message":"Failed to send to subscriber. Channel full"}
{"level":"info","time":"2024-03-22T18:29:51+02:00","message":"Processing git status output"}
{"level":"info","time":"2024-03-22T18:29:51+02:00","message":"Processing git status output"}
{"level":"info","time":"2024-03-22T18:29:51+02:00","message":"Processing git status output"}
{"level":"info","time":"2024-03-22T18:29:51+02:00","message":"Processing git diff output"}
{"level":"info","time":"2024-03-22T18:29:51+02:00","message":"Processing git diff output"}
{"level":"info","time":"2024-03-22T18:29:51+02:00","message":"Processing git diff output"}
{"level":"info","time":"2024-03-22T18:29:51+02:00","message":"Processing git diff output"}
{"level":"info","time":"2024-03-22T18:29:51+02:00","message":"Processing git diff output"}
{"level":"info","time":"2024-03-22T18:29:51+02:00","message":"Processing git diff output"}
{"level":"warn","container-registry":"ghcr.io","time":"2024-03-22T18:29:52+02:00","message":"BaseEnvironmentLatestTag is unknown for images not in NGC registry"}
{"level":"info","time":"2024/03/22 - 18:29:54","status":200,"latency":"10.786567563s","client-ip":"127.0.0.1","method":"POST","path":"/v1/query","time":"2024-03-22T18:29:54+02:00","message":"GIN-Request"}
{"level":"info","time":"2024/03/22 - 18:33:38","status":200,"latency":"12.03µs","client-ip":"127.0.0.1","method":"OPTIONS","path":"/v1/query","time":"2024-03-22T18:33:38+02:00","message":"GIN-Request"}
{"level":"info","time":"2024-03-22T18:33:38+02:00","message":"Processing git status output"}
{"level":"info","time":"2024-03-22T18:33:38+02:00","message":"Processing git diff output"}
{"level":"info","time":"2024-03-22T18:33:38+02:00","message":"Processing git diff output"}
{"level":"info","time":"2024/03/22 - 18:33:38","status":200,"latency":"13.305451ms","client-ip":"127.0.0.1","method":"POST","path":"/v1/query","time":"2024-03-22T18:33:38+02:00","message":"GIN-Request"}
{"level":"info","time":"2024-03-22T18:33:38+02:00","message":"Processing git status output"}
{"level":"info","time":"2024-03-22T18:33:38+02:00","message":"Processing git diff output"}
{"level":"info","time":"2024-03-22T18:33:38+02:00","message":"Processing git diff output"}
{"level":"info","time":"2024/03/22 - 18:33:38","status":200,"latency":"48.637507ms","client-ip":"127.0.0.1","method":"POST","path":"/v1/query","time":"2024-03-22T18:33:38+02:00","message":"GIN-Request"}


Bump. Anyone home?

Apologies for the delay. Still getting caught up from the GTC rush.

Try these debugging steps to manually start up the local inference server:

  1. Stop any running environments.
  2. Start the environment from AIWB. Open the Chat and JupyterLab apps.
  3. In the Chat window, click "Local" inference mode on the right-hand side. Wait for the RAG backend to set up properly and switch over to the Local inference settings.
  4. Click the Download button for the Mistral model and wait for the model to download.
  5. Inside the JupyterLab window, open a terminal and run "bash /project/code/scripts/start-local.sh mistralai/Mistral-7B-Instruct-v0.1 bitsandbytes-nf4" (repeated below for convenience).

This should show some logs about what is going on.
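For convenience, here is that step-5 command again, ready to paste into a JupyterLab terminal tab (File > New > Terminal), not into a notebook cell:

    bash /project/code/scripts/start-local.sh mistralai/Mistral-7B-Instruct-v0.1 bitsandbytes-nf4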

WRT finding the Workbench version, there are two ways:

  • In a terminal, run nvwb version and the output will give you the version of the CLI. It's different from, but in step with, the Desktop App version.
  • For the Desktop App, right-click the grey Workbench icon in your system tray and select "About AI Workbench". A window will open showing the Desktop App version.
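For example, from any terminal on the machine running Workbench (the exact output format may vary by release):

    # Prints the installed CLI version; it is paired with the Desktop App version
    nvwb version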

I've attached two screenshots showing how to find the Desktop App version on a Windows machine. Mac and Ubuntu 22.04 will involve a similar set of steps.


Thanks.

My version is: Version 0.44.8


When I try to run your command in JupyterLab I get a syntax error:

Cell In[1], line 1
bash /project/code/scripts/start-local.sh mistralai/Mistral-7B-Instruct-v0.1 bitsandbytes-nf4
^
SyntaxError: invalid decimal literal


If I look at the logs under "Chat", I get this:

2024-04-11T16:29:25.703495Z INFO text_generation_launcher: Args { model_id: "mistralai/Mistral-7B-Instruct-v0.1", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: Some(BitsandbytesNF4), speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 4000, max_total_tokens: 5000, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "project-hybrid-rag", port: 9090, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data/"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 0.85, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: , watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-04-11T16:29:25.703611Z INFO download: text_generation_launcher: Starting download process.
Error: http://localhost:9090/info returned HTTP code 000

Got it, thanks. Ah, I meant running that command in a JupyterLab terminal window (e.g. new tab → Terminal → run the command). If running it as a cell in a Jupyter notebook, you would typically need to add a bang (!) in front of the command, or the notebook will interpret the command as Python code.
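For example, with the same start-local.sh invocation as above:

    # In a JupyterLab terminal tab, run the command directly:
    bash /project/code/scripts/start-local.sh mistralai/Mistral-7B-Instruct-v0.1 bitsandbytes-nf4

    # In a Jupyter notebook cell, prefix it with ! so it is passed to the shell
    # instead of being parsed as Python:
    !bash /project/code/scripts/start-local.sh mistralai/Mistral-7B-Instruct-v0.1 bitsandbytes-nf4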

Looks like you are getting a 000 HTTP code when starting the local inference server for Mistral, and it is timing out. We are aware of the issue and are pushing a fix soon to periodically poll for a 200 code before letting the user submit queries.

In the meantime, try this workaround. Restart the environment and open JupyterLab. In code/scripts/start-local.sh, locate the line "sleep 50 # Model warm-up". You can try increasing that warm-up period and see if it helps. Then open Chat and try again. A sketch of the edit is below.
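As a rough sketch of that edit (the exact placement in start-local.sh may differ on your machine, and the curl readiness loop is only a suggestion of mine, not part of the shipped script):

    # In /project/code/scripts/start-local.sh, bump up the warm-up delay, e.g.:
    sleep 90 # Model warm-up (was: sleep 50)

    # Optional extra check: wait until the inference server actually answers
    # on http://localhost:9090/info before moving on.
    until [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:9090/info)" = "200" ]; do
      sleep 5
    done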