Issue with VIA and VITA-2.0 - Error Code 402

Hi everyone,

I recently gained early access to VIA and successfully followed the requirements outlined in the documentation to start a microservice with VITA-2.0. During my first attempt the model loaded correctly, but the web server was not accessible (screenshot attached).

Afterwards, I closed the instance and reopened it a couple of days later, but this time I encountered error code 402:

{'type': 'urn:kaizen:problem-details:payment-required', 'title': 'Payment Required', 'status': 402, 'detail': "Account '2c7slESkQF-uGIymCkSJGl9teCmX3mGzOeIwT2cjvow': Cloud credits expired - Please contact NVIDIA representatives", 'instance': '/v2/nvcf/pexec/functions/a88f115a-4a47-4381-ad62-ca25dc33dc1b'}

In an attempt to solve it, I regenerated both my NVIDIA API key and my NGC API key, but without any luck. Below is the command I executed in the terminal:

export BACKEND_PORT=8000
export FRONTEND_PORT=9000
export NVIDIA_API_KEY=
export NGC_API_KEY=
export MODEL_PATH="ngc:nvidia/tao/vita:2.0.1"
export NGC_MODEL_CACHE=</SOME/DIR/ON/HOST>
docker run --rm -it --privileged=true --ipc=host --ulimit memlock=-1 \
    --ulimit stack=67108864 --tmpfs /tmp:exec --name via-server \
    --gpus '"device=all"' \
    -p $FRONTEND_PORT:$FRONTEND_PORT \
    -p $BACKEND_PORT:$BACKEND_PORT \
    -e BACKEND_PORT=$BACKEND_PORT \
    -e FRONTEND_PORT=$FRONTEND_PORT \
    -e NVIDIA_API_KEY=$NVIDIA_API_KEY \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e VLM_MODEL_TO_USE=vita-2.0 \
    -e VLM_BATCH_SIZE=1 \
    -v $NGC_MODEL_CACHE:/root/.via/ngc_model_cache \
    -e MODEL_PATH=$MODEL_PATH \
    -v via-hf-cache:/tmp/huggingface \
    nvcr.io/metropolis/via-dp/via-engine:2.0-dp
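
(A note for anyone reproducing this setup: once the container is up, a plain port probe from the host shows whether the web server is listening at all — a generic check, nothing VIA-specific, assuming the ports are published on localhost as in the command above.)

curl -v http://localhost:$FRONTEND_PORT/   # web UI port; any HTTP response means the server is listening
curl -v http://localhost:$BACKEND_PORT/    # same check for the backend API port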

Any insight or guidance on resolving this issue would be appreciated. Thanks in advance for your help.

In the first image it looks like everything worked as expected. As for the issue you are having now, your NGC API credits have expired. Are you able to download and self-host the Llama3-8b NIM?
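
(For reference, a minimal self-hosting sketch, assuming the public NIM container image and the cache mount path from the NIM docs — the exact image tag and cache location may differ for your release:)

export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm --gpus all \
    -e NGC_API_KEY=$NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -p 8000:8000 \
    nvcr.io/nim/meta/llama3-8b-instruct:1.0.0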

Hi @aryason, thanks for your answer and sorry for the late reply. Yes, I did download the Llama3-8b NIM, but while self-hosting it I got the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 100, in init_device
    _check_if_gpu_supports_dtype(self.model_config.dtype)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
    raise ValueError(
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-DGXS-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the`dtype` flag in CLI, for example: --dtype=half.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/entrypoints/openai/api_server.py", line 498, in <module>
    engine = AsyncLLMEngineFactory.from_engine_args(engine_args, usage_context=UsageContext.OPENAI_API_SERVER)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/engine/async_trtllm_engine.py", line 412, in from_engine_args
    engine = engine_cls.from_engine_args(engine_args, start_engine_loop, usage_context)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 365, in from_engine_args
    engine = cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 323, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
    return engine_class(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 148, in __init__
    self.model_executor = executor_class(
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 382, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 45, in _init_executor
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 181, in _init_workers_ray
    self._run_workers("init_device")
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 318, in _run_workers
    driver_worker_output = self.driver_worker.execute_method(
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 158, in execute_method
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 100, in init_device
    _check_if_gpu_supports_dtype(self.model_config.dtype)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
    raise ValueError(
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-DGXS-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the`dtype` flag in CLI, for example: --dtype=half.
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157] Traceback (most recent call last):
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 100, in init_device
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]     _check_if_gpu_supports_dtype(self.model_config.dtype)
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]     raise ValueError(
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157] ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-DGXS-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the`dtype` flag in CLI, for example: --dtype=half.

Is there any viable way to set fp16 rather than bf16? I already tried to pass -e NIM_MODEL_PROFILE="vllm-fp16-tp1" but it did not work.
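
(For context, the compute-capability mismatch in the traceback can be confirmed with a quick one-liner — assuming PyTorch is available on the host; a V100 reports (7, 0), while bfloat16 requires 8.0 or newer:)

python3 -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))"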

Thanks for your support.

Could you try adding the --dtype float16 parameter when you are self-hosting it? Also, you may refer to this topic about your scenario.
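
(In case it helps, this is what the override looks like against a plain vLLM OpenAI server — a sketch only, since the NIM container uses its own entrypoint; the model name here is just the public Llama 3 8B Instruct checkpoint:)

python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --dtype float16 \
    --port 8000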

Thank you for your response, @yuweiw. Unfortunately, neither --dtype float16 nor --dtype half seems to be available as a parameter to pass to Docker.

Additionally, since I believe the issues I am encountering are directly related to the lack of available credits in my account, is there a way to purchase more credits? I’ve been unable to find any information on how to do so here. After logging in, I can only verify my current balance.

You can refer to our Quick Start Guide on page 11. We currently support the following GPU models: L40 / L40s, H100, A100 80GB / 40GB, A6000. Can you check whether your GPU model is one of the above?

About the credits issue, you can refer to this FAQ.

Thanks @yuweiw. According to the documentation, my GPU model is not supported. Despite that, during my first attempt the model was deployed correctly:

[screenshot of the successful first deployment]

According to the Quick Start Guide, it seems it is not possible to purchase credits:

Each new account can receive up to 5000 credits to try out the APIs. To continue development after credits run out, you can deploy the downloadable NIM microservices locally to your hardware or to a cloud instance.

Is that correct?

Thanks for your support.

Yes. You can deploy the downloadable NIM microservices locally to your hardware or to a cloud instance.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.