Ussue with VIA and VITA-2.0 - Error Code 402

ludovico.valenti · September 5, 2024, 5:02pm

Hi everyone,

I recently gained early access to VIA and successfully followed the requirements as outlined in the documentation to start a microservice with VITA-2.0. During my first attempt the model was loaded correctly, but the web server was not accessible:

Afterwards, I closed the instance and reopened it a couple of days later, but this time encountered Error code 402:

{‘type’: ‘urn:kaizen:problem-details:payment-required’, ‘title’: ‘Payment Required’, ‘status’: 402, ‘detail’: “Account ‘2c7slESkQF-uGIymCkSJGl9teCmX3mGzOeIwT2cjvow’: Cloud credits expired - Please contact NVIDIA representatives”, ‘instance’: ‘/v2/nvcf/pexec/functions/a88f115a-4a47-4381-ad62-ca25dc33dc1b’}

In the attempt of solving it, I regenerated both my NVIDIA API Key and the NGC Api key, but without any luck. Below is the command I executed in the terminal:

Blockquote > export BACKEND_PORT=8000
export FRONTEND_PORT=9000
export NVIDIA_API_KEY=
export NGC_API_KEY=
export MODEL_PATH=“ngc:nvidia/tao/vita:2.0.1”
export NGC_MODEL_CACHE=</SOME/DIR/ON/HOST>
docker run --rm -it --privileged=true --ipc=host --ulimit memlock=-1
–ulimit stack=67108864 --tmpfs /tmp:exec --name via-server
–gpus ‘“device=all”’
-p $FRONTEND_PORT:$FRONTEND_PORT
-p $BACKEND_PORT:$BACKEND_PORT
-e BACKEND_PORT=$BACKEND_PORT
-e FRONTEND_PORT=$FRONTEND_PORT
-e NVIDIA_API_KEY=$NVIDIA_API_KEY
-e NGC_API_KEY=$NGC_API_KEY
-e VLM_MODEL_TO_USE=vita-2.0
-e VLM_BATCH_SIZE=1
-v $NGC_MODEL_CACHE:/root/.via/ngc_model_cache
-e MODEL_PATH=$MODEL_PATH
-v via-hf-cache:/tmp/huggingface
nvcr.io/metropolis/via-dp/via-engine:2.0-dp

Any insight or guidance on resolving this issue would be appreciated. Thanks in advance for your help.

aryason · September 5, 2024, 5:09pm

In the first image it looks like everything worked as expected. As for the issue you are having now, your NGC API credits have expired. Are you able to download and self-host the Llama3-8b NIM?

ludovico.valenti · September 25, 2024, 12:59pm

Hi @aryason , thanks for your answer and sorry for my delay. Yes, I did download Llama3-8b NIM, but while self-hosting it I got the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 100, in init_device
    _check_if_gpu_supports_dtype(self.model_config.dtype)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
    raise ValueError(
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-DGXS-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the`dtype` flag in CLI, for example: --dtype=half.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/entrypoints/openai/api_server.py", line 498, in <module>
    engine = AsyncLLMEngineFactory.from_engine_args(engine_args, usage_context=UsageContext.OPENAI_API_SERVER)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/engine/async_trtllm_engine.py", line 412, in from_engine_args
    engine = engine_cls.from_engine_args(engine_args, start_engine_loop, usage_context)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 365, in from_engine_args
    engine = cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 323, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
    return engine_class(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 148, in __init__
    self.model_executor = executor_class(
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 382, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 45, in _init_executor
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 181, in _init_workers_ray
    self._run_workers("init_device")
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 318, in _run_workers
    driver_worker_output = self.driver_worker.execute_method(
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 158, in execute_method
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 100, in init_device
    _check_if_gpu_supports_dtype(self.model_config.dtype)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
    raise ValueError(
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-DGXS-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the`dtype` flag in CLI, for example: --dtype=half.
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157] Traceback (most recent call last):
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 100, in init_device
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]     _check_if_gpu_supports_dtype(self.model_config.dtype)
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157]     raise ValueError(
(RayWorkerWrapper pid=2678) ERROR 09-25 12:54:42 worker_base.py:157] ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-DGXS-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting the`dtype` flag in CLI, for example: --dtype=half.

Is there any viable way to set fp16 rather than bf16? I already tried to pass -e NIM_MODEL_PROFILE="vllm-fp16-tp1" but it did not work.

Thanks for your support.

yuweiw · September 26, 2024, 1:45am

Could you try to add --dtype float16 parameter when you are self-hosting it? Also you may refer to this topic about your scenario.

ludovico.valenti · September 26, 2024, 8:15am

Thank you for your response, @yuweiw . Unfortunately, neither--dtype float16 nor --dtype half seem to be available as parameters to pass to Docker.

Additionally, since I believe that the issues I am encountering are directly related to the lack of available credits in my account, Is there a way to purchase more credits? I’ve been unable to find any information on how to do so here. After the login, I can only verify my current balance.

yuweiw · September 27, 2024, 7:10am

You can refer to our Quick Start Guide on page 11. We currently support the following GPU models: L40 / L40s H100 A100 80GB / 40GB A6000. Can you check to see if your GPU model is one of the above?

About the credits issue, you can refer to this FAQ.

ludovico.valenti · September 27, 2024, 7:22am

Thanks @yuweiw , according to the documentation my GPU model is not supported. Despite that, during my first attempt the model was correctly deployed:

According to the Quick Start Guide seems like it is not possible to purchase credits:

Each new account can receive up to 5000 credits to try out the APIs. To continue development after credits run out, you can deploy the downloadable NIM microservices locally to your hardware or to a cloud instance.

Is it correct?

Thanks for your support.

yuweiw · September 29, 2024, 1:33am

Yes. You can deploy the downloadable NIM microservices locally to your hardware or to a cloud instance.

yingliu · November 1, 2024, 8:40am

There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one. Thanks

system · November 15, 2024, 8:41am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Error while downloading VIA Visual AI Agent llama	20	515	September 23, 2024
VIA microservices not working any longer Visual AI Agent nim	16	191	November 7, 2025
Error converting Vita-2.0 model checkpoint Visual AI Agent llama	4	243	November 15, 2024
VSS blueprint 2.2.0 - ERROR Failed to load VIA stream handler - Failed to generate TRT-LLM engine Visual AI Agent nim , llama-31-70b-instruct , llama	16	540	April 22, 2025
VSS 2.3.0 Docker remote_llm_deployment Failed to generate TRT-LLM engine Visual AI Agent nim , paligemma , kosmos-2 , llama	5	149	May 23, 2025
(VSS 2.3.0) Issue with Using vila and nvila Models in VSS Deployment Visual AI Agent nim , llama-31-70b-instruct , llama	6	222	August 14, 2025
NVIDIA NIM API / openai.API: Error code: 402,Cloud credits expired - Please contact NVIDIA representatives Models nim , llama-31-405b-instruct , llama	8	579	January 19, 2025
VIA Summarization Workflow ERROR Visual AI Agent llama	34	697	March 5, 2025
Via_2.0 Visual AI Agent	3	139	December 5, 2024
Error when running Docker VIA Microservices Visual AI Agent nim	8	286	August 13, 2024

Ussue with VIA and VITA-2.0 - Error Code 402

Related topics