I'm looking to use NIM to deploy the Llama-3.1-8B-Instruct base model with Hugging Face-trained LoRA adapters. I'm deploying the nvcr.io/nim/meta/llama3-8b-instruct:latest image on an H100 SXM instance on Lambda Labs. You can reproduce this error using the NVIDIA example: Deploy Multilingual LLMs with NVIDIA NIM | NVIDIA Technical Blog.
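For context, I launch the container roughly as follows, following the blog's PEFT setup (the local adapter directory, container path, and API key are placeholders for my actual values):

export NGC_API_KEY=<my NGC API key>
export LOCAL_PEFT_DIRECTORY=/home/ubuntu/loras   # one subdirectory per LoRA adapter
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_PEFT_SOURCE=/home/nvs/loras \
  -v "$LOCAL_PEFT_DIRECTORY:/home/nvs/loras" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:latest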
I'm able to deploy this NIM successfully, which I verify by sending a request to the models endpoint: curl -X GET 'http://IP_ADDRESS:8000/v1/models'. This correctly returns the base model and the two mounted LoRA adapters.
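The response looks roughly like this (adapter names are illustrative and unrelated fields are trimmed):

{
  "object": "list",
  "data": [
    { "id": "meta/llama3-8b-instruct", "object": "model", ... },
    { "id": "llama3-8b-instruct-lora-adapter-1", "object": "model", ... },
    { "id": "llama3-8b-instruct-lora-adapter-2", "object": "model", ... }
  ]
}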
When I try to generate a model response through the completions endpoint, however, I get back an internal server error: Exception: lora format could not be determined. I think this error is new, because I have been able to run LoRA swapping successfully with NIM in the past. I put the full error from the Docker logs below.
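The request I'm sending is along these lines, with the model field set to one of the adapter IDs returned by /v1/models (the adapter name and prompt are placeholders):

curl -X POST 'http://IP_ADDRESS:8000/v1/completions' \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "llama3-8b-instruct-lora-adapter-1",
        "prompt": "Write a short greeting.",
        "max_tokens": 64
      }'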
What is causing the LoRA adapters not to be recognized as a valid LoRA format? Any help would be greatly appreciated.
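For reference, my understanding from the NIM PEFT documentation is that each adapter under NIM_PEFT_SOURCE should be a directory containing either the standard Hugging Face PEFT files (adapter_config.json plus adapter_model.safetensors or adapter_model.bin) or a NeMo checkpoint. My mounted directory looks like this (adapter names are placeholders):

/home/nvs/loras
├── llama3-8b-instruct-lora-adapter-1
│   ├── adapter_config.json
│   └── adapter_model.safetensors
└── llama3-8b-instruct-lora-adapter-2
    ├── adapter_config.json
    └── adapter_model.safetensors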
INFO 10-15 07:50:06.582 httptools_impl.py:481] 46.231.244.214:64380 - "POST /v1/completions HTTP/1.1" 500
ERROR 10-15 07:50:06.582 httptools_impl.py:416] Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/entrypoints/openai/api_server.py", line 389, in create_completion
generator = await openai_serving_completion.create_completion(request, raw_request)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_completion.py", line 169, in create_completion
async for i, res in result_generator:
File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 228, in consumer
raise item
File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 213, in producer
async for item in iterator:
File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/engine/async_trtllm_engine.py", line 391, in generate
raise e
File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/engine/async_trtllm_engine.py", line 385, in generate
async for request_output in stream:
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 77, in __anext__
raise result
Exception: lora format could not be determined