Hello,
While trying to deploy various versions of the Mistral Docker container on SageMaker endpoints (using this NIM example as a guideline), I came across the following peculiarity:
- When using public.ecr.aws/nvidia/nim:mistral-7b-instruct-v03-1.0.0, the SageMaker deployment on an ml.g5.12xlarge instance finished successfully.
- When using nvcr.io/nim/mistralai/mistral-7b-instruct-v0.3:latest, the SageMaker deployment failed with the error:
  ModuleNotFoundError: No module named 'grpc'. You can run `pip install "ray[serve]"` to install all Ray Serve dependencies.
The stacktrace for the failure is:
  File "/opt/nim/llm/.venv/bin/serve", line 5, in <module>
    from ray.serve.scripts import cli
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/ray/serve/__init__.py", line 29, in <module>
    raise e
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/ray/serve/__init__.py", line 4, in <module>
    from ray.serve.api import (
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/ray/serve/api.py", line 14, in <module>
    from ray.serve._private.config import (
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/ray/serve/_private/config.py", line 30, in <module>
    from ray.serve._private.utils import DEFAULT, DeploymentOptionUpdateType
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/ray/serve/_private/utils.py", line 28, in <module>
    from ray.serve._private.common import ServeComponentType
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/ray/serve/_private/common.py", line 23, in <module>
    from ray.serve.grpc_util import RayServegRPCContext
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/ray/serve/grpc_util.py", line 3, in <module>
    import grpc
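To narrow this down, one option is to check whether the grpc module is visible to the container's Python environment at all. The snippet below is only a minimal sketch: the helper name missing_modules is my own (not part of NIM or Ray), and it assumes it is run with the interpreter from the virtual environment seen in the trace (presumably /opt/nim/llm/.venv/bin/python, which I have not verified).

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            # find_spec returns None when a top-level module is not installed.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for broken parent packages or modules without a spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "grpc" is the module the traceback fails on; "ray" is included for comparison.
    print(missing_modules(["grpc", "ray"]))
```

If "grpc" shows up in the printed list, the module is genuinely absent from that environment rather than being shadowed at runtime. As far as I know, the grpc module is provided by the grpcio package on PyPI.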
Do you know why the latest container cannot be deployed on SageMaker?