Hello,
When deploying a NIM container on SageMaker, how can we set the server batch size and batching strategy (dynamic, continuous, etc.), and enable prefix caching for vLLM?
Thank you!
Hi @adrian.m.alecu, at the moment we don’t support setting these options. Can you share more about your use case and requirements? That will help us gauge interest and prioritize features.