Setting batch size and enabling vLLM prefix caching

Hello,

When deploying a NIM container on SageMaker, how can we set the server batch size and batching strategy (dynamic, continuous, etc.), and enable prefix caching for vLLM?

Thank you!

Hi @adrian.m.alecu, at the moment we don’t support setting these options. Can you share more about your use case and requirements? That will help us gauge interest and prioritize features.