I have been trying to run inference through NIM, but it takes a very long time. Here is the log:
[2025-12-18 03:30:49,971][httpx][INFO] - HTTP Request: POST https://integrate.api.nvidia.com/v1/chat/completions "HTTP/1.1 200 OK"
[2025-12-18 03:32:10,143][httpx][INFO] - HTTP Request: POST https://integrate.api.nvidia.com/v1/chat/completions "HTTP/1.1 200 OK"
[2025-12-18 03:42:10,673][openai._base_client][INFO] - Retrying request to /chat/completions in 0.497196 seconds
[2025-12-18 03:42:46,472][httpx][INFO] - HTTP Request: POST https://integrate.api.nvidia.com/v1/chat/completions "HTTP/1.1 200 OK"
[2025-12-18 03:42:48,132][httpx][INFO] - HTTP Request: POST https://integrate.api.nvidia.com/v1/embeddings "HTTP/1.1 200 OK"
[2025-12-18 03:42:48,621][shinka.core.novelty_judge][INFO] - Top-5 similarity scores: ['0.97']
[2025-12-18 03:47:18,910][openai._base_client][INFO] - Retrying request to /chat/completions in 0.490957 seconds
[2025-12-18 03:51:49,718][openai._base_client][INFO] - Retrying request to /chat/completions in 0.826494 seconds
[2025-12-18 03:56:20,819][backoff][INFO] - Backing off query_nvidia(...) for 0.8s (openai.APIConnectionError: Connection error.)
[2025-12-18 03:56:20,819][shinka.llm.models.nvidia][INFO] - NVIDIA - Retry 1 due to error: Connection error.. Waiting 0.8s...
[2025-12-18 04:00:52,054][openai._base_client][INFO] - Retrying request to /chat/completions in 0.414927 seconds
[2025-12-18 04:10:52,748][openai._base_client][INFO] - Retrying request to /chat/completions in 0.762142 seconds
[2025-12-18 04:15:23,824][backoff][INFO] - Backing off query_nvidia(...) for 1.4s (openai.APIConnectionError: Connection error.)
[2025-12-18 04:15:23,824][shinka.llm.models.nvidia][INFO] - NVIDIA - Retry 2 due to error: Connection error.. Waiting 1.4s...
[2025-12-18 04:19:55,584][openai._base_client][INFO] - Retrying request to /chat/completions in 0.491452 seconds
[2025-12-18 04:24:26,365][openai._base_client][INFO] - Retrying request to /chat/completions in 0.913321 seconds
[2025-12-18 04:28:57,720][backoff][INFO] - Backing off query_nvidia(...) for 3.5s (openai.APIConnectionError: Connection error.)
[2025-12-18 04:28:57,720][shinka.llm.models.nvidia][INFO] - NVIDIA - Retry 3 due to error: Connection error.. Waiting 3.5s...
[2025-12-18 04:39:01,505][openai._base_client][INFO] - Retrying request to /chat/completions in 0.391560 seconds
[2025-12-18 04:43:32,209][openai._base_client][INFO] - Retrying request to /chat/completions in 0.843957 seconds
[2025-12-18 04:48:03,491][backoff][INFO] - Backing off query_nvidia(...) for 4.9s (openai.APIConnectionError: Connection error.)
[2025-12-18 04:48:03,491][shinka.llm.models.nvidia][INFO] - NVIDIA - Retry 4 due to error: Connection error.. Waiting 4.9s...
[2025-12-18 04:52:38,675][openai._base_client][INFO] - Retrying request to /chat/completions in 0.397126 seconds
[2025-12-18 05:02:39,442][openai._base_client][INFO] - Retrying request to /chat/completions in 0.892992 seconds
[2025-12-18 05:12:40,643][backoff][INFO] - Backing off query_nvidia(...) for 3.5s (openai.APITimeoutError: Request timed out.)
[2025-12-18 05:12:40,643][shinka.llm.models.nvidia][INFO] - NVIDIA - Retry 5 due to error: Request timed out.. Waiting 3.5s...
[2025-12-18 05:17:14,157][openai._base_client][INFO] - Retrying request to /chat/completions in 0.451072 seconds
[2025-12-18 05:21:44,898][openai._base_client][INFO] - Retrying request to /chat/completions in 0.946891 seconds
[2025-12-18 05:26:16,186][backoff][INFO] - Backing off query_nvidia(...) for 10.2s (openai.APIConnectionError: Connection error.)
[2025-12-18 05:26:16,186][shinka.llm.models.nvidia][INFO] - NVIDIA - Retry 6 due to error: Connection error.. Waiting 10.2s...
[2025-12-18 05:36:26,551][openai._base_client][INFO] - Retrying request to /chat/completions in 0.468014 seconds
[2025-12-18 05:40:57,508][openai._base_client][INFO] - Retrying request to /chat/completions in 0.857667 seconds
[2025-12-18 05:45:28,435][backoff][INFO] - Backing off query_nvidia(...) for 14.1s (openai.APIConnectionError: Connection error.)
[2025-12-18 05:45:28,436][shinka.llm.models.nvidia][INFO] - NVIDIA - Retry 7 due to error: Connection error.. Waiting 14.1s...
[2025-12-18 05:50:12,899][openai._base_client][INFO] - Retrying request to /chat/completions in 0.400749 seconds
[2025-12-18 06:00:13,503][openai._base_client][INFO] - Retrying request to /chat/completions in 0.803331 seconds
[2025-12-18 06:00:19,107][backoff][INFO] - Backing off query_nvidia(...) for 4.0s (openai.APIConnectionError: Connection error.)
[2025-12-18 06:00:19,107][shinka.llm.models.nvidia][INFO] - NVIDIA - Retry 8 due to error: Connection error.. Waiting 4.0s...
[2025-12-18 06:04:53,478][openai._base_client][INFO] - Retrying request to /chat/completions in 0.398750 seconds
[2025-12-18 06:09:26,468][openai._base_client][INFO] - Retrying request to /chat/completions in 0.760055 seconds
[2025-12-18 06:19:32,943][backoff][INFO] - Backing off query_nvidia(...) for 11.8s (openai.APITimeoutError: Request timed out.)
[2025-12-18 06:19:32,943][shinka.llm.models.nvidia][INFO] - NVIDIA - Retry 9 due to error: Request timed out.. Waiting 11.8s...
[2025-12-18 06:29:48,940][openai._base_client][INFO] - Retrying request to /chat/completions in 0.465329 seconds
[2025-12-18 06:34:19,739][openai._base_client][INFO] - Retrying request to /chat/completions in 0.968840 seconds
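Most of the gaps between consecutive log lines are either about 4.5 minutes or about 10 minutes. A minimal sketch for measuring them, assuming the timestamp format shown in the log; `parse_timestamp` and `gaps_in_seconds` are hypothetical helper names, and the sample lines are copied from the log and truncated after the log level:

```python
from datetime import datetime

# A few lines copied from the log above, truncated after the log level.
SAMPLE = """\
[2025-12-18 03:32:10,143][httpx][INFO]
[2025-12-18 03:42:10,673][openai._base_client][INFO]
[2025-12-18 03:42:46,472][httpx][INFO]
[2025-12-18 03:47:18,910][openai._base_client][INFO]
[2025-12-18 03:51:49,718][openai._base_client][INFO]
"""


def parse_timestamp(line):
    # The timestamp sits between the leading '[' and the first ']'.
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def gaps_in_seconds(log_text):
    # Seconds elapsed between each pair of consecutive log lines.
    times = [
        parse_timestamp(line)
        for line in log_text.splitlines()
        if line.startswith("[")
    ]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


gaps = gaps_in_seconds(SAMPLE)
print([round(g) for g in gaps])
```

Running it on the full log instead of `SAMPLE` shows how long each attempt waits before the next retry.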
These are the models I'm using:
- "deepseek-ai/deepseek-v3.2"
- "nvidia/nemotron-3-nano-30b-a3b"
- "moonshotai/kimi-k2-thinking"
- "mistralai/devstral-2-123b-instruct-2512"
- "mistralai/mistral-large-3-675b-instruct-2512"
- "deepseek-ai/deepseek-v3.1-terminus"
The input and output are always under 3000 tokens. Why are the requests so slow, and why do they keep timing out?