Internal server error and Uvicorn error when uploading a model in Clara SDK 4.1

Hello there.

I just set up a container from the Clara SDK 4.1 image by following the AIAA installation instructions, and I can successfully open the MONAI Label API in the browser.

However, uploading a model through the AIAA admin API and getting the GPU info both fail with an internal server error. In the browser, the model upload also shows "not found".

This issue occurs with both the PyTorch and Triton backends, so I can't upload a new model to SDK 4.1.

I checked the log file and found that whenever I run nvidia-smi or upload the MMAR model in the Docker container, using either curl or the API interface, the Uvicorn server reports an error like this:

[MainThread] [ERROR] (uvicorn.error:375) - Exception in ASGI application

A strange Uvicorn error message also appears when the Docker container starts up:

clara-train-sdk_1  | 2022-09-13T21:42:05.770065667Z [2022-09-13 21:42:05,769] [timeloop] [INFO] Timeloop now started. Jobs will run based on the interval set
clara-train-sdk_1  | 2022-09-13T21:42:05.770072383Z [2022-09-13 21:42:05,769] [80] [MainThread] [INFO] (timeloop:63) - Timeloop now started. Jobs will run based on the interval set
clara-train-sdk_1  | 2022-09-13T21:42:05.770276844Z [2022-09-13 21:42:05,770] [80] [MainThread] [INFO] (uvicorn.error:59) - Application startup complete.
clara-train-sdk_1  | 2022-09-13T21:42:05.770706829Z [2022-09-13 21:42:05,770] [80] [MainThread] [INFO] (uvicorn.error:206) - Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)

Has anyone hit an internal server error when uploading a new model in the Clara SDK 4.1 Docker image, or does anyone know how to deal with it?

Thank you for everything

The following is the full error log for the problem above.

Error log when running nvidia-smi:

clara-train-sdk_1  | 2022-09-13T21:55:18.951962120Z [2022-09-13 21:55:18,950] [80] [MainThread] [ERROR] (uvicorn.error:375) - Exception in ASGI application
clara-train-sdk_1  | 2022-09-13T21:55:18.951976636Z Traceback (most recent call last):
clara-train-sdk_1  | 2022-09-13T21:55:18.951979537Z   File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 372, in run_asgi
clara-train-sdk_1  | 2022-09-13T21:55:18.951982039Z     result = await app(self.scope, self.receive, self.send)
clara-train-sdk_1  | 2022-09-13T21:55:18.951984302Z   File "/opt/conda/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
clara-train-sdk_1  | 2022-09-13T21:55:18.951986619Z     return await self.app(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T21:55:18.951988608Z   File "/opt/conda/lib/python3.8/site-packages/fastapi/applications.py", line 212, in __call__
clara-train-sdk_1  | 2022-09-13T21:55:18.951990822Z     await super().__call__(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T21:55:18.951992933Z   File "/opt/conda/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
clara-train-sdk_1  | 2022-09-13T21:55:18.951995151Z     await self.middleware_stack(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T21:55:18.951997232Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
clara-train-sdk_1  | 2022-09-13T21:55:18.951999454Z     raise exc
clara-train-sdk_1  | 2022-09-13T21:55:18.952001549Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
clara-train-sdk_1  | 2022-09-13T21:55:18.952003769Z     await self.app(scope, receive, _send)
clara-train-sdk_1  | 2022-09-13T21:55:18.952005824Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/cors.py", line 84, in __call__
clara-train-sdk_1  | 2022-09-13T21:55:18.952008058Z     await self.app(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T21:55:18.952010125Z   File "/opt/conda/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
clara-train-sdk_1  | 2022-09-13T21:55:18.952012356Z     raise exc
clara-train-sdk_1  | 2022-09-13T21:55:18.952014435Z   File "/opt/conda/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
clara-train-sdk_1  | 2022-09-13T21:55:18.952016586Z     await self.app(scope, receive, sender)
clara-train-sdk_1  | 2022-09-13T21:55:18.952018629Z   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
clara-train-sdk_1  | 2022-09-13T21:55:18.952020793Z     await route.handle(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T21:55:18.952022832Z   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
clara-train-sdk_1  | 2022-09-13T21:55:18.952025014Z     await self.app(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T21:55:18.952027066Z   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
clara-train-sdk_1  | 2022-09-13T21:55:18.952029214Z     response = await func(request)
clara-train-sdk_1  | 2022-09-13T21:55:18.952031267Z   File "/opt/conda/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
clara-train-sdk_1  | 2022-09-13T21:55:18.952033399Z     raw_response = await run_endpoint_function(
clara-train-sdk_1  | 2022-09-13T21:55:18.952035457Z   File "/opt/conda/lib/python3.8/site-packages/fastapi/routing.py", line 159, in run_endpoint_function
clara-train-sdk_1  | 2022-09-13T21:55:18.952042894Z     return await dependant.call(**values)
clara-train-sdk_1  | 2022-09-13T21:55:18.952044875Z   File "/opt/conda/lib/python3.8/site-packages/monailabel/endpoints/logs.py", line 103, in gpu_info
clara-train-sdk_1  | 2022-09-13T21:55:18.952046822Z     response = subprocess.run(["nvidia-smi"], stdout=subprocess.PIPE).stdout.decode("utf-8")
clara-train-sdk_1  | 2022-09-13T21:55:18.952048756Z   File "/opt/conda/lib/python3.8/subprocess.py", line 493, in run
clara-train-sdk_1  | 2022-09-13T21:55:18.952050596Z     with Popen(*popenargs, **kwargs) as process:
clara-train-sdk_1  | 2022-09-13T21:55:18.952052349Z   File "/opt/conda/lib/python3.8/subprocess.py", line 858, in __init__
clara-train-sdk_1  | 2022-09-13T21:55:18.952054165Z     self._execute_child(args, executable, preexec_fn, close_fds,
clara-train-sdk_1  | 2022-09-13T21:55:18.952055928Z   File "/opt/conda/lib/python3.8/subprocess.py", line 1704, in _execute_child
clara-train-sdk_1  | 2022-09-13T21:55:18.952057720Z     raise child_exception_type(errno_num, err_msg, err_filename)
clara-train-sdk_1  | 2022-09-13T21:55:18.952059530Z OSError: [Errno 8] Exec format error: 'nvidia-smi'

Error log when uploading the model with curl:

clara-train-sdk_1  | 2022-09-13T22:03:51.684215456Z [2022-09-13 22:03:51,681] [80] [MainThread] [ERROR] (uvicorn.error:375) - Exception in ASGI application
clara-train-sdk_1  | 2022-09-13T22:03:51.684230029Z Traceback (most recent call last):
clara-train-sdk_1  | 2022-09-13T22:03:51.684232751Z   File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 372, in run_asgi
clara-train-sdk_1  | 2022-09-13T22:03:51.684235164Z     result = await app(self.scope, self.receive, self.send)
clara-train-sdk_1  | 2022-09-13T22:03:51.684237260Z   File "/opt/conda/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
clara-train-sdk_1  | 2022-09-13T22:03:51.684239461Z     return await self.app(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:03:51.684241487Z   File "/opt/conda/lib/python3.8/site-packages/fastapi/applications.py", line 212, in __call__
clara-train-sdk_1  | 2022-09-13T22:03:51.684243637Z     await super().__call__(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:03:51.684245632Z   File "/opt/conda/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
clara-train-sdk_1  | 2022-09-13T22:03:51.684247732Z     await self.middleware_stack(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:03:51.684249722Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
clara-train-sdk_1  | 2022-09-13T22:03:51.684251807Z     raise exc
clara-train-sdk_1  | 2022-09-13T22:03:51.684253806Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
clara-train-sdk_1  | 2022-09-13T22:03:51.684256232Z     await self.app(scope, receive, _send)
clara-train-sdk_1  | 2022-09-13T22:03:51.684259502Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/cors.py", line 84, in __call__
clara-train-sdk_1  | 2022-09-13T22:03:51.684262876Z     await self.app(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:03:51.684265188Z   File "/opt/conda/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
clara-train-sdk_1  | 2022-09-13T22:03:51.684267245Z     raise exc
clara-train-sdk_1  | 2022-09-13T22:03:51.684269190Z   File "/opt/conda/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
clara-train-sdk_1  | 2022-09-13T22:03:51.684277672Z     await self.app(scope, receive, sender)
clara-train-sdk_1  | 2022-09-13T22:03:51.684279688Z   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
clara-train-sdk_1  | 2022-09-13T22:03:51.684281521Z     await route.handle(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:03:51.684283238Z   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
clara-train-sdk_1  | 2022-09-13T22:03:51.684285010Z     await self.app(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:03:51.684286714Z   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
clara-train-sdk_1  | 2022-09-13T22:03:51.684288497Z     response = await func(request)
clara-train-sdk_1  | 2022-09-13T22:03:51.684290176Z   File "/opt/conda/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
clara-train-sdk_1  | 2022-09-13T22:03:51.684292004Z     raw_response = await run_endpoint_function(
clara-train-sdk_1  | 2022-09-13T22:03:51.684293720Z   File "/opt/conda/lib/python3.8/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
clara-train-sdk_1  | 2022-09-13T22:03:51.684296167Z     return await run_in_threadpool(dependant.call, **values)
clara-train-sdk_1  | 2022-09-13T22:03:51.684297949Z   File "/opt/conda/lib/python3.8/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
clara-train-sdk_1  | 2022-09-13T22:03:51.684299822Z     return await anyio.to_thread.run_sync(func, *args)
clara-train-sdk_1  | 2022-09-13T22:03:51.684301544Z   File "/opt/conda/lib/python3.8/site-packages/anyio/to_thread.py", line 28, in run_sync
clara-train-sdk_1  | 2022-09-13T22:03:51.684303333Z     return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
clara-train-sdk_1  | 2022-09-13T22:03:51.684305107Z   File "/opt/conda/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
clara-train-sdk_1  | 2022-09-13T22:03:51.684306960Z     return await future
clara-train-sdk_1  | 2022-09-13T22:03:51.684308678Z   File "/opt/conda/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 754, in run
clara-train-sdk_1  | 2022-09-13T22:03:51.684310470Z     result = context.run(func, *args)
clara-train-sdk_1  | 2022-09-13T22:03:51.684312163Z   File "api/api_admin.py", line 189, in admin_model_load
clara-train-sdk_1  | 2022-09-13T22:03:51.684313939Z   File "/opt/conda/lib/python3.8/json/__init__.py", line 357, in loads
clara-train-sdk_1  | 2022-09-13T22:03:51.684315706Z     return _default_decoder.decode(s)
clara-train-sdk_1  | 2022-09-13T22:03:51.684317401Z   File "/opt/conda/lib/python3.8/json/decoder.py", line 337, in decode
clara-train-sdk_1  | 2022-09-13T22:03:51.684319153Z     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
clara-train-sdk_1  | 2022-09-13T22:03:51.684320876Z   File "/opt/conda/lib/python3.8/json/decoder.py", line 353, in raw_decode
clara-train-sdk_1  | 2022-09-13T22:03:51.684322644Z     obj, end = self.scan_once(s, idx)
clara-train-sdk_1  | 2022-09-13T22:03:51.684324340Z json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 67 (char 66)
clara-train-sdk_1  | 2022-09-13T22:05:38.660073234Z [2022-09-13 22:05:38,659] [80] [AnyIO worker thread] [INFO] (aiaa.api.api_admin:191) - Download MMAR:: clara_pt_deepgrow_3d_annotation_4.1

Error log when uploading the model through the MONAI Label web API interface:

clara-train-sdk_1  | 2022-09-13T22:05:40.567866711Z [2022-09-13 22:05:40,565] [80] [MainThread] [ERROR] (uvicorn.error:375) - Exception in ASGI application
clara-train-sdk_1  | 2022-09-13T22:05:40.567912292Z Traceback (most recent call last):
clara-train-sdk_1  | 2022-09-13T22:05:40.567924574Z   File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 372, in run_asgi
clara-train-sdk_1  | 2022-09-13T22:05:40.567935362Z     result = await app(self.scope, self.receive, self.send)
clara-train-sdk_1  | 2022-09-13T22:05:40.567944996Z   File "/opt/conda/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
clara-train-sdk_1  | 2022-09-13T22:05:40.567955246Z     return await self.app(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:05:40.567964871Z   File "/opt/conda/lib/python3.8/site-packages/fastapi/applications.py", line 212, in __call__
clara-train-sdk_1  | 2022-09-13T22:05:40.567974852Z     await super().__call__(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:05:40.567984093Z   File "/opt/conda/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
clara-train-sdk_1  | 2022-09-13T22:05:40.567993687Z     await self.middleware_stack(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:05:40.568002872Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
clara-train-sdk_1  | 2022-09-13T22:05:40.568012681Z     raise exc
clara-train-sdk_1  | 2022-09-13T22:05:40.568021791Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
clara-train-sdk_1  | 2022-09-13T22:05:40.568031247Z     await self.app(scope, receive, _send)
clara-train-sdk_1  | 2022-09-13T22:05:40.568040268Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/cors.py", line 92, in __call__
clara-train-sdk_1  | 2022-09-13T22:05:40.568049939Z     await self.simple_response(scope, receive, send, request_headers=headers)
clara-train-sdk_1  | 2022-09-13T22:05:40.568059417Z   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/cors.py", line 147, in simple_response
clara-train-sdk_1  | 2022-09-13T22:05:40.568069297Z     await self.app(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:05:40.568078391Z   File "/opt/conda/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
clara-train-sdk_1  | 2022-09-13T22:05:40.568088040Z     raise exc
clara-train-sdk_1  | 2022-09-13T22:05:40.568097131Z   File "/opt/conda/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
clara-train-sdk_1  | 2022-09-13T22:05:40.568106618Z     await self.app(scope, receive, sender)
clara-train-sdk_1  | 2022-09-13T22:05:40.568115761Z   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
clara-train-sdk_1  | 2022-09-13T22:05:40.568125496Z     await route.handle(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:05:40.568134605Z   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
clara-train-sdk_1  | 2022-09-13T22:05:40.568144163Z     await self.app(scope, receive, send)
clara-train-sdk_1  | 2022-09-13T22:05:40.568153139Z   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
clara-train-sdk_1  | 2022-09-13T22:05:40.568162608Z     response = await func(request)
clara-train-sdk_1  | 2022-09-13T22:05:40.568173574Z   File "/opt/conda/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
clara-train-sdk_1  | 2022-09-13T22:05:40.568207750Z     raw_response = await run_endpoint_function(
clara-train-sdk_1  | 2022-09-13T22:05:40.568218753Z   File "/opt/conda/lib/python3.8/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
clara-train-sdk_1  | 2022-09-13T22:05:40.568228648Z     return await run_in_threadpool(dependant.call, **values)
clara-train-sdk_1  | 2022-09-13T22:05:40.568237895Z   File "/opt/conda/lib/python3.8/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
clara-train-sdk_1  | 2022-09-13T22:05:40.568247881Z     return await anyio.to_thread.run_sync(func, *args)
clara-train-sdk_1  | 2022-09-13T22:05:40.568256911Z   File "/opt/conda/lib/python3.8/site-packages/anyio/to_thread.py", line 28, in run_sync
clara-train-sdk_1  | 2022-09-13T22:05:40.568266394Z     return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
clara-train-sdk_1  | 2022-09-13T22:05:40.568276058Z   File "/opt/conda/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
clara-train-sdk_1  | 2022-09-13T22:05:40.568285928Z     return await future
clara-train-sdk_1  | 2022-09-13T22:05:40.568294884Z   File "/opt/conda/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 754, in run
clara-train-sdk_1  | 2022-09-13T22:05:40.568304335Z     result = context.run(func, *args)
clara-train-sdk_1  | 2022-09-13T22:05:40.568313558Z   File "api/api_admin.py", line 196, in admin_model_load
clara-train-sdk_1  | 2022-09-13T22:05:40.568323688Z   File "/opt/monai/monai/apps/mmars/mmars.py", line 141, in download_mmar
clara-train-sdk_1  | 2022-09-13T22:05:40.568333379Z     raise ValueError(f"api query returns no item for pattern {item}.  Please change or shorten it.")
clara-train-sdk_1  | 2022-09-13T22:05:40.568343412Z ValueError: api query returns no item for pattern clara_pt_deepgrow_3d_annotation_4.1.  Please change or shorten it.

Hi
It seems like nvidia-smi doesn't work from within your Docker container. Can you test it by starting the container and then running nvidia-smi inside it?
If it works, then try running the Clara container without Triton and start AIAA with

AIAA start -w /claraDevDay/AIAA/workspace/ --engine AIAA
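
For the nvidia-smi test, a small script run inside the container can show whether the binary is found and whether it can be executed at all. This is only a rough sketch using the Python standard library; the helper name check_command is made up for illustration. The "Exec format error" (Errno 8) from the log is an OSError, so it would be caught by the except branch below rather than crashing the script.

```python
import shutil
import subprocess


def check_command(name="nvidia-smi"):
    """Return (ok, detail) describing whether `name` can be found and executed."""
    path = shutil.which(name)
    if path is None:
        return False, f"{name} not found on PATH"
    try:
        result = subprocess.run(
            [path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=30,
        )
    except OSError as exc:
        # Covers "[Errno 8] Exec format error" and similar launch failures.
        return False, f"{path} could not be executed: {exc}"
    except subprocess.TimeoutExpired:
        return False, f"{path} did not finish within 30 seconds"
    if result.returncode != 0:
        return False, f"{path} exited with code {result.returncode}"
    return True, result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    ok, detail = check_command()
    print("OK" if ok else "FAILED")
    print(detail)
```

If this reports that the file could not be executed, the nvidia-smi visible inside the container is probably not a valid binary for that environment, which would point at how the GPU runtime is mounted into the container rather than at AIAA itself.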