Alphafold2 bug

Hello,

There seems to be an issue with the alphafold2 container when passing the NIM_PARALLEL_MSA_RUNNERS and NIM_PARALLEL_THREADS_PER_MSA env variables.

Starting the container with the docker command below and then sending the following POST request results in an exception:

curl -X 'POST' \
    -i \
    "http://localhost:8000/protein-structure/alphafold2/predict-structure-from-sequence"  \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequence": "MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRVKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKELGYQG", "databases": ["uniref90", "mgnify", "small_bfd"]}' > output.json
docker run --rm --name alphafold2 --runtime=nvidia \
    -e NGC_CLI_API_KEY \
    -e NIM_PARALLEL_MSA_RUNNERS=5 \
    -e NIM_PARALLEL_THREADS_PER_MSA=4 \
    -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/deepmind/alphafold2:1.0.0

Exception:

INFO:     172.17.0.1:45376 - "POST /protein-structure/alphafold2/predict-structure-from-sequence HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
  + Exception Group Traceback (most recent call last):
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/_utils.py", line 87, in collapse_excgroups
  |     yield
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 190, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
    |     result = await app(  # type: ignore[func-returns-value]
    |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    |     return await self.app(scope, receive, send)
    |   File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    |     await super().__call__(scope, receive, send)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    |     raise exc
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    |     await self.app(scope, receive, _send)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 189, in __call__
    |     with collapse_excgroups():
    |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    |     self.gen.throw(typ, value, traceback)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/_utils.py", line 93, in collapse_excgroups
    |     raise exc
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 191, in __call__
    |     response = await self.dispatch_func(request, call_next)
    |   File "/usr/local/lib/python3.10/dist-packages/nimlib/nim_inference_api_builder/api.py", line 113, in metrics_middleware
    |     response = await call_next(request)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 165, in call_next
    |     raise app_exc
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 151, in coro
    |     await self.app(scope, receive_or_disconnect, send_no_error)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    |     raise exc
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    |     await route.handle(scope, receive, send)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    |     await self.app(scope, receive, send)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    |     raise exc
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    |     response = await func(request)
    |   File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    |     raw_response = await run_endpoint_function(
    |   File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 193, in run_endpoint_function
    |     return await run_in_threadpool(dependant.call, **values)
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    |     return await anyio.to_thread.run_sync(func, *args)
    |   File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |   File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    |     return await future
    |   File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
    |     result = context.run(func, *args)
    |   File "/opt/inference.py", line 409, in nim_api_post_call_protein_structure_alphafold2_predict_structure_from_sequence_post
    |     alignments_and_templates = self.nim_api_post_call_protein_structure_alphafold2_predict_msa_from_sequence_post(msa_request_body)
    |   File "/opt/inference.py", line 224, in nim_api_post_call_protein_structure_alphafold2_predict_msa_from_sequence_post
    |     dbname_to_alignments = create_alignments(body.sequence,
    |   File "/opt/alphafold2_inference_wrappers.py", line 178, in create_alignments
    |     queue = AsyncWorkQueue(max_workers=n_workers, use_processes=False)
    |   File "/usr/local/lib/python3.10/dist-packages/bia/concurrent/TaskQueue.py", line 17, in __init__
    |     self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
    |   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 143, in __init__
    |     if max_workers <= 0:
    | TypeError: '<=' not supported between instances of 'str' and 'int'
    +------------------------------------

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 189, in __call__
    with collapse_excgroups():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_utils.py", line 93, in collapse_excgroups
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 191, in __call__
    response = await self.dispatch_func(request, call_next)
  File "/usr/local/lib/python3.10/dist-packages/nimlib/nim_inference_api_builder/api.py", line 113, in metrics_middleware
    response = await call_next(request)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 165, in call_next
    raise app_exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 151, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 193, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/usr/local/lib/python3.10/dist-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/opt/inference.py", line 409, in nim_api_post_call_protein_structure_alphafold2_predict_structure_from_sequence_post
    alignments_and_templates = self.nim_api_post_call_protein_structure_alphafold2_predict_msa_from_sequence_post(msa_request_body)
  File "/opt/inference.py", line 224, in nim_api_post_call_protein_structure_alphafold2_predict_msa_from_sequence_post
    dbname_to_alignments = create_alignments(body.sequence,
  File "/opt/alphafold2_inference_wrappers.py", line 178, in create_alignments
    queue = AsyncWorkQueue(max_workers=n_workers, use_processes=False)
  File "/usr/local/lib/python3.10/dist-packages/bia/concurrent/TaskQueue.py", line 17, in __init__
    self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 143, in __init__
    if max_workers <= 0:
TypeError: '<=' not supported between instances of 'str' and 'int'

It works fine if the env variables are removed.
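
The final frames of the traceback suggest that the value of NIM_PARALLEL_MSA_RUNNERS reaches ThreadPoolExecutor as a string rather than an integer. Environment variables are always strings in Python, so a missing int() conversion would produce exactly this TypeError. The sketch below reproduces the error outside the container and shows one way to parse the value. The helper name read_worker_count and its default are hypothetical and used only for illustration; this is not the actual code in the NIM container.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_worker_count(name: str, default: int = 1) -> int:
    """Hypothetical helper: read a positive integer from an env variable."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = int(raw)  # raises ValueError if the text is not an integer
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


# Set the variable here only so the demonstration is self-contained.
os.environ["NIM_PARALLEL_MSA_RUNNERS"] = "5"

# Passing the raw string reproduces the error from the traceback.
raw_value = os.environ["NIM_PARALLEL_MSA_RUNNERS"]
error_message = None
try:
    ThreadPoolExecutor(max_workers=raw_value)
except TypeError as exc:
    error_message = str(exc)

# Converting to int first lets the executor start normally.
n_workers = read_worker_count("NIM_PARALLEL_MSA_RUNNERS")
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    results = list(pool.map(len, ["a", "bb"]))
```

This would also be consistent with the request succeeding when the variables are unset, assuming the default worker count in the container is already an integer.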

Let me know if I’m doing something wrong.

Thanks,
Timur

Hi Timur,

Thanks for your interest in BioNeMo and the AlphaFold2 NIM!

I am able to reproduce this behavior, and we’re working on a fix. I will let you know when it is resolved.

Thanks,
Kris