Part 5 Assessment in DLI Course ‘Building RAG Agents for LLMs’

Hi,
I have gone through all the steps in the course but am stuck on Part 5 of 08_evaluation.ipynb, the final step of the course. I'm not sure what exactly I should do; could someone help me with this?

First, I’m pretty sure the following conditions are met:

Second, I went back to 01_microservices.ipynb to launch the Gradio interface on port 8090. However, when I click the "Evaluate" button I get an error saying Gradio Stream failed: [Errno 111] Connection refused, as shown in the image below.

Third, I went back to the DLI website and clicked "Assess Task", but I still get an error saying it does not look like you completed the assessment yet...

The course mentions that we could use 35_langserve.ipynb as an example, but I don't really get how to make it work. Should I launch the LangChain server on a port other than 9012?

Hey @lzlallen1980

Looking at the frontend, you'll see that the RemoteRunnables are constructed to point at port 9012, like so:

## Necessary Endpoints
chains_dict = {
    'basic' : RemoteRunnable("http://lab:9012/basic_chat/"),
    'retriever' : RemoteRunnable("http://lab:9012/retriever/"),
    'generator' : RemoteRunnable("http://lab:9012/generator/"),
}

Are you adding those three routes in the same fashion as the example:

add_routes(
    app,
    llm,
    path="/basic_chat",
)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=9012)

If not, then that might be the problem. If yes, then can you tell me a bit more about how you’re approaching it?

Hi @vkudlay
Here is my current approach, which is still not working…

%%writefile server_app.py
# https://python.langchain.com/docs/langserve#server
from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langserve import add_routes

llm = ChatNVIDIA(model="mixtral_8x7b")

app = FastAPI(
  title="LangChain Server",
  version="1.0",
  description="A simple api server using Langchain's Runnable interfaces",
)

add_routes(
    app,
    llm,
    path="/basic_chat",
)

## BEGIN TODO
app2 = FastAPI(
  title="retriever",
  version="1.0",
  description="retriever",
)

add_routes(
    app2,
    llm,
    path="/retriever",
)

app3 = FastAPI(
  title="generator",
  version="1.0",
  description="generator",
)

add_routes(
    app3,
    llm,
    path="/generator",
)


## END TODO

## Might be encountered if this were for a standalone python file...
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="http://lab", port=9012)
    uvicorn.run(app2, host="http://lab", port=9012)
    uvicorn.run(app3, host="http://lab", port=9012)

Hey @lzlallen1980

Oh! That's a reasonable guess as to how that works (and as far as the example is concerned, it makes sense). That's not exactly how add_routes integrates with FastAPI components, though; the expectation is that you construct and launch a single app with multiple endpoints associated with the service. I'm assuming your code launches uvicorn.run(app, host="http://lab", port=9012) and then blocks until you shut it down, so it never gets to the second and third app deployments.

The main README of the LangServe github repo shows a pretty good example of how you can bind multiple routes to a single application: langchain-ai/langserve: LangServe 🦜️🏓 (github.com)
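As a rough sketch of that pattern (using llm as a stand-in runnable for every path just to show the mechanics; you'd bind your actual retriever/generator chains instead), the server might look like:

## One FastAPI app, several add_routes calls (llm is a stand-in for your real chains)
from fastapi import FastAPI
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langserve import add_routes

llm = ChatNVIDIA(model="mixtral_8x7b")

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple api server using Langchain's Runnable interfaces",
)

add_routes(app, llm, path="/basic_chat")
add_routes(app, llm, path="/retriever")   ## stand-in; bind your retriever chain here
add_routes(app, llm, path="/generator")   ## stand-in; bind your generator chain here

if __name__ == "__main__":
    import uvicorn
    ## host is a bind address such as "0.0.0.0", not a URL
    uvicorn.run(app, host="0.0.0.0", port=9012)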

Hope this helps! :D

Hi @vkudlay
Thanks for your helpful recommendation. After going through the LangServe GitHub repo, I managed to add multiple routes to a single application on http://0.0.0.0:9012 and confirmed they work by testing them with simple client code; see the following screenshot.

However, with the application hosted on http://0.0.0.0:9012, I still can't get through the Evaluate process on the Gradio frontend. Here is the error:

I guessed there was something wrong with the hosting address, so I also tried different addresses, such as
uvicorn.run(app, host="http://dli-be911a886c75-84f58c.aws.labs.courses.nvidia.com", port=9012) and uvicorn.run(app, host="http://lab", port=9012), but I get the following error while launching the server:

[Errno -2] Name or service not known

Do you see anything that I could adjust further?

Oh interesting. Have you tried querying the logs endpoint from notebook 1? In reference to this utility endpoint in the docker_router spec:

@app.get("/containers/{container_name}/logs")
async def get_container_logs(container_name: str):
    """Route that allows you to query the log file of the container"""
    try:
        container = client.containers.get(container_name)
        logs = container.logs()
        return {"logs": logs.decode('utf-8')}
    except NotFound:
        return {"error": f"Container `{container_name}` not found"}

By the way, your rag chain example is actually streaming from the llm defined earlier rather than the chain itself. Worth checking to make sure the chain functions locally as expected.

rag_chain = ...
for token in llm.stream()...
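For a quick local sanity check before serving it (a sketch, assuming rag_chain is your composed retrieval + generation runnable and yields string tokens), you could stream from the chain itself:

## Stream from the chain rather than the bare llm to confirm the full pipeline works
for token in rag_chain.stream("Tell me something interesting!"):
    print(token, end="")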

Hi @vkudlay
You're right; something was wrong with the client code when invoking the /generator route.

This time, I checked the sample code from the LangServe GitHub repo and changed my client code to the following:

from langchain_core.prompts import ChatPromptTemplate
from langserve import RemoteRunnable

# Invoke the generator route --> http://0.0.0.0:9012/generator
RemoteRunnable_generator = RemoteRunnable("http://0.0.0.0:9012/generator/")
RemoteRunnable_generator.invoke({"input": "Tell me something interesting"})

The corresponding server code uses the rag_chain taught in this course. The server can generate a response locally with pprint(rag_chain.invoke("Tell me something interesting!")) but fails to handle the client's request.


## Server-side RAG chain (assumes docstore, docs2str, output_puller, and llm_RAG
## are defined earlier in server_app.py, as in the course notebooks)
from operator import itemgetter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableAssign
from langchain_community.document_transformers import LongContextReorder

chat_prompt = ChatPromptTemplate.from_messages([("system",
    "You are a document chatbot. Help the user as they ask questions about documents."
    " User messaged just asked you a question: {input}\n\n"
    " The following information may be useful for your response: "
    " Document Retrieval:\n{context}\n\n"
    " (Answer only from retrieval. Only cite sources that are used. Make your response conversational)"
), ('user', '{input}')])

long_reorder = RunnableLambda(LongContextReorder().transform_documents)  ## GIVEN
context_getter = itemgetter('input') | docstore.as_retriever() | long_reorder | docs2str
retrieval_chain = {'input' : (lambda x: x)} | RunnableAssign({'context' : context_getter})

generator_chain = RunnableAssign({"output" : chat_prompt | llm_RAG })
generator_chain = generator_chain | output_puller
rag_chain = retrieval_chain | generator_chain

# Self Test
#pprint(rag_chain.invoke("Tell me something interesting!"))

add_routes(
    app,
    rag_chain,
    path="/generator",
)

Here is the server log. Does this mean the input sent from the client is in the wrong format?

INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9012 (Press CTRL+C to quit)
INFO:     127.0.0.1:56106 - "POST /basic_chat/stream HTTP/1.1" 200 OK
INFO:     127.0.0.1:56118 - "POST /retriever/invoke HTTP/1.1" 200 OK
INFO:     127.0.0.1:39558 - "POST /generator/invoke HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/_common.py", line 203, in _try_raise
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/091a03bb-7364-4087-8090-bd71e9277520

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 116, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 746, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 70, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langserve/server.py", line 442, in invoke
    return await api_handler.invoke(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langserve/api_handler.py", line 672, in invoke
    output = await self._runnable.ainvoke(input_, config=config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2087, in ainvoke
    input = await step.ainvoke(
            ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/passthrough.py", line 443, in ainvoke
    return await self._acall_with_config(self._ainvoke, input, config, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1295, in _acall_with_config
    output: Output = await asyncio.create_task(coro, context=context)  # type: ignore
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/passthrough.py", line 430, in _ainvoke
    **await self.mapper.ainvoke(
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2645, in ainvoke
    results = await asyncio.gather(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2087, in ainvoke
    input = await step.ainvoke(
            ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 136, in ainvoke
    return await self.aget_relevant_documents(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 281, in aget_relevant_documents
    raise e
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 274, in aget_relevant_documents
    result = await self._aget_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 674, in _aget_relevant_documents
    docs = await self.vectorstore.asimilarity_search(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/faiss.py", line 536, in asimilarity_search
    docs_and_scores = await self.asimilarity_search_with_score(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/faiss.py", line 423, in asimilarity_search_with_score
    embedding = await self._aembed_query(query)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/faiss.py", line 161, in _aembed_query
    return await self.embedding_function.aembed_query(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/embeddings.py", line 24, in aembed_query
    return await run_in_executor(None, self.embed_query, text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 493, in run_in_executor
    return await asyncio.get_running_loop().run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 56, in embed_query
    return self._embed([text], model_type=self.model_type or "query")[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 38, in _embed
    response = self.client.get_req(
               ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/_common.py", line 288, in get_req
    response, session = self._post(invoke_url, payload)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/_common.py", line 165, in _post
    self._try_raise(response)
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/_common.py", line 218, in _try_raise
    raise Exception(f"{title}\n{body}") from e
Exception: [422] Unprocessable Entity
body -> input -> str
  Input should be a valid string (type=string_type)
body -> input -> list[str] -> 0
  Input should be a valid string (type=string_type)

Oh interesting! A few observations:

Your log indicates that the issue is actually coming out of the embedding model used for document retrieval.

 File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 56, in embed_query
    return self._embed([text], model_type=self.model_type or "query")[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 38, in _embed
    response = self.client.get_req(
               ^^^^^^^^^^^^^^^^^^^^

Specifically, it's complaining that the input it received is neither a string nor a list of strings. If you investigate your server_app.py, you'll spot a potential issue. And if you check over 08_evaluation.ipynb, there is some boilerplate starter code that hints at how these two components should be specified and passed in:

rag_chain = retrieval_chain | generator_chain

You may notice that you're passing in the full RAG chain as the generator route, and not the generator chain :D
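In other words (a sketch, assuming retrieval_chain and generator_chain are built as in your snippet above), the routes would be bound separately, roughly like:

## Bind the retrieval and generation pieces to their own routes
## rather than attaching the full rag_chain to /generator
add_routes(app, llm, path="/basic_chat")
add_routes(app, retrieval_chain, path="/retriever")
add_routes(app, generator_chain, path="/generator")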

PS: The reason we ask for both components on the server may not be obvious. It's because there are some utilities that may need the retriever component in isolation (e.g. vectorstore recreation), so we figured it would be better to pass them in separately.

@vkudlay
Ha, I see. Thanks for your patience with me.
Now I can use the frontend to pass the assessment, as shown in the image below,

and I also passed the assessment on the DLI website. Thanks for your swift feedback, and I wish you a wonderful day.


Fantastic!! Great job, and hope you liked the course 😄

