Part 5 Assessment in DLI Course ‘Building RAG Agents for LLMs’

Hi,
I have gone through all the steps in the course but am stuck on Part 5 of 08_evaluation.ipynb, the final step of the course. I'm not sure what exactly I should do; could someone help me with this?

First, I’m pretty sure the following conditions are met:

Second, I went back to 01_microservices.ipynb to launch the Gradio interface on port 8090. However, when I click the "Evaluate" button I get an error saying Gradio Stream failed: [Errno 111] Connection refused, as shown in the image below.

Third, I went back to the DLI website and clicked "Assess Task", but I still get an error saying it does not look like you completed the assessment yet...

The course mentions that we could use 35_langserve.ipynb as an example, but I don't really get how to make it work. Should I launch the LangChain server on a port other than 9012?

Hey @lzlallen1980

Looking at the frontend, you'll see that the RemoteRunnables are constructed to point at port 9012, like so:

## Necessary Endpoints
chains_dict = {
    'basic' : RemoteRunnable("http://lab:9012/basic_chat/"),
    'retriever' : RemoteRunnable("http://lab:9012/retriever/"),
    'generator' : RemoteRunnable("http://lab:9012/generator/"),
}

Are you adding those three routes in the same fashion as the example:

add_routes(
    app,
    llm,
    path="/basic_chat",
)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=9012)

If not, then that might be the problem. If yes, then can you tell me a bit more about how you’re approaching it?

Hi @vkudlay
Here is my current approach, which is still not working…

%%writefile server_app.py
# https://python.langchain.com/docs/langserve#server
from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langserve import add_routes

llm = ChatNVIDIA(model="mixtral_8x7b")

app = FastAPI(
  title="LangChain Server",
  version="1.0",
  description="A simple api server using Langchain's Runnable interfaces",
)

add_routes(
    app,
    llm,
    path="/basic_chat",
)

## BEGIN TODO
app2 = FastAPI(
  title="retriever",
  version="1.0",
  description="retriever",
)

add_routes(
    app2,
    llm,
    path="/retriever",
)

app3 = FastAPI(
  title="generator",
  version="1.0",
  description="generator",
)

add_routes(
    app3,
    llm,
    path="/generator",
)


## END TODO

## Might be encountered if this were for a standalone python file...
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="http://lab", port=9012)
    uvicorn.run(app2, host="http://lab", port=9012)
    uvicorn.run(app3, host="http://lab", port=9012)

Hey @lzlallen1980

Oh! That's a reasonable guess as to how that works (and as far as the example is concerned, it makes sense). That's not exactly how add_routes integrates with FastAPI components, though; the expectation is that you construct and launch a single app with multiple endpoints associated with the service. I'm assuming your code launches uvicorn.run(app, host="http://lab", port=9012) and then blocks until you shut it down, so it never gets to the second and third app deployments.

The main README of the LangServe github repo shows a pretty good example of how you can bind multiple routes to a single application: langchain-ai/langserve: LangServe 🦜️🏓 (github.com)
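As a rough sketch of that pattern (using llm as a stand-in runnable for every path just to show the mechanics; you'd bind your actual retriever/generator chains instead), the server might look like:

## One FastAPI app, several add_routes calls (llm is a stand-in for your real chains)
from fastapi import FastAPI
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langserve import add_routes

llm = ChatNVIDIA(model="mixtral_8x7b")

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple api server using Langchain's Runnable interfaces",
)

add_routes(app, llm, path="/basic_chat")
add_routes(app, llm, path="/retriever")   ## stand-in; bind your retriever chain here
add_routes(app, llm, path="/generator")   ## stand-in; bind your generator chain here

if __name__ == "__main__":
    import uvicorn
    ## host is a bind address such as "0.0.0.0", not a URL
    uvicorn.run(app, host="0.0.0.0", port=9012)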

Hope this helps! :D

Hi @vkudlay
Thanks for your helpful recommendation. After going through the LangServe GitHub repo, I managed to add multiple routes to a single application on http://0.0.0.0:9012 and confirmed they work by testing them with simple client code; see the following screenshot.

However, with the application hosted on http://0.0.0.0:9012, I still can't get through the Evaluate process on the Gradio frontend. Here is the error:

I guessed there was something wrong with the hosting address, so I also tried different addresses, such as
uvicorn.run(app, host="http://dli-be911a886c75-84f58c.aws.labs.courses.nvidia.com", port=9012) and uvicorn.run(app, host="http://lab", port=9012), but I get the following error while launching the server:

[Errno -2] Name or service not known

Do you see anything that I could adjust further?

Oh interesting. Have you tried querying the logs endpoint from notebook 1? In reference to this utility endpoint in the docker_router spec:

@app.get("/containers/{container_name}/logs")
async def get_container_logs(container_name: str):
    """Route that allows you to query the log file of the container"""
    try:
        container = client.containers.get(container_name)
        logs = container.logs()
        return {"logs": logs.decode('utf-8')}
    except NotFound:
        return {"error": f"Container `{container_name}` not found"}

By the way, your rag chain example is actually streaming from the llm defined earlier rather than the chain itself. Worth checking to make sure the chain functions locally as expected.

rag_chain = ...
for token in llm.stream()...
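For a quick local sanity check before serving it (a sketch, assuming rag_chain is your composed retrieval + generation runnable and yields string tokens), you could stream from the chain itself:

## Stream from the chain rather than the bare llm to confirm the full pipeline works
for token in rag_chain.stream("Tell me something interesting!"):
    print(token, end="")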

Hi @vkudlay
You're right; something was wrong with the client code when invoking the /generator route.

This time, I checked the sample code from the LangServe GitHub repo and changed my client code to the following:

from langchain_core.prompts import ChatPromptTemplate
from langserve import RemoteRunnable

# Invoke the generator route --> http://0.0.0.0:9012/generator
RemoteRunnable_generator = RemoteRunnable("http://0.0.0.0:9012/generator/")
RemoteRunnable_generator.invoke({"input": "Tell me something interesting"})

The corresponding server code uses the rag_chain taught in this course. The server can generate a response locally with pprint(rag_chain.invoke("Tell me something interesting!")) but fails to handle the client's request.


## Server-side RAG chain (assumes docstore, docs2str, output_puller, and llm_RAG
## are defined earlier in server_app.py, as in the course notebooks)
from operator import itemgetter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableAssign
from langchain_community.document_transformers import LongContextReorder

chat_prompt = ChatPromptTemplate.from_messages([("system",
    "You are a document chatbot. Help the user as they ask questions about documents."
    " User messaged just asked you a question: {input}\n\n"
    " The following information may be useful for your response: "
    " Document Retrieval:\n{context}\n\n"
    " (Answer only from retrieval. Only cite sources that are used. Make your response conversational)"
), ('user', '{input}')])

long_reorder = RunnableLambda(LongContextReorder().transform_documents)  ## GIVEN
context_getter = itemgetter('input') | docstore.as_retriever() | long_reorder | docs2str
retrieval_chain = {'input' : (lambda x: x)} | RunnableAssign({'context' : context_getter})

generator_chain = RunnableAssign({"output" : chat_prompt | llm_RAG })
generator_chain = generator_chain | output_puller
rag_chain = retrieval_chain | generator_chain

# Self Test
#pprint(rag_chain.invoke("Tell me something interesting!"))

add_routes(
    app,
    rag_chain,
    path="/generator",
)

Here is the server log. Does this mean the input sent from the client is in the wrong format?

INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9012 (Press CTRL+C to quit)
INFO:     127.0.0.1:56106 - "POST /basic_chat/stream HTTP/1.1" 200 OK
INFO:     127.0.0.1:56118 - "POST /retriever/invoke HTTP/1.1" 200 OK
INFO:     127.0.0.1:39558 - "POST /generator/invoke HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/_common.py", line 203, in _try_raise
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/091a03bb-7364-4087-8090-bd71e9277520

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 116, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 746, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 70, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langserve/server.py", line 442, in invoke
    return await api_handler.invoke(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langserve/api_handler.py", line 672, in invoke
    output = await self._runnable.ainvoke(input_, config=config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2087, in ainvoke
    input = await step.ainvoke(
            ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/passthrough.py", line 443, in ainvoke
    return await self._acall_with_config(self._ainvoke, input, config, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1295, in _acall_with_config
    output: Output = await asyncio.create_task(coro, context=context)  # type: ignore
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/passthrough.py", line 430, in _ainvoke
    **await self.mapper.ainvoke(
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2645, in ainvoke
    results = await asyncio.gather(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2087, in ainvoke
    input = await step.ainvoke(
            ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 136, in ainvoke
    return await self.aget_relevant_documents(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 281, in aget_relevant_documents
    raise e
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 274, in aget_relevant_documents
    result = await self._aget_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 674, in _aget_relevant_documents
    docs = await self.vectorstore.asimilarity_search(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/faiss.py", line 536, in asimilarity_search
    docs_and_scores = await self.asimilarity_search_with_score(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/faiss.py", line 423, in asimilarity_search_with_score
    embedding = await self._aembed_query(query)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/faiss.py", line 161, in _aembed_query
    return await self.embedding_function.aembed_query(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/embeddings.py", line 24, in aembed_query
    return await run_in_executor(None, self.embed_query, text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 493, in run_in_executor
    return await asyncio.get_running_loop().run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 56, in embed_query
    return self._embed([text], model_type=self.model_type or "query")[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 38, in _embed
    response = self.client.get_req(
               ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/_common.py", line 288, in get_req
    response, session = self._post(invoke_url, payload)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/_common.py", line 165, in _post
    self._try_raise(response)
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/_common.py", line 218, in _try_raise
    raise Exception(f"{title}\n{body}") from e
Exception: [422] Unprocessable Entity
body -> input -> str
  Input should be a valid string (type=string_type)
body -> input -> list[str] -> 0
  Input should be a valid string (type=string_type)

Oh interesting! A few observations:

Your log indicates that the issue is actually coming out of the embedding model used for document retrieval.

 File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 56, in embed_query
    return self._embed([text], model_type=self.model_type or "query")[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 38, in _embed
    response = self.client.get_req(
               ^^^^^^^^^^^^^^^^^^^^

Specifically, it's complaining that the input it received is neither a string nor a list of strings. If you investigate your server_app.py, you'll spot a potential issue. And if you check over 08_evaluation.ipynb, there is some boilerplate starter code that hints at how these two components should be specified and passed in:

rag_chain = retrieval_chain | generator_chain

You may notice that you're passing in the full RAG chain as the generator route, and not the generator chain :D
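In other words (a sketch, assuming retrieval_chain and generator_chain are built as in your snippet above), the routes would be bound separately, roughly like:

## Bind the retrieval and generation pieces to their own routes
## rather than attaching the full rag_chain to /generator
add_routes(app, llm, path="/basic_chat")
add_routes(app, retrieval_chain, path="/retriever")
add_routes(app, generator_chain, path="/generator")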

PS: The reason we ask for both components on the server may not be obvious. It's because there are some utilities that may need the retriever component in isolation (e.g. vectorstore recreation), so we figured it would be better to pass them in separately.

@vkudlay
Ha, I see. Thanks for your patience with me.
Now I can use the frontend to pass the assessment, as shown in the image below,

and I also passed the assessment on the DLI website. Thanks for your swift feedback, and I wish you a wonderful day.


Fantastic!! Great job, and hope you liked the course 😄

