Hello experts,
Does anyone know what this error message means? “Error processing query: LLM Call Exception: [500] Internal Server Error
Error during inference of request chat-9a5d67e40872459da1c6533785e6fbf0 – Encountered an error in forwardAsync function: [TensorRT-LLM][ERROR] CUDA runtime error in cudaMemcpy2DAsync( dstPtr, copyPitch, srcPtr, copyPitch, copyWidth, copyHeight, cudaMemcpyHostToDevice, cudaStream.get()): unknown error (/home/jenkins/agent/workspace/LLM/release-0.11/L0_MergeRequest/llm/cpp/tensorrt_llm/batch_manager/transformerBuffers.cpp:255)
1 0x7ff0dbe21c2e void tensorrt_llm::common::check”