Is it possible to deploy the Llama-70b model with TensorRT LLM on an L40S GPU?

laura.fernandez · May 9, 2024, 4:02pm

I have been struggling to perform inference with large models like Llama-70b or Qwen-70b. What are the necessary requirements to perform inference with these two models using TensorRT LLM and Triton?

nadeemm · May 9, 2024, 5:57pm

Changed the sub-category to TensorRT

AakankshaS · May 30, 2024, 6:56pm

Please refer to the link below.