I have been struggling to run inference with large models such as Llama-70b and Qwen-70b. What are the requirements for running inference with these two models using TensorRT-LLM and Triton?
Changed the sub-category to TensorRT.
Please refer to the link below.
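For a first hardware check, a rough sizing sketch may help. This is not taken from the linked page. It assumes a 70-billion-parameter model whose weights are split evenly across GPUs (for example, with TensorRT-LLM tensor parallelism). It also assumes that about 90% of each GPU's memory can hold weights; the helper names and the `usable_fraction` default are hypothetical. The sketch counts weight memory only. KV cache, activations, and runtime overhead need extra headroom, so treat the GPU counts as lower bounds.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for model weights alone, in decimal GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params: float, bytes_per_param: float,
             gpu_mem_gb: float, usable_fraction: float = 0.9) -> int:
    """Lower bound on GPU count if weights are sharded evenly across GPUs.

    Hypothetical helper: `usable_fraction` is an assumed share of each GPU's
    memory available for weights. KV cache and activations are NOT modeled.
    """
    needed_gb = weight_memory_gb(num_params, bytes_per_param)
    return math.ceil(needed_gb / (gpu_mem_gb * usable_fraction))


if __name__ == "__main__":
    params = 70e9  # assumed parameter count for a "70b" model
    # Bytes per parameter for common weight precisions.
    for label, bpp in [("FP16/BF16", 2), ("FP8/INT8", 1), ("INT4", 0.5)]:
        gb = weight_memory_gb(params, bpp)
        n = min_gpus(params, bpp, gpu_mem_gb=80)
        print(f"{label}: ~{gb:.0f} GB of weights, at least {n} x 80 GB GPU(s)")
```

In FP16/BF16, a 70B model needs about 140 GB for weights alone. That already exceeds a single 80 GB GPU, so multi-GPU parallelism or lower-precision quantization is required before KV cache is even considered.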