Best GPU for training, inference, and running LLMs


Dear Community,

I’m quite new to the world of pre-trained deep learning transformers, and I’m looking for the right GPU for my workstation to train and use LLMs for various purposes.

I’m no expert in hardware, and I basically need your help choosing the right GPU for my needs. I want to train and use open-source models like these:

  • llama.cpp / Vicuna 13B
  • Llama 2 70B
  • Zephyr 7B
  • Mistral 7B
  • Claude models
  • GPT models
  • Wizard

But I don’t know which criteria to apply to choose the right GPU with the best performance/price ratio.

Simple questions:

  • NVIDIA Titan, Quadro, Tesla, or GeForce?
  • 8 GB, 16 GB, 24 GB, or more? (I would even like to train 70B models.)
  • Which parameters do I need to take carefully into account?
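On the memory question, a rough starting point is to estimate VRAM from the parameter count and the numeric precision. A minimal sketch (the 1.2 overhead factor for KV cache and activations is an assumption, not a measured value):

```python
def vram_gb(params_billion, bytes_per_param, overhead=1.2):
    """Rough GPU memory estimate (GB): weights plus an assumed
    overhead factor for KV cache and activations."""
    return params_billion * bytes_per_param * overhead

# Bytes per parameter: fp16 = 2, int8 = 1, 4-bit quantization ~ 0.5
for name, b in [("7B", 7), ("13B", 13), ("70B", 70)]:
    print(name,
          f"fp16~{vram_gb(b, 2):.0f} GB",
          f"int8~{vram_gb(b, 1):.0f} GB",
          f"4-bit~{vram_gb(b, 0.5):.0f} GB")
```

By this estimate, a 7B model fits on a 24 GB card in fp16, a 13B model needs quantization or ~32 GB, and a 70B model in fp16 needs multiple data-center GPUs; training takes several times more memory than inference because of optimizer states and gradients.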

I’m still quite lost here, and I would appreciate an expert shedding some light on this.

Thanks so much in advance


TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (GitHub repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered


Please share the model, script, profiler, and performance output if you have not already, so that we can help you better.

Alternatively, you can try running your model with the trtexec command.
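A typical invocation looks like the following; the file names are placeholders for your own ONNX export and engine output:

```shell
# Build a TensorRT engine from an ONNX model and report latency/throughput.
# model.onnx and model.plan are placeholder file names.
trtexec --onnx=model.onnx --fp16 --saveEngine=model.plan
```

trtexec prints per-iteration latency and throughput statistics after building and running the engine.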

When measuring model performance, make sure you measure the latency and throughput of the network inference itself, excluding the data pre- and post-processing overhead.
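One way to keep pre/post-processing out of the measurement is to time only the inference call, after a few warm-up runs. A minimal sketch, where run_inference is a placeholder standing in for the real network call:

```python
import time

def run_inference(batch):
    # Placeholder for the actual network call (e.g. a TensorRT engine execution).
    return [x * 2 for x in batch]

def measure(batch, warmup=3, iters=20):
    # Warm-up runs so one-time setup cost doesn't skew the numbers.
    for _ in range(warmup):
        run_inference(batch)
    start = time.perf_counter()
    for _ in range(iters):
        run_inference(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000        # average time per inference
    throughput = len(batch) * iters / elapsed  # items processed per second
    return latency_ms, throughput

lat, tput = measure(list(range(32)))
print(f"latency: {lat:.3f} ms/iter, throughput: {tput:.0f} items/s")
```

Any preprocessing (tokenization, resizing) and postprocessing (decoding) would happen outside the timed loop.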
Please refer to the below links for more details: