The suggested processor platform Intel Xeon w5-3425 looks reasonable (112 PCIe5 lanes, 8-channel DDR5), and selecting a part from the Xeon-W line is a good idea if one wants to stick with Intel. CPU base frequency is reasonably high, providing good single-thread performance. A rule of thumb for a well-balanced system is to provide four physical CPU cores per GPU, but this obviously also depends on the application; I cannot speak for CFD in particular.
A common mistake in configuring GPU-accelerated systems is to provide too little system memory. The rule of thumb there is to size system memory at 2x to 4x the total GPU memory. You would want to populate all 8 memory channels with the highest speed grade the system supports.
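To make those two rules of thumb (cores per GPU, system memory vs. GPU memory) concrete, here is a small sketch; the GPU count and per-card memory below are placeholder values, not a recommendation:

```cpp
#include <cstdio>

int main() {
    // Hypothetical configuration: four cards with 48 GB each (RTX A6000 class).
    const int gpu_count  = 4;
    const int gpu_mem_gb = 48;

    // Rule of thumb: roughly four physical CPU cores per GPU.
    const int min_cores = 4 * gpu_count;                        // 16 cores

    // Rule of thumb: system memory at 2x to 4x of total GPU memory.
    const int total_gpu_mem_gb = gpu_count * gpu_mem_gb;        // 192 GB of VRAM
    printf("suggested CPU cores  : >= %d\n", min_cores);
    printf("suggested system RAM : %d GB to %d GB\n",
           2 * total_gpu_mem_gb, 4 * total_gpu_mem_gb);         // 384 to 768 GB
    return 0;
}
```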
As has been pointed out as a caveat, the FP64 throughput of the RTX A6000 is 1/64 of the FP32 throughput (about 0.6 FP64 TFLOPS), so you would want to make doubly sure that this will not be a bottleneck in your CFD computations. Techniques like using pairs of floats for quasi-double-precision computation are possible but may not be practical, as I am not aware of a well-supported, fully-featured math library that one could use with this approach. You might also want to look into keeping FP64 computation on the CPU if feasible.
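Just to make that technique concrete, these are the standard error-free building blocks (Knuth's TwoSum and an FMA-based TwoProd) that float-pair ("double-float") arithmetic is typically assembled from; this is a generic sketch, not code from any particular library:

```cpp
#include <cmath>   // fmaf

// Knuth's TwoSum: computes s and err such that s + err == a + b exactly
// (assumes IEEE-754 round-to-nearest and no excess precision).
inline void two_sum(float a, float b, float &s, float &err) {
    s = a + b;
    float bb = s - a;
    err = (a - (s - bb)) + (b - bb);
}

// FMA-based TwoProd: computes p and err such that p + err == a * b exactly
// (barring overflow/underflow).
inline void two_prod(float a, float b, float &p, float &err) {
    p = a * b;
    err = fmaf(a, b, -p);   // exact residual of the rounded product
}
```

A full quasi-double add or multiply chains these primitives and renormalizes the result into a (high, low) pair of floats; that gives roughly twice the 24-bit float significand, which still falls short of FP64's 53 bits, and is one more reason the approach is hard to use without solid library support.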
The 5860 typically comes with the W3-3425 (64 PCIe lanes instead of 112); perhaps you mistyped due to the similarity of the CPU names? The lower PCIe lane count of the 5860 vs. the 7960 fits with its smaller number of PCIe slots and smaller power supply.
Both the Dell Precision 7960 and the Dell Precision 5860 have two additional x8 PCIe slots, on top of four x16 slots and two x16 slots, respectively. This gives you the option to upgrade now or in the future.
Due to the positions of the slots:
Option 1 can be upgraded to 6 GPUs by adding four more single-slot GPUs, or to 5 GPUs by adding one more double-slot and two more single-slot GPUs.
Option 2 can be upgraded to 3 GPUs by adding one more single-slot GPU (the single-slot card would go into a x16 slot, the dual-slot cards into a x8 and a x16).
Option 3 can be upgraded to 6 GPUs by adding up to two more single- or double-slot GPUs.
Calculate the power needs of your GPUs and the processor: add 250 W for mainboard, RAM, and peripherals, add the CPU TDP, add the GPUs' TDP, and then add 50% on top of that for a stable system.
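As a sketch of that calculation (the TDP numbers below are placeholders, not a recommendation):

```cpp
#include <cstdio>

int main() {
    // Hypothetical example: one CPU and four GPUs.
    const double base_w    = 250.0;   // mainboard, RAM, and peripherals
    const double cpu_tdp_w = 270.0;   // placeholder CPU TDP
    const double gpu_tdp_w = 300.0;   // placeholder per-GPU TDP
    const int    gpu_count = 4;

    const double load_w = base_w + cpu_tdp_w + gpu_count * gpu_tdp_w;  // 1720 W
    const double psu_w  = load_w * 1.5;   // 50% headroom for a stable system

    printf("estimated load: %.0f W, suggested PSU: %.0f W\n", load_w, psu_w);
    return 0;
}
```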
Hello, nujuffa. Thank you for your advice on memory. Currently, the system has only 128 GB of memory, but after considering your suggestion, I think it's necessary to add more. Also, I appreciate your reminder about FP64. For scientific computations where FP64 precision is essential, I believe prioritizing parallel computation on the CPU is the optimal approach.
That one has only four memory channels per Intel's ARK database and only 8 cores. A tad lower in the base frequency, too. Seems a bit underpowered for feeding four high-end GPUs.
The basic problem is that Intel is struggling with their Xeon lineup in recent years, and AMD provides more generous core counts, PCIe lanes, and base frequencies in their EPYC lineup. This is not a recommendation; I have been buying Xeon / NVIDIA Quadro (now RTX) combos exclusively for more than twenty years but figured I should point out the alternative.
No idea what mass storage needs typical CFD applications have. The AI folks are often well served by configuring professional-grade 2 TB NVMe PCIe 4 SSDs.
As for configuring large system memory, this is usually where system integrators make you pay through the nose, adding large margins. In the past, I have bought machines with minimal memory configuration from Dell and upgraded with high-quality DRAM from Crucial (a Micron business), which was more affordable. But I have not done so recently, so I don't know whether that approach is still advisable.
Nice information. I am at the other end of the spectrum, getting what I can for minimal cost. I got a Dell Precision tower 7810 from government surplus a year ago: two Xeon E5-2690 v4 @ 2.60 GHz for a total of 24 cores, 128 GB RAM, and an ASUS ProArt NVIDIA GeForce RTX 4060 Ti with 16 GB of VRAM. Three Samsung PRO SSDs. Windows 10 22H2. I know NVMe drives would be better, but I'd have to buy a card for that. The 7810 is about 5+ years old, but I paid only about $1250 for all of the above. I am running NVIDIA ChatRTX with the Llama 13B model. (A middle-of-the-road model; the JSON must first be converted to plain text.) Response times are around 2 seconds or less. Very pleased with the overall performance. My test dataset is 3080 lines of JSON code. Note that for LLMs, the critical hardware factors are the number of CPU cores and CUDA cores, plus generous amounts of RAM and VRAM. So to get started in this arena you do not have to spend mega bucks, but you do need to be careful with your hardware selection.
IMHO it is all about building balanced systems. If one needs to be price conscious, a good strategy is to choose hardware components from the middle of the performance spectrum. Your system seems reasonably well-balanced, with dual CPU and system memory size a bit on the generous side.
I am a bit confused about "Three Samsung PRO SSDs" vs. "NVMEs would be better". The only Samsung PRO SSDs I am aware of are NVMe-based, e.g. the Samsung 980 PRO NVMe SSD. What is the full model name of your SSDs?
@rs277 Thanks for the pointer, I guess I never took notice of these before.
The read and write speeds of SATA SSDs are limited to < 600 MB/sec, while NVMe SSDs now reach 10x that throughput. Given that prices for high-capacity NVMe SSDs are no longer exorbitant, the choice for mass storage in HPC systems is clear.
For a given speed grade of DRAM, the memory bandwidth of the system memory is basically a linear function of the number of memory channels. Generally, more is better, but whether system memory bandwidth is important for application performance depends entirely on the usage pattern of the application.
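As a rough illustration of that linear relationship (peak theoretical numbers; DDR5-4800 is an assumed speed grade, not a measurement):

```cpp
#include <cstdio>

int main() {
    // Peak bandwidth = channels * transfer rate * 8 bytes per 64-bit transfer.
    const double mt_per_s = 4800.0;        // assumed DDR5-4800 speed grade
    const double bytes_per_transfer = 8.0;
    const int channel_counts[] = {4, 8};   // e.g. four-channel vs. eight-channel platform

    for (int channels : channel_counts) {
        double gb_per_s = channels * mt_per_s * bytes_per_transfer / 1000.0;
        printf("%d channels: %.1f GB/s peak\n", channels, gb_per_s);
    }
    return 0;
}
```

(With the numbers above, that comes out to roughly 153.6 GB/s for four channels and 307.2 GB/s for eight.)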
If your application is similar to one of the components of the SPEC CPU benchmarks, you could get a pretty good idea how various system parameters influence performance by trawling SPEC's database of published results for that particular benchmark component. Example: if you wanted to get an idea of what makes for a fast compilation platform, look at the results for the gcc component of SPEC CPU, then compare the machine configurations used to deliver those results.
Well, good question. Before NVMe drives became mainstream, the high-end Samsung SSDs were the "Pro" series; think 4 to 6 years ago. Actually the same name, "Samsung 980 Pro". Also, other things being equal, it is safer to match up components of the same generation for the sake of high compatibility and reliability. My research regarding local LLMs showed that the most critical factors for good performance were the number of cores, both CPU and GPU; after that the most important need was for VRAM, then after that regular RAM. Now that NVMe drives are mainstream and cost has come down, I will eventually move over to those. But I am already getting satisfactory performance, so I am in no hurry. With a large amount of RAM, reading from the drives is not that frequent, but even so I have the load spread out between 3 drives. I certainly agree about building a balanced system, keeping in mind what causes bottlenecks with your target application. Note this is an "at home" project and "She Who Must Be Obeyed" would not tolerate my spending $5000 to $7000 on a desktop computer. Thank god for government surplus, but yes, you must choose wisely.