That’s a pretty good indication that either
(1) your application is not actually using GPU acceleration, or
(2) your application uses GPU acceleration but is completely bottlenecked by host-system performance.
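A quick way to distinguish case (1) is to watch GPU utilization while the application runs, e.g. with `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1`. Here is a minimal sketch that interprets such samples; the helper name and the 10% idle threshold are my own illustrative choices, not an NVIDIA recommendation:

```python
# Hypothetical helper: decide whether a GPU looks idle based on
# utilization-percentage samples collected from, e.g.,
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1
# The 10% threshold is an assumption for illustration.

def gpu_looks_idle(samples: str, threshold_pct: float = 10.0) -> bool:
    """Return True if the average sampled GPU utilization is below threshold."""
    values = [int(line.strip()) for line in samples.splitlines() if line.strip()]
    if not values:
        return True  # no samples at all: nothing was running on the GPU
    return sum(values) / len(values) < threshold_pct

# Samples captured while the application was supposedly "using the GPU":
samples = "3\n0\n5\n2\n1"
print(gpu_looks_idle(samples))  # True: persistently low utilization
```

If utilization stays near zero throughout a run, you are in case (1); if it spikes briefly and then idles while CPU cores are pegged, case (2) is the more likely culprit.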
If you look at the list of NVIDIA partners that I pointed to, you can filter by those who are specializing in deep-learning systems. You could also check out forums dedicated to deep learning frameworks to see what kind of systems people there recommend from first-hand experience.
My general host-system recommendation for applications that are well optimized for the GPU is to go relatively easy on CPU core count but aggressive on CPU single-thread performance (basically: high base frequency), because you want to avoid getting bottlenecked on the non-parallel portions of your workload (Amdahl’s law). Four CPU cores per GPU are usually sufficient. System memory size should ideally be 4x the total GPU memory, and make it as fast as you can, e.g. quad-channel DDR4 (for example, NVIDIA’s DGX-1 has 128 GB of GPU memory and 512 GB of system memory). NVMe SSDs often make sense but can be expensive. For the PSU(s), look for units with 80 PLUS Titanium, or at least 80 PLUS Platinum, certification.
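The rules of thumb above (4 CPU cores per GPU, system RAM at 4x total GPU memory) can be sketched as a small sizing helper; the function name and return structure are my own illustration, not a standard tool:

```python
# Hypothetical sizing helper based on the rules of thumb above:
# ~4 CPU cores per GPU, and system RAM ~4x total GPU memory.

def host_sizing(num_gpus: int, gpu_mem_gb: int) -> dict:
    """Suggest host CPU core count and system RAM for a GPU-accelerated box."""
    return {
        "cpu_cores": 4 * num_gpus,                   # 4 cores per GPU
        "system_ram_gb": 4 * num_gpus * gpu_mem_gb,  # 4x total GPU memory
    }

# Sanity check against the DGX-1 example: 8 GPUs x 16 GB = 128 GB GPU memory
print(host_sizing(8, 16))  # {'cpu_cores': 32, 'system_ram_gb': 512}
```

Plugging in the DGX-1’s configuration recovers its 512 GB of system memory, which is a useful sanity check that the 4x rule matches what NVIDIA itself ships.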