Hello Nvidia team and community,
I have a new system with an RTX 3070 and am going to install Ubuntu (version suggestions are welcome). Before installing any drivers, I would like to know whether the RTX 3070 supports CUDA 10.0, and which reference I should follow to install CUDA 10.0 and the NVIDIA display drivers for this GPU.
Notebook or desktop system?
CPU: Intel Core i7 (10th gen)
As a rule of thumb, you should use the CUDA version that was current when the new GPU architecture was released, in your case CUDA 11. So far, no compatibility issues between Ampere GPUs and CUDA 10 are known, so you should be safe.
For CUDA 10, packages are provided for Ubuntu 18.04; on Ubuntu 20.04, you could use the runfile installer.
In both cases, it is important not to install the bundled driver but to use the one that’s provided by the Ubuntu repos. For a package-based install, you’re required to run
sudo apt install cuda-toolkit-10-0
instead of
apt install cuda
because the cuda metapackage also pulls in the bundled driver. With the runfile, just skip the driver install when the installer asks for it.
Hello Sir, Thanks for your response.
I have installed CUDA 10.0, the latest NVIDIA display drivers, cuDNN, and tf1.15.0.
We started running our training script. The nvidia-smi output shows around 90% GPU memory usage but 0% GPU volatile ECC util.
Does that mean my training has not yet started on the GPU? It is a very slow process. The same script runs very fast, with 100% volatile util, on a GTX 1080 Ti in another machine, so I don’t think there is an issue in the script.
Thanks in advance for your help!
There’s no such thing as “volatile ECC util.”; I suppose you confused it with volatile ECC errors. GeForce cards don’t have ECC memory, so that value should always be N/A.
Hello Sir, that was my mistake. It is Volatile GPU Util that is showing 0%.
Hello, thanks for your reply. I am able to use the RTX 3070 with a conda env, tf1.15, and CUDA 10.0.
But my concern now is that the same model script runs faster on the GTX 1080 Ti than on the RTX 3070.
Performance after 1 hr of training on both machines: GTX 1080 Ti = 25k steps, RTX 3070 = 12k steps.
I am using the same batch size (16) on both machines, and they have the same amount of RAM. The only difference is the CPU, but I am sure the machine with the RTX 3070 has the more powerful CPU of the two.
Why is the RTX 3070 performing slower than the GTX 1080 Ti?
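For reference, the two step counts quoted above can be turned into throughput numbers with a few lines of Python. This is only arithmetic on the figures from the post (25k and 12k steps in one hour); the helper name steps_per_second is made up for illustration.

```python
# Convert the step counts quoted above into steps per second and a ratio.
# steps_per_second is a hypothetical helper, not part of any library.

def steps_per_second(steps, hours=1.0):
    """Average training throughput over a run of the given length."""
    return steps / (hours * 3600.0)

gtx_1080ti = steps_per_second(25_000)  # figure quoted for the GTX 1080 Ti
rtx_3070 = steps_per_second(12_000)    # figure quoted for the RTX 3070

print(f"GTX 1080 Ti: {gtx_1080ti:.2f} steps/s")
print(f"RTX 3070:    {rtx_3070:.2f} steps/s")
print(f"GTX 1080 Ti / RTX 3070: {gtx_1080ti / rtx_3070:.2f}x")
```

So the gap is roughly 6.9 versus 3.3 steps/s, i.e. about a factor of two on this particular run.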
Are you running the machines headless, i.e. without an Xserver being started on the nvidia gpu? If that’s the case, please check if the persistence daemon (nvidia-persistenced) is started.
Hey, here are the logs of sudo systemctl status nvidia-persistenced:
admin-u@atig0:~$ sudo systemctl status nvidia-persistenced
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; static; vendor preset: enabled)
Active: active (running) since Thu 2021-02-25 08:58:47 IST; 6h ago
Main PID: 1005 (nvidia-persiste)
Tasks: 1 (limit: 4915)
└─1005 /usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose
Feb 25 08:58:47 atig0 systemd: Starting NVIDIA Persistence Daemon…
Feb 25 08:58:47 atig0 nvidia-persistenced: Verbose syslog connection opened
Feb 25 08:58:47 atig0 nvidia-persistenced: Now running with user ID 122 and group ID 127
Feb 25 08:58:47 atig0 nvidia-persistenced: Started (1005)
Feb 25 08:58:47 atig0 nvidia-persistenced: device 0000:01:00.0 - registered
Feb 25 08:58:47 atig0 nvidia-persistenced: Local RPC services initialized
Feb 25 08:58:47 atig0 systemd: Started NVIDIA Persistence Daemon.
And here is the output of sudo gedit /lib/systemd/system/nvidia-persistenced.service:
Description=NVIDIA Persistence Daemon
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
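One detail worth noting in the unit file above is the --no-persistence-mode flag on the ExecStart line. According to the nvidia-persistenced documentation, that option forces persistence mode off for all devices when the daemon starts, so "daemon running" and "persistence mode enabled" are not the same thing. Here is a small sketch that pulls the flags out of the posted ExecStart line; daemon_flags is a hypothetical helper name, and whether this flag has anything to do with the slowdown is not established here.

```python
import shlex

# ExecStart line copied from the unit file posted above.
EXEC_START = (
    "ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced "
    "--no-persistence-mode --verbose"
)

def daemon_flags(exec_start_line):
    """Return the '--' options found on a systemd ExecStart= line."""
    # Everything after the first '=' is the command line itself.
    _, _, command = exec_start_line.partition("=")
    return [token for token in shlex.split(command) if token.startswith("--")]

flags = daemon_flags(EXEC_START)
print(flags)
print("starts with persistence mode off:", "--no-persistence-mode" in flags)
```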
What’s the gpu utilization during training?
The reason might be the smaller memory size of the 3070. Furthermore, I guess you should switch to fp16 to make use of the Tensor Cores.
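To answer the utilization question with numbers rather than a glance at the nvidia-smi table, something like the following could be run while training is active. It is a minimal sketch: the --query-gpu fields and --format options are standard nvidia-smi ones, but parse_gpu_csv and sample_gpus are hypothetical helper names, and the sample line is made up to match the situation described earlier (0% utilization, about 90% memory in use).

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"

def parse_gpu_csv(text):
    """Parse 'util, used, total' lines (csv,noheader,nounits) into dicts."""
    rows = []
    for line in text.strip().splitlines():
        util, used, total = (field.strip() for field in line.split(","))
        rows.append({
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return rows

def sample_gpus():
    """Run nvidia-smi once and return one dict per GPU (needs nvidia-smi on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)

# Illustrative input only, not real measurements from this thread.
sample = "0, 7373, 8192\n"
for index, gpu in enumerate(parse_gpu_csv(sample)):
    mem_pct = 100.0 * gpu["mem_used_mib"] / gpu["mem_total_mib"]
    print(f"GPU {index}: util {gpu['util_pct']}%, memory {mem_pct:.0f}% used")
```

On the training machine, calling sample_gpus() in a loop with a short sleep would show whether utilization stays near zero or only dips between steps. Note that TensorFlow reserves most GPU memory up front by default, so high memory usage alone does not show that the GPU is doing work.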
Can you elaborate more on what is fp16?
Since you’re working with TensorFlow, you should know that, and about data types and precision in general. fp16 = 16-bit floating point. Though Google just told me that Ampere introduced a new TF32 data type for use on Tensor Cores.
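To make the precision point concrete, here is a small sketch using only Python's struct module, which can round-trip a value through IEEE 754 half precision ("e") and single precision ("f"). The helper names to_fp16 and to_fp32 are made up for illustration; this shows what the formats store, not how TensorFlow or the Tensor Cores handle them.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest single-precision (fp32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# fp16 keeps 10 mantissa bits, fp32 keeps 23, so 0.1 is stored less exactly.
print("0.1 as fp16:", to_fp16(0.1))
print("0.1 as fp32:", to_fp32(0.1))

# The largest finite fp16 value is 65504; larger inputs cannot be packed.
print("65504 as fp16:", to_fp16(65504.0))
try:
    to_fp16(70000.0)
except OverflowError as exc:
    print("70000 as fp16 fails:", exc)
```

fp16 trades range and precision for speed and memory, which is why fp16 training usually goes together with loss scaling. TF32, as far as I understand it, keeps the 8-bit exponent of fp32 and the 10-bit mantissa of fp16, so it has fp32's range with roughly fp16's precision.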