They seem to work fine, but they only ever use one CPU core instead of the 4 available.
If I run something like the following, for example, the process caps at 100% CPU usage (i.e. a single core).
import torch
a = torch.rand(100, 1000, 1000)
b = torch.rand(100, 1000, 1000)
while True:
    c = torch.bmm(a, b)
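For reference, you can query and set the number of threads PyTorch uses for intra-op parallelism; note that on a build without a multithreaded BLAS backend this setting alone won't help:

```python
import torch

# Number of threads PyTorch uses for intra-op parallelism (e.g. bmm)
print(torch.get_num_threads())

# Explicitly request 4 threads; a single-threaded BLAS build will
# still run matrix multiplies on one core regardless of this setting
torch.set_num_threads(4)
```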
The same goes for a numpy computation that would otherwise spread across all cores.
TensorFlow, however, uses all available resources.
Any idea why?
Do I have to install some special library like OpenBLAS or MKL for PyTorch and numpy to use all available resources? Or is this a problem with the wheel that was distributed?
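One way to check which BLAS backend (if any) a numpy wheel was linked against:

```python
import numpy as np

# Prints the BLAS/LAPACK libraries numpy was built with; if no
# optimized backend shows up here, computations stay single-threaded
np.show_config()
```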
But torch and numpy call C extensions which are highly parallelized and use multiple cores. I'm able to get 1400% CPU usage with the same code snippet on a 32-core machine (x86_64, PyTorch installed with standard pip).
So the problem is with the build, not with Python.
I can confirm your snippet only uses one core on my Xavier with Nvidia's wheel (same as for the Nano), and the same snippet uses all my cores on x86. I don't know enough about the torch package specifically to say what's wrong, or whether this is normal behavior where a GPU is present (my x86 machine has no CUDA or OpenCL set up).
It seems possible/likely that it is related to the BLAS backend or lack thereof (see this recent post).
If you could re-build PyTorch after you have OpenBLAS or the desired multithreaded backend, and confirm if it fixes your issue, that would be helpful for when I go to build the wheels for the PyTorch v1.4.0 release. What is TBD would be if this requires all users of the wheel to install OpenBLAS too.
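For reference, a rebuild along these lines is an untested sketch (the `BLAS` environment variable is how the PyTorch build selects a backend; adjust as needed):

```shell
# Install OpenBLAS headers/libs first (Ubuntu package shown as one option)
sudo apt-get install libopenblas-dev

# Tell the PyTorch build which BLAS to link against, then build the wheel
export BLAS=OpenBLAS
python setup.py bdist_wheel
```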
That’s what I thought, I just wanted to be sure before doing it.
I'll try and get into it tomorrow. I've never installed OpenBLAS on an arm64 CPU before; anything I should be aware of to successfully build it?
Could you also give the commands you used to build the PyTorch wheel? That would be really helpful.
So I installed OpenBLAS using the following commands.
# First install gfortran
sudo apt install gfortran
# Download and compile OpenBLAS
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
make FC=gfortran
sudo make install
Then, running this test snippet:

import torch
A = torch.randn(2, 3, 1, 4, 4)
B = torch.randn(2, 3, 1, 4, 6)
X, LU = torch.solve(B, A)
which raises:

Traceback (most recent call last):
  File "pt_solve.py", line 4, in <module>
    X, LU = torch.solve(B, A)
RuntimeError: solve: LAPACK library not found in compilation
You might want to try `sudo apt-get install libopenblas-dev` (from the Ubuntu repo).
Perhaps when you installed from source, it put it under /usr/local or somewhere where PyTorch didn’t automatically find it.
From reading the PyTorch forums, it should automatically detect an OpenBLAS installation during setup.py. Near the beginning of running setup.py, you should see something along the lines of 'OpenBLAS…detected' when it is configuring the build. If it doesn't find it then, it's probably not worth proceeding with the build until it's able to detect it.
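Once a build finishes, one way to double-check what it was actually linked against (assuming a reasonably recent PyTorch that exposes `torch.__config__`) is:

```python
import torch

# Dump the compile-time configuration; look for the BLAS/LAPACK lines
print(torch.__config__.show())
```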
I’ve heard PyTorch v1.4.0 should be released rather soon (sometime this month I believe), at which time I can also take a crack at it.
By the way, numpy is one of the most used libraries in Python; we should be able to use it with a multithreading backend.
Could you also help to do that please? Either with build instructions or with a wheel. I tried 6 different ways and none of them uses more than one core for computing.
This is pretty frustrating.
Yes sorry I saw that and forgot it was there, my bad.
Finally, the PyTorch build was successful using bdist_wheel, as indicated in several threads. Thanks again for the help with that.
Now it can find LAPACK, use solve, and run on multiple cores when possible.
I didn’t try to rebuild numpy from source yet, I’m probably going to do it now. I’ll let you know.
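For the numpy rebuild, the legacy build can be pointed at OpenBLAS with a site.cfg next to its setup.py. A sketch, assuming OpenBLAS landed in /opt/OpenBLAS (the default prefix of its `make install`):

```shell
# From a numpy source checkout: point the build at OpenBLAS
cat > site.cfg <<'EOF'
[openblas]
libraries = openblas
library_dirs = /opt/OpenBLAS/lib
include_dirs = /opt/OpenBLAS/include
runtime_library_dirs = /opt/OpenBLAS/lib
EOF
python setup.py bdist_wheel
```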
This uses 4 cores, so I'm pretty confident reinstalling from source will work.
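For anyone else checking their build, a minimal load to watch in htop while it runs (hypothetical sizes; any large matrix product works):

```python
import numpy as np

# A large matrix product: with a multithreaded BLAS this should
# push CPU usage well above 100% in htop/top
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c = a @ b
print(c.shape)
```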