TensorFlow Python module takes too long to return a result on first start

Hi, I have a TF 2.4.1 module for Python 3.8, compiled from source. The trouble is that when I import tensorflow for the first time after opening the python3 CLI and give it something simple to calculate, it takes more than 10 minutes to return the result. After that, calculations are pretty fast. I'll show you with screenshots:

As you can see, after I enter “a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])” the timestamps run from 14:18 to 14:47.
That is quite a long time.

A second pass of the same code with different values takes no time:

And if I run it on the host system, it also works fine:


Did you compile the package for the Nano GPU architecture (sm_53)?
If not, some GPU code has to be recompiled for the correct architecture at initialization, which is what makes the first run slow.
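
A minimal sketch for checking which GPU architectures a TensorFlow build targets, assuming `tf.sysconfig.get_build_info()` is available and exposes a `cuda_compute_capabilities` list in your build (entries such as `sm_53` or `compute_53`). The helper `classify_build` is a hypothetical name, and its logic is simplified: it only looks for an exact `sm_` match and treats any `compute_` (PTX) entry at or below the device architecture as a candidate for recompilation at startup.

```python
def classify_build(build_caps, device_arch="53"):
    """Return 'native', 'jit', or 'unsupported' for a device architecture.

    build_caps: list of strings such as ["sm_53", "compute_72"].
    device_arch: compute capability without the dot; "53" is the Nano (sm_53).
    Simplified: ignores binary compatibility between minor revisions.
    """
    # Entries starting with "sm_" are precompiled binaries for that architecture.
    sass = {c[len("sm_"):] for c in build_caps if c.startswith("sm_")}
    # Entries starting with "compute_" are PTX, compiled by the driver at load time.
    ptx = [c[len("compute_"):] for c in build_caps if c.startswith("compute_")]
    if device_arch in sass:
        return "native"
    if any(int(p) <= int(device_arch) for p in ptx):
        return "jit"
    return "unsupported"


if __name__ == "__main__":
    try:
        import tensorflow as tf
        # Assumption: the key exists in this build; fall back to an empty list.
        caps = tf.sysconfig.get_build_info().get("cuda_compute_capabilities", [])
        print(caps, classify_build(caps))
    except ImportError:
        print("TensorFlow is not installed in this environment")
```

If the result is not `native`, the build does not ship precompiled sm_53 binaries, which would be consistent with the slow first run described above.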


I compiled it with bazel and only --aarch64. What is the correct way?

bazel --host_jvm_args=-Xmx32768m build --config=opt --config=noaws --config=nogcp --config=nohdfs --config=nonccl --config=monolithic --config=cuda --config=v2 --local_cpu_resources=32 -j 32 --define=tflite_pip_with_flex=true --copt=-ftree-vectorize --copt=-funsafe-math-optimizations --copt=-ftree-loop-vectorize --copt=-fomit-frame-pointer --subcommands //tensorflow/tools/pip_package:build_pip_package


Please check the repository below for an example of compiling TensorFlow for a specific GPU architecture:
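
Independently of that repository, here is a minimal sketch of one way to pin the architecture. It relies on TensorFlow's `./configure` script reading the `TF_NEED_CUDA` and `TF_CUDA_COMPUTE_CAPABILITIES` environment variables. The helper name `build_env` is hypothetical, and the value `5.3` is an assumption that matches the sm_53 architecture mentioned above.

```python
import os


def build_env(compute_capabilities="5.3", base_env=None):
    """Return a copy of the environment with the CUDA build settings applied.

    compute_capabilities: comma-separated list, e.g. "5.3" for the Nano.
    base_env: mapping to start from; defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TF_NEED_CUDA"] = "1"
    env["TF_CUDA_COMPUTE_CAPABILITIES"] = compute_capabilities
    return env


# Example, to be run from the TensorFlow source root (not executed here):
#   import subprocess
#   subprocess.run(["./configure"], env=build_env("5.3"), check=True)
# The bazel build command is then run as before, after configure has finished.
```

The helper copies the environment instead of modifying `os.environ`, so the setting only affects the configure step it is passed to.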

