Does anyone have instructions for how to use TensorFlow on the GPU on a DGX Spark? Is this possible yet?
NVIDIA-optimized TensorFlow has reached EOL: End of Life Notices — NVIDIA AI Enterprise Notices
Try this by creating a virtual environment. It worked for me.
pip install nvidia-tensorflow[horovod] --extra-index-url=https://pypi.ngc.nvidia.com/
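Spelled out, the suggested steps look roughly like the following (the environment name tf-venv is just an illustrative placeholder, not from the original post):

```shell
# Create and activate an isolated virtual environment (name is illustrative)
python3 -m venv tf-venv
source tf-venv/bin/activate

# Upgrade pip, then install NVIDIA's TensorFlow build from the NGC index
pip install --upgrade pip
pip install "nvidia-tensorflow[horovod]" --extra-index-url=https://pypi.ngc.nvidia.com/
```

The quotes around the extras specifier avoid shell globbing of the square brackets in some shells.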
Yes: I had to use this install method as well, but once done, TensorFlow worked fine:
pip install nvidia-tensorflow[horovod] --extra-index-url=https://pypi.ngc.nvidia.com/
Thanks to Santosh!
Hi, I am facing a TensorFlow issue too.
I am running a container based on nvcr.io/nvidia/pytorch:25.10-py3, then installed TensorFlow via pip install nvidia-tensorflow[horovod] --extra-index-url=https://pypi.ngc.nvidia.com/. The package installed, but I get the errors below:
root@7ec1533a5c9f:/workspace# python3
Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2025-11-05 14:25:10.339985: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-11-05 14:25:10.347213: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8473] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-11-05 14:25:10.350579: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1471] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-11-05 14:25:10.776737: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT: INTERNAL: Cannot dlopen all TensorRT libraries: FAILED_PRECONDITION: Could not load dynamic library 'libnvinfer.so.10.8.0'; dlerror: libnvinfer.so.10.8.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib.real:/usr/local/lib/python3.12/dist-packages/torch/lib:/usr/local/lib/python3.12/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
>>> print(tf.config.list_physical_devices('GPU'))
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1762352719.666273 260 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1762352719.707974 260 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1762352719.710736 260 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>>
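Note that the TF-TRT warning in the log above only means that TensorRT's libnvinfer is not on the loader path, so the TensorRT integration is disabled; it is non-fatal, and the GPU is still detected. A quick standard-library way to check whether a given shared library is loadable (the library names here are just examples):

```python
import ctypes

def can_dlopen(name: str) -> bool:
    """Return True if the dynamic linker can load the given shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False

# In the container above this reports False, matching the TF-TRT warning;
# installing TensorRT (or extending LD_LIBRARY_PATH) would make it True.
print(can_dlopen("libnvinfer.so.10.8.0"))
```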
I also tried to run a program using TensorFlow and got more errors:
2025-11-05 14:22:57.495784: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2025-11-05 14:22:57.735448: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 91400
W0000 00:00:1762352577.804734 240 gpu_timer.cc:114] Skipping the delay kernel, measurement accuracy will be reduced
W0000 00:00:1762352577.811749 240 gpu_timer.cc:114] Skipping the delay kernel, measurement accuracy will be reduced
The program eventually completed and produced the correct result. However, it is far slower than the CPU version of TensorFlow.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.