How to make Nvidia Digits work with Caffe and Tensorflow installed with Python 3?

Virat_Baahubali · February 12, 2020, 8:44am

After installing digits setting it to root and running ./digits-devserver gave me this error. Numpy is installed but still it is not detecting numpy. Can anyone please guide me through this as it would be really helpful for me to learn this exciting concept? Thanks and cheers. Though I will try my best to figure out this issue and will keep posted.

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/sp/digits/digits/__main__.py", line 70, in <module>
    main()
  File "/home/sp/digits/digits/__main__.py", line 53, in main
    import digits.config
  File "digits/config/__init__.py", line 7, in <module>
    from . import (  # noqa
  File "digits/config/caffe.py", line 13, in <module>
    from digits.utils import parse_version
  File "digits/utils/__init__.py", line 167, in <module>
    from . import constants, image, time_filters, errors, forms, routing, auth  # noqa
  File "digits/utils/image.py", line 14, in <module>
    import numpy as np
ImportError: No module named numpy

I have build and installed caffe and tensorflow with python 3.5. I made sure my installation is successful so I ran some test I hope this tests are considered valid.

<b>python</b>
Python 3.5.2 (default, Oct  8 2019, 13:06:37) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
>>> import tensorflow as tf; print(tf.__version__)
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
1.12.3
>>> <b>import tensorflow as tf; print(tf.contrib.eager.num_gpus())</b>
2020-02-06 19:56:45.632580: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-06 19:56:45.632820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.455
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.71GiB
2020-02-06 19:56:45.632844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2020-02-06 19:56:47.796481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-06 19:56:47.796562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2020-02-06 19:56:47.796587: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2020-02-06 19:56:47.796908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1462 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
1

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

±----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1094 G /usr/lib/xorg/Xorg 101MiB |
| 0 2105 G compiz 90MiB |
±----------------------------------------------------------------------------+

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1050"
CUDA Driver Version / Runtime Version 10.2 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 1998 MBytes (2094989312 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1455 MHz (1.46 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

./mnistCUDNN

cudnnGetVersion() : 7005 , CUDNN_VERSION from cudnn.h : 7005 (7.0.5)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 5 Capabilities 6.1, SmClock 1455.0 Mhz, MemSize (Mb) 1997, MemClock 3504.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.061440 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.072704 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.122880 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.235520 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.395264 time requiring 203008 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.057344 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.065536 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.090112 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.215040 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.365568 time requiring 203008 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

locate nccl| grep “libnccl.so” | tail -n1 | sed -r ‘s/^.*.so.//’
2.4.7

python -c ‘import numpy; print(numpy.version)’
1.18.1

AastaLLL · February 17, 2020, 5:45am

Hi,

Based on the error log, DIGITs is looking for the numpy module for python2.7.
Would you mind to install it first to see if helps?

sudo pip install numpy

Thanks.

Virat_Baahubali · February 17, 2020, 6:51pm

I installed numpy for python2.7 now its giving me this error

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/sp/digits/digits/__main__.py", line 70, in <module>
    main()
  File "/home/sp/digits/digits/__main__.py", line 53, in main
    import digits.config
  File "digits/config/__init__.py", line 7, in <module>
    from . import (  # noqa
  File "digits/config/caffe.py", line 13, in <module>
    from digits.utils import parse_version
  File "digits/utils/__init__.py", line 167, in <module>
    from . import constants, image, time_filters, errors, forms, routing, auth  # noqa
  File "digits/utils/image.py", line 16, in <module>
    import scipy.misc
ImportError: No module named scipy.misc

But my question is I have already installed all those modules and dependencies for python3.5 then why Digits is still fetching python2.7? I have build caffe and tensorflow entirely on python3.5 and for Digits also I have downloaded and installed required dependencies on python3.5 still its giving error for python2.7 support instead of python3.5.

Virat_Baahubali · February 22, 2020, 7:33pm

Did you forget to "make pycaffe"?
"/home/sp/caffe" from CAFFE_ROOT does not point to a valid installation of Caffe.
Use the envvar CAFFE_ROOT to indicate a valid installation.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/sp/digits/digits/__main__.py", line 70, in <module>
    main()
  File "/home/sp/digits/digits/__main__.py", line 53, in main
    import digits.config
  File "digits/config/__init__.py", line 7, in <module>
    from . import (  # noqa
  File "digits/config/caffe.py", line 226, in <module>
    executable, version, flavor = load_from_envvar('CAFFE_ROOT')
  File "digits/config/caffe.py", line 37, in load_from_envvar
    import_pycaffe(python_dir)
  File "digits/config/caffe.py", line 126, in import_pycaffe
    import caffe
  File "/home/sp/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
  File "/home/sp/caffe/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
ImportError: dynamic module does not define init function (init_caffe)

python
Python 3.5.2 (default, Oct  8 2019, 13:06:37) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
>>> import tensorflow
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
>>>

python3
Python 3.5.2 (default, Oct  8 2019, 13:06:37) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
>>> import tensorflow
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
>>>

I have caffe and tensorflow successfully build and installed on python3.5. But Nvidia Digits is still pointing towards python2.7. I have set path in .bashrc file

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64\${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PROTOBUF_ROOT=~/protobuf
export CAFFE_ROOT=~/caffe
export PYTHONPATH=$home/username/caffe/python:$PYTHONPATH

How can I solve this issue and make Nvidia Digits set to point python3.5? I am really interested in learning this concept so any suggestion or advice would definelty speed up my learning. I will keep posting if something comes up. Thanks

AastaLLL · February 25, 2020, 8:56am

Hi,

Sorry for the late update.
AFAIK, DIGITs is not updated for a while and may not have python-3 support.

Here is discussion about the python-3 support and someone have shared the changes for it.
It’s recommended to give it a try: https://github.com/NVIDIA/DIGITS/issues/511#issuecomment-276144313

Thanks.