Hi, I’m trying to run the Deep Convolutional Generative Adversarial Network (DCGAN) example from section 2.2 of the 9.2 machine-learning-manual.pdf. The Hello World example from section 2.1 works fine, but DCGAN prints some warnings and then fails with an SSL error while downloading the MNIST dataset. The manual doesn’t mention needing a Google authentication bearer token, and I’m not sure whether that warning is related to the failure. The full session is below.
[cht@node001 ~]$ module load tensorflow2-extra-py39-cuda11.2-gcc9
Loading tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0
Loading requirement: openblas/dynamic/0.3.18 hdf5_18/1.8.21 gcc9/9.5.0 python39 cuda11.2/toolkit/11.2.2 cudnn8.1-cuda11.2/8.1.1.33 ml-pythondeps-py39-cuda11.2-gcc9/4.8.1 protobuf3-gcc9/3.9.2
nccl2-cuda11.2-gcc9/2.14.3 tensorflow2-py39-cuda11.2-gcc9/2.7.0 opencv4-py39-cuda11.2-gcc9/4.5.4
[cht@node001 ~]$ module load openmpi4-cuda11.2-ofed51-gcc9
Loading openmpi4-cuda11.2-ofed51-gcc9/4.1.4
Loading requirement: hpcx/mlnx-ofed51/2.7.4 ucx/1.10.1 cm-pmix3/3.1.4 hwloc/1.11.11
[cht@node001 ~]$ cd ${CM_TENSORFLOW2_EXTRA}/tensorflow_examples/models/dcgan/
[cht@node001 dcgan]$ python dcgan.py --epochs 5
/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.8) or chardet (5.1.0)/charset_normalizer (2.0.10) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
2023-03-22 18:50:53.627768: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "INTERNAL: Couldn't parse JSON response from OAuth server.".
I0322 18:50:53.700929 23456247932736 dataset_builder.py:400] Generating dataset mnist (/home/cht/tensorflow_datasets/mnist/3.0.1)
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /home/cht/tensorflow_datasets/mnist/3.0.1...
Dl Completed...: 0 url [00:00, ? url/s] I0322 18:50:53.994837 23456247932736 download_manager.py:354] Downloading https://storage.googleapis.com/cvdf-datasets/mnist/t10k-images-idx3-ubyte.gz into /home/cht/tensorflow_datasets/downloads/cvdf-datasets_mnist_t10k-images-idx3-ubytedDnaEPiC58ZczHNOp6ks9L4_JLids_rpvUj38kJNGMc.gz.tmp.55e4a786bd8d40478f88319e535a65fb...
Dl Completed...: 0%| I0322 18:50:53.997988 23456247932736 download_manager.py:354] Downloading https://storage.googleapis.com/cvdf-datasets/mnist/t10k-labels-idx1-ubyte.gz into /home/cht/tensorflow_datasets/downloads/cvdf-datasets_mnist_t10k-labels-idx1-ubyte4Mqf5UL1fRrpd5pIeeAh8c8ZzsY2gbIPBuKwiyfSD_I.gz.tmp.d93d3e181b22493dba85b77ce1fdd027...
Dl Completed...: 0%| I0322 18:50:54.000667 23456247932736 download_manager.py:354] Downloading https://storage.googleapis.com/cvdf-datasets/mnist/train-images-idx3-ubyte.gz into /home/cht/tensorflow_datasets/downloads/cvdf-datasets_mnist_train-images-idx3-ubyteJAsxAi0QnOBEygBw_XW2X7zp-LBZAIqqYSHN8ru4ZO4.gz.tmp.d98e7529af77497dabbe5c21e38fe395...
Dl Completed...: 0%| I0322 18:50:54.004412 23456247932736 download_manager.py:354] Downloading https://storage.googleapis.com/cvdf-datasets/mnist/train-labels-idx1-ubyte.gz into /home/cht/tensorflow_datasets/downloads/cvdf-datasets_mnist_train-labels-idx1-ubytedcDWkl3FO9T-WMEH1f1Xt51eIRmePRIMAk6X147Qw8w.gz.tmp.2ec0b80ac0854337bebc0b03e47659fb...
Extraction completed...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:27<00:00, 13.56s/ file]
Dl Size...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:27<00:00, 2.71s/ MiB]
Dl Completed...: 50%|█████████████████████████████████████████████████████████████████████████ | 2/4 [00:27<00:27, 13.56s/ url]
Traceback (most recent call last):
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/urllib3/response.py", line 438, in _error_catcher
yield
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/urllib3/response.py", line 519, in read
data = self._fp.read(amt) if not fp_closed else b""
File "/cm/local/apps/python39/lib/python3.9/http/client.py", line 463, in read
n = self.readinto(b)
File "/cm/local/apps/python39/lib/python3.9/http/client.py", line 507, in readinto
n = self.fp.readinto(b)
File "/cm/local/apps/python39/lib/python3.9/socket.py", line 704, in readinto
return self._sock.recv_into(b)
File "/cm/local/apps/python39/lib/python3.9/ssl.py", line 1242, in recv_into
return self.read(nbytes, buffer)
File "/cm/local/apps/python39/lib/python3.9/ssl.py", line 1100, in read
return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2633)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/examples/tensorflow_examples/models/dcgan/dcgan.py", line 225, in <module>
app.run(run_main)
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/examples/tensorflow_examples/models/dcgan/dcgan.py", line 213, in run_main
main(**kwargs)
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/examples/tensorflow_examples/models/dcgan/dcgan.py", line 217, in main
train_dataset = create_dataset(buffer_size, batch_size)
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/examples/tensorflow_examples/models/dcgan/dcgan.py", line 47, in create_dataset
train_dataset = tfds.load(
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow_datasets/core/load.py", line 318, in load
dbuilder.download_and_prepare(**download_and_prepare_kwargs)
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow_datasets/core/dataset_builder.py", line 439, in download_and_prepare
self._download_and_prepare(
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1113, in _download_and_prepare
split_generators = self._split_generators( # pylint: disable=unexpected-keyword-arg
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow_datasets/image_classification/mnist.py", line 118, in _split_generators
mnist_files = dl_manager.download_and_extract(
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow_datasets/core/download/download_manager.py", line 634, in download_and_extract
return _map_promise(self._download_extract, url_or_urls)
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow_datasets/core/download/download_manager.py", line 767, in _map_promise
res = tf.nest.map_structure(lambda p: p.get(), all_promises) # Wait promises
File "/cm/shared/apps/tensorflow2-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow/python/util/nest.py", line 869, in map_structure
structure[0], [func(*x) for x in entries],
File "/cm/shared/apps/tensorflow2-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow/python/util/nest.py", line 869, in <listcomp>
structure[0], [func(*x) for x in entries],
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow_datasets/core/download/download_manager.py", line 767, in <lambda>
res = tf.nest.map_structure(lambda p: p.get(), all_promises) # Wait promises
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/promise/promise.py", line 512, in get
return self._target_settled_value(_raise=True)
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/promise/promise.py", line 516, in _target_settled_value
return self._target()._settled_value(_raise)
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/promise/promise.py", line 226, in _settled_value
reraise(type(raise_val), raise_val, self._traceback)
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/six.py", line 719, in reraise
raise value
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/promise/promise.py", line 844, in handle_future_result
resolve(future.result())
File "/cm/local/apps/python39/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/cm/local/apps/python39/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/cm/local/apps/python39/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/cm/shared/apps/tensorflow2-extra-py39-cuda11.2-gcc9/2.7.0/lib/python3.9/site-packages/tensorflow_datasets/core/download/downloader.py", line 228, in _sync_download
for block in iter_content:
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/requests/models.py", line 760, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/urllib3/response.py", line 576, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/urllib3/response.py", line 541, in read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
File "/cm/local/apps/python39/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/cm/shared/apps/ml-pythondeps-py39-cuda11.2-gcc9/4.8.1/lib/python3.9/site-packages/urllib3/response.py", line 449, in _error_catcher
raise SSLError(e)
urllib3.exceptions.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2633)
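
To help tell a transient network problem from a configuration problem, the failing download step could be retried a few times. The snippet below is only a minimal sketch under that assumption. The `retry` helper and its parameters are hypothetical names introduced here for illustration; they are not part of TensorFlow, tensorflow_datasets, or the manual.

```python
import time


def retry(fn, attempts=3, delay=2.0, exceptions=(Exception,)):
    """Call fn() until it succeeds or `attempts` calls have failed.

    Hypothetical helper for illustration only; not part of TensorFlow
    or tensorflow_datasets.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions as err:
            # Remember the error and report which attempt failed.
            last_error = err
            print(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

It could be wrapped around the dataset preparation before running `dcgan.py`, for example `retry(lambda: tfds.load("mnist"))` after `import tensorflow_datasets as tfds`. If every attempt fails with the same `DECRYPTION_FAILED_OR_BAD_RECORD_MAC` error, the cause is probably not a one-off network glitch.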