Dataset_convert tool looks like it is running properly but the TFRecords aren't populated in the output folders

Hello, I am fairly new to machine learning and NVIDIA TAO. I am attempting to train the emotionnet model on a new dataset. I've successfully created JSON files for this dataset and am now attempting to convert them to tfrecords. However, although the tfrecord files are being created, they are empty. I have looked elsewhere online for help, but no one else seems to have this specific problem, and I don't think it's an issue with the ~/.tao_mounts.json.

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) - Quadro RTX 6000
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) - emotionnet
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here) - nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
• Training spec file(If have, please share here) - emotionnet_tlt_pretrain.yaml
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
tao emotionnet dataset_convert -c /workspace/tao-experiments/emotionnet/dataset_specs/dataio_config_ckplus.json

Log:
2022-06-29 11:53:38,553 [INFO] root: Registry: ['nvcr.io']
2022-06-29 11:53:38,694 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
2022-06-29 16:53:39.851597: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-6bf8vw2r because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2022-06-29 16:53:42,092 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2022-06-29 16:53:44,372 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/emotionnet/dataio/data_converter.py:98: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

2022-06-29 16:53:44,384 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/emotionnet/dataio/data_converter.py:98: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/emotionnet/dataio/data_converter.py:101: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.

2022-06-29 16:53:44,384 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/emotionnet/dataio/data_converter.py:101: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.

2022-06-29 16:53:44,385 [INFO] __main__: Generate Tfrecords for data with required json labels
/workspace/tao-experiments/emotionnet/postData/ckplus2/Ground_Truth_DataFactory/TfRecords
/workspace/tao-experiments/emotionnet/postData/ckplus2/Ground_Truth_DataFactory/GT
2022-06-29 16:53:44,385 [INFO] __main__: Start to parse data...
2022-06-29 16:53:44,385 [INFO] __main__: Run full conversion...
/workspace/tao-experiments/emotionnet/postData/ckplus2/GT_user_json
2022-06-29 16:53:44,385 [INFO] __main__: Convert json file...
2022-06-29 16:53:45,692 [INFO] __main__: Start to write user tfrecord...
2022-06-29 16:53:45,692 [INFO] __main__: Start to split data...
/workspace/tao-experiments/emotionnet/postData/ckplus2/Ground_Truth_DataFactory/TfRecords_combined
2022-06-29 16:53:45,692 [INFO] __main__: Test:
2022-06-29 16:53:45,692 [INFO] __main__: Validation
2022-06-29 16:53:45,692 [INFO] __main__: Train
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/emotionnet/dataio/data_converter.py:273: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

2022-06-29 16:53:45,692 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/emotionnet/dataio/data_converter.py:273: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

/workspace/tao-experiments/emotionnet/postData/ckplus2/Ground_Truth_DataFactory/GT_combined
2022-06-29 11:53:46,488 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Did you ever run with the default Jupyter notebook and find the tfrecords?

Yes, I should have mentioned that the training worked as expected (including the tfrecords) with the default notebook and the ckplus dataset. Now that I've brought in a new dataset, I've had to format it to match the ckplus dataset, and I was able to do that successfully and convert the data to JSON format.

I should also note that yesterday, I started receiving the following warning whenever I run a tao command:
/home/hipe/Facial_Recognition/lib/python3.7/site-packages/tlt/__init__.py:20: DeprecationWarning:
The nvidia-tlt package will be deprecated soon. Going forward please migrate to using the nvidia-tao package.

I'm aware that changes are actively happening as NVIDIA transitions away from TLT. The default notebook still uses the env key "nvidia-tlt" and seems to still rely on elements of the nvidia-tlt package. But given that the original notebook still works without any issues, I'm not sure whether the deprecation of tlt could explain why the tfrecords aren't being populated.

The nvidia-tao package should not be the reason.

To debug, please open a terminal and run dataset_convert inside the docker.
$ tao emotionnet run /bin/bash

# emotionnet dataset_convert xxx

I received a whole list of warnings and then the following output:

2022-06-30 15:19:36,486 [INFO] __main__: Generate Tfrecords for data with required json labels
/workspace/tao-experiments/emotionnet/postData/ckplus/Ground_Truth_DataFactory/TfRecords
/workspace/tao-experiments/emotionnet/postData/ckplus/Ground_Truth_DataFactory/GT
2022-06-30 15:19:36,486 [INFO] __main__: Start to parse data...
2022-06-30 15:19:36,486 [INFO] __main__: Run full conversion...
/workspace/tao-experiments/emotionnet/postData/ckplus/GT_user_json
2022-06-30 15:19:36,486 [INFO] __main__: Convert json file...
2022-06-30 15:19:38,007 [INFO] __main__: Start to write user tfrecord...
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/emotionnet/dataio/data_converter.py:259: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

2022-06-30 15:19:38,008 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/emotionnet/dataio/data_converter.py:259: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

2022-06-30 15:19:38,146 [INFO] __main__: Start to split data...
/workspace/tao-experiments/emotionnet/postData/ckplus/Ground_Truth_DataFactory/TfRecords_combined
2022-06-30 15:19:38,146 [INFO] __main__: Test: ['S051', 'S108', 'S158', 'S149', 'S137', 'S032', 'S066', 'S046', 'S097', 'S504', 'S091']
2022-06-30 15:19:38,146 [INFO] __main__: Validation ['S094', 'S122', 'S082', 'S147', 'S060', 'S042', 'S096', 'S014', 'S083', 'S089', 'S113']
2022-06-30 15:19:38,146 [INFO] __main__: Train ['S005', 'S129', 'S157', 'S068', 'S063', 'S111', 'S044', 'S074', 'S139', 'S011', 'S127', 'S155', 'S105', 'S010', 'S154', 'S061', 'S088', 'S125', 'S101', 'S062', 'S090', 'S160', 'S106', 'S131', 'S078', 'S895', 'S112', 'S092', 'S071', 'S126', 'S087', 'S148', 'S057', 'S128', 'S080', 'S506', 'S052', 'S029', 'S081', 'S055', 'S095', 'S079', 'S502', 'S116', 'S099', 'S076', 'S098', 'S053', 'S093', 'S136', 'S065', 'S085', 'S059', 'S156', 'S100', 'S064', 'S501', 'S077', 'S505', 'S037', 'S110', 'S069', 'S026', 'S124', 'S028', 'S058', 'S067', 'S050', 'S084', 'S138', 'S070', 'S073', 'S132', 'S135', 'S151', 'S119', 'S034', 'S133', 'S086', 'S109', 'S107', 'S503', 'S114', 'S056', 'S134', 'S045', 'S035', 'S072', 'S115', 'S022', 'S075', 'S102', 'S130', 'S054', 'S117', 'S999']
/workspace/tao-experiments/emotionnet/postData/ckplus/Ground_Truth_DataFactory/GT_combined

It does appear to split the data, but unfortunately this is the old data from the ckplus dataset. It still seems to be picking up the old data rather than the new data I'm giving it.

Could you share the json file?

I have the data categorized into 7 emotions, and for each emotion I have a separate JSON file for each image. I've attached an example file (they all have the same format). I can share any others if needed.

Afraid_Data_002_afraid.json (4.6 KB)
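
For reference, a quick way I can sanity-check that every per-image JSON parses and shares the same top-level keys is a snippet like the one below. The path is from my setup and it assumes each file holds a single JSON object; this is just a helper, not part of TAO:

import json
from pathlib import Path

# Hypothetical location of the per-image JSON labels; adjust to your layout.
json_root = Path("postData/ckplus2/GT_user_json")

key_sets = set()
for json_file in sorted(json_root.rglob("*.json")):
    with json_file.open() as f:
        data = json.load(f)  # raises json.JSONDecodeError on a malformed file
    if isinstance(data, dict):
        key_sets.add(tuple(sorted(data.keys())))

# If every label file shares the same schema, exactly one key layout remains.
print(len(key_sets), "distinct key layout(s) across the label files")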

Did you set up dataio_config_ckplus.json correctly?

Yes, that file should be configured properly. Here it is (note: I changed the data folder name back to ckplus to match the original dataset folder name, which is reflected in this file):
dataio_config_ckplus.json (451 Bytes)

As far as I can tell, all my local data paths are set up properly: I'm not getting any errors in that regard, and the tfrecord files are being created (albeit empty). Here's a screenshot of what my Ground_Truth_DataFactory folder looks like, in case it helps:

The “GT” folder contains 7 text files (one for each emotion), where each text file contains all the landmarks for each image within that emotion. Example:
Afraid.txt (57.3 KB)

The GT_combined folder contains "test.txt", "train.txt", and "validate.txt", but all of those are empty. I believe this step is only supposed to split up the GT data, but clearly that part is not working. So it occurs to me that maybe the issue isn't primarily with the tfrecords but with the tool failing to parse the data in the GT text files. In the original dataset there were far more text files with less data in each; in mine there are only 7, with much more data in each one, but the format should be the same. Can you see any possible reason why the GT_combined text files aren't being populated with the data from GT?

I just realized that having only 7 text files to parse may itself be the issue. In the original dataset, the way things were categorized resulted in far more text files. I assume the test/train/validate split is set up so that a certain percentage goes into each category, but with only 7 text files there may not be enough to do, say, a 10% test split. Could that be why the folders are empty?
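
To illustrate the idea: if the split operates on whole files/subjects with fractional ratios, a toy sketch like the one below (my guess at the behavior, not the actual TAO code) produces empty test and validation sets once the item count is small, because int() truncates the fraction to zero:

import random

def toy_split(ids, test_frac=0.1, val_frac=0.1):
    # Toy subject-level split; NOT the actual TAO implementation.
    ids = list(ids)
    random.shuffle(ids)
    n_test = int(len(ids) * test_frac)  # truncates toward zero for small lists
    n_val = int(len(ids) * val_frac)
    return ids[:n_test], ids[n_test:n_test + n_val], ids[n_test + n_val:]

test, val, train = toy_split(range(100))
print(len(test), len(val), len(train))  # 10 10 80

test, val, train = toy_split(range(7))
print(len(test), len(val), len(train))  # 0 0 7 -- test and validate come out empty

That said, truncation alone wouldn't explain why my train.txt is empty as well.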

In the default notebook, there is a file named "ckplus_convert.py".
It is alongside the notebook; you can have a look at it to check what is happening.
By default, it is used to convert the existing landmarks and emotion labels from the CK+ dataset to the required JSON label format.

So, please make sure your custom dataset has been converted as expected.
!python3 ckplus_convert.py --root_path $LOCAL_EXPERIMENT_DIR --dataset_folder_name ckplus --container_root_path $USER_EXPERIMENT_DIR

Thanks for getting back to me! I've already created a duplicate of that file called "ckplus_test_convert.py" and modified it to make sure the JSON conversion runs properly on my data. Here is that file:
ckplus_test_convert.py (12.0 KB)

I ran the command and obtained the JSON files. I actually went back and re-sorted the data to match the original ckplus dataset exactly (i.e., under images there's a folder, then a subfolder containing a single image file). Here's an example JSON file:
002_disgust.json (4.6 KB)

I did a 1:1 comparison with the files from the original dataset and they look identical in form. The way I have things configured now, I have approximately 400 JSON files, one per image. When I run the dataset_convert command on them, the GT folder is populated with ~400 txt files, which appear to be formatted properly, but the GT_combined and TfRecords_combined folders still contain empty train/test/validate files. I've checked everything (the data paths, the Python conversion file, the config file, etc.) repeatedly and everything seems to check out. The JSON files, again, look exactly how they should.

A couple of considerations:

  1. Is there anything inside my dataio_config_ckplus.json file that appears incorrect? I've tried changing things in there, but nothing has worked as far as I can tell. The only difference from the original is that the emotion map is slightly different.
  2. Is there anything I need to change in the dataset_convert.py file to account for my data? I attempted to access it through the root folder in the NVIDIA docker, but I'm denied access. Is there any way I can access that py file, or is that unnecessary?
  3. Does it matter that my image extension is JPG rather than png? (See the quick check after this list.)
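
For point 3, the quick check I have in mind is just counting which extensions actually exist under the images folder (the path below is from my layout); a converter hard-coded to look for ".png" would silently skip every ".JPG" file:

from collections import Counter
from pathlib import Path

# Count image extensions under the dataset's images folder (hypothetical path).
image_root = Path("postData/ckplus/images")

extensions = Counter(p.suffix for p in image_root.rglob("*") if p.is_file())
print(extensions)  # e.g. Counter({'.JPG': 400}) would confirm a mismatch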

I checked your original log at the very beginning of this topic. Is that log complete? I cannot see anything after "__main__: Test:". With the default CK+ dataset, there should be something like the following.

__main__: Test: ['S051', 'S108', 'S158', 'S149', 'S137', 'S032', 'S066', 'S046', 'S097', 'S504', 'S091']

If possible, please share some of the dataset so we can check and reproduce.

Here is a zip file containing an example json data file for each emotion I’m working with. There are many more for each emotion.
ckplus_example.zip (7.5 KB)

Last time I ran tao emotionnet run /bin/bash, I had some issues with the docker connection, so it was printing the original data split (like the one you attached). I've made a few changes since then, and here's what I get now when I run inside the docker. This is the same issue I see when I run from the Jupyter notebook, and it's the main issue I'm having. The log after "__main__" is supposed to be populated with my own data.

Dataset_convert_output_log (5.1 KB)

Could you share some original images, landmarks, and labels?

Sure thing! Here’s a zip file containing my images, landmarks, and emotion labels. They’re all set up to match the original file structure from the ckplus dataset.

emotionnet_dataset.zip (71.0 MB)

The original dataset I was using was the cohn-kanade-images ckplus dataset.

Thanks. I will check if I can reproduce.

Please try the following.

  • Modify the tree and filenames; strictly follow the tree and the filenames inside the ckplus folder.
  • Modify ckplus_convert.py:
    image_file_name = file_prefix + '.JPG'
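
If the dataset might mix extensions, a more tolerant lookup is also possible. This is only a sketch, assuming file_prefix in ckplus_convert.py is the image path without its extension:

import os

# Sketch: try common extension variants instead of hard-coding one.
# Assumes file_prefix is the image path minus its extension, as used in
# ckplus_convert.py.
for ext in ('.png', '.jpg', '.JPG', '.jpeg'):
    candidate = file_prefix + ext
    if os.path.exists(candidate):
        image_file_name = candidate
        break
else:
    raise FileNotFoundError("no image found for prefix " + file_prefix)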

I've made changes to the tree and filenames without success. For now, I've decided to take a slightly different direction with my research, but I'll let you know if I have any future questions regarding TAO. Thanks so much for your help!

OK, please let me know if you have further questions. Thanks.