Calculate mAP of a TLT model using a custom dataset

Is there a feature in TLT to calculate mAP using a custom dataset (not the trainval dataset used during training)? Assume training is finished and the tfrecord files have been deleted.

You can use “tlt evaluate”.
For example, if you have already trained a detectnet_v2 tlt model, then run
$ tlt detectnet_v2 evaluate xxx
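For reference, a minimal sketch of the full command (the spec path, model path, and key below are placeholders, not values from this thread):

$ tlt detectnet_v2 evaluate -e /path/to/detectnet_v2_retrain_spec.txt \
                            -m /path/to/model.tlt \
                            -k $KEY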

How do I edit the spec file to use the new test data? And how do I make tfrecords for the test data?

If you plan to train a TLT detectnet_v2 network, please refer to the TLT user guide: NVIDIA TAO Documentation
You can also download the Jupyter notebook and refer to the spec files and the steps in it: NVIDIA TAO Documentation

Training is done; I just want to check the mAP with a different set of data than the trainval one.

kitti_config {
  root_directory_path: "/path/to/trainval_root"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}

With this kitti_config, the dataset is randomly divided into two partitions, training and validation; this is controlled by the partition_mode and num_partitions keys. The val_split option specifies the percentage of data used for validation. For example, with val_split: 14, roughly 14% of the images go into the validation fold (fold 0) and the remaining 86% into the training fold (fold 1).

Regardless of the val_split value, you can evaluate an entire separate test set by using validation_data_source in the spec file, which is discussed next.

dataset_config {
  data_sources: {
    tfrecords_path: "/path/to/trainval_tfrecords/*"
    image_directory_path: "/path/to/trainval_root"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "face"
    value: "face"
  }
  target_class_mapping {
    key: "bag"
    value: "bag"
  }
  validation_fold: 0
  # For evaluation on a test set:
  # validation_data_source: {
  #   tfrecords_path: "/path/to/test_tfrecords/*"
  #   image_directory_path: "/path/to/test_root"
  # }
}

To specify validation data drawn from the trainval tfrecords, use validation_fold. To evaluate on separate test data, use validation_data_source.
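As a sketch of what the evaluation spec could look like once the test source is enabled (all paths below are placeholders), with validation_fold removed:

dataset_config {
  data_sources: {
    tfrecords_path: "/path/to/trainval_tfrecords/*"
    image_directory_path: "/path/to/trainval_root"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  target_class_mapping {
    key: "face"
    value: "face"
  }
  target_class_mapping {
    key: "bag"
    value: "bag"
  }
  # validation_fold removed; evaluation now reads from the test source below
  validation_data_source: {
    tfrecords_path: "/path/to/test_tfrecords/*"
    image_directory_path: "/path/to/test_root"
  }
}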

If I add the new dataset to “validation_data_source” and use tlt-dataset-convert to make new tfrecords: 1. Will tlt-evaluate only use this dataset? 2. What should I put as the value for “validation_fold” (does 0 mean false and 1 mean true)?

  1. Yes. If you delete validation_fold: 0 and set “validation_data_source” instead, the evaluation will only use this data source.
  2. No, it does not mean true or false. “validation_fold: 0” means fold 0 of the TFRecords you have generated. If you run “ls” against your tfrecords files, you will see that some files have
    xxx-000-xxx in their names; for fold 1, they have xxx-001-xxx.
    For more info, please see NVIDIA TAO Documentation

Thanks! Also, if I have 5 folds and set “validation_fold : 1", what happens?

  1. Only xxx-001-xxx is used
  2. xxx-000-xxx and xxx-001-xxx (i.e., the first 2 folds) are used
  3. Any random 2 folds are used

*Assuming “validation_data_source” is disabled.

Can you run “ll” against your tfrecords files and paste the result here?

This is for the train and retrain spec:

total 3696
drwxrwxrwx 2 root root 4096 Aug 3 18:48 ./
drwxrwxrwx 3 root root 4096 Apr 1 11:56 ../
-rw-r--r-- 1 root root 74057 Aug 3 18:48 -fold-000-of-002-shard-00000-of-00010
-rw-r--r-- 1 root root 75545 Aug 3 18:48 -fold-000-of-002-shard-00001-of-00010
-rw-r--r-- 1 root root 74862 Aug 3 18:48 -fold-000-of-002-shard-00002-of-00010
-rw-r--r-- 1 root root 74364 Aug 3 18:48 -fold-000-of-002-shard-00003-of-00010
-rw-r--r-- 1 root root 73630 Aug 3 18:48 -fold-000-of-002-shard-00004-of-00010
-rw-r--r-- 1 root root 72979 Aug 3 18:48 -fold-000-of-002-shard-00005-of-00010
-rw-r--r-- 1 root root 72987 Aug 3 18:48 -fold-000-of-002-shard-00006-of-00010
-rw-r--r-- 1 root root 74266 Aug 3 18:48 -fold-000-of-002-shard-00007-of-00010
-rw-r--r-- 1 root root 77183 Aug 3 18:48 -fold-000-of-002-shard-00008-of-00010
-rw-r--r-- 1 root root 78545 Aug 3 18:48 -fold-000-of-002-shard-00009-of-00010
-rw-r--r-- 1 root root 301538 Aug 3 18:48 -fold-001-of-002-shard-00000-of-00010
-rw-r--r-- 1 root root 293106 Aug 3 18:48 -fold-001-of-002-shard-00001-of-00010
-rw-r--r-- 1 root root 297972 Aug 3 18:48 -fold-001-of-002-shard-00002-of-00010
-rw-r--r-- 1 root root 302887 Aug 3 18:48 -fold-001-of-002-shard-00003-of-00010
-rw-r--r-- 1 root root 300688 Aug 3 18:48 -fold-001-of-002-shard-00004-of-00010
-rw-r--r-- 1 root root 305055 Aug 3 18:48 -fold-001-of-002-shard-00005-of-00010
-rw-r--r-- 1 root root 297548 Aug 3 18:48 -fold-001-of-002-shard-00006-of-00010
-rw-r--r-- 1 root root 297686 Aug 3 18:48 -fold-001-of-002-shard-00007-of-00010
-rw-r--r-- 1 root root 294529 Aug 3 18:48 -fold-001-of-002-shard-00008-of-00010
-rw-r--r-- 1 root root 300168 Aug 3 18:48 -fold-001-of-002-shard-00009-of-00010

If you set “validation_fold: 0” in the spec, the training dataset is the “-fold-001-of-002-xxx” files and the evaluation dataset is the “-fold-000-of-002-xxx” files.

If you set “validation_fold: 1” in the spec, the training dataset is the “-fold-000-of-002-xxx” files and the evaluation dataset is the “-fold-001-of-002-xxx” files.
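For illustration (the path below is a placeholder), you can list the files of each fold directly:

$ ls /path/to/trainval_tfrecords/*-fold-000-of-002-*   # validation files when validation_fold: 0
$ ls /path/to/trainval_tfrecords/*-fold-001-of-002-*   # training files when validation_fold: 0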


Thanks!
I was going to ask how to make tfrecords for “validation_data_source” but I found my answer in detectnet_v2_tfrecords_kitti_val.txt from the examples you provided.

It is the same. You can generate tfrecords for the test set the same way, then set “validation_data_source” to their path.
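For example, assuming the test set is laid out in KITTI format under a hypothetical /path/to/test_root (all paths below are placeholders), a conversion spec and command could look like the sketch below; depending on your TLT version, the command is tlt-dataset-convert or tlt detectnet_v2 dataset_convert. As noted above, the val_split value does not matter here because the wildcard in validation_data_source picks up every generated fold:

kitti_config {
  root_directory_path: "/path/to/test_root"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/path/to/test_root"

$ tlt detectnet_v2 dataset_convert -d /path/to/test_convert_spec.txt \
                                   -o /path/to/test_tfrecords/test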

After making the new test tfrecords and enabling “validation_data_source”, I get this error:

Using TensorFlow backend.
2021-08-04 04:11:49.631529: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-08-04 04:11:51,761 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/detectnet_v2/new_specs_resnet10/detectnet_v2_retrain_resnet10_kitti.txt
2021-08-04 04:11:52.436783: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-08-04 04:11:52.624256: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:52.624619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2021-08-04 04:11:52.624699: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:52.625025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:02:00.0
2021-08-04 04:11:52.625049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-08-04 04:11:52.625094: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-08-04 04:11:52.625776: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-08-04 04:11:52.625984: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-08-04 04:11:52.626821: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-08-04 04:11:52.627438: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-08-04 04:11:52.627479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-04 04:11:52.627546: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:52.627927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:52.628291: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:52.628648: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:52.628969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2021-08-04 04:11:52.628994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-08-04 04:11:53.366575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-04 04:11:53.366630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1 
2021-08-04 04:11:53.366637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N Y 
2021-08-04 04:11:53.366644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   Y N 
2021-08-04 04:11:53.366846: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:53.367261: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:53.367655: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:53.368039: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:53.368399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9733 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-08-04 04:11:53.368643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:53.369020: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:11:53.369374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 9833 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
2021-08-04 04:11:54,062 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2021-08-04 04:11:54,062 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2021-08-04 04:11:54,062 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2021-08-04 04:11:54,062 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 6, io threads: 12, compute threads: 6, buffered batches: 4
2021-08-04 04:11:54,063 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 280, number of sources: 1, batch size per gpu: 24, steps: 12
Traceback (most recent call last):
  File "/usr/local/bin/tlt-evaluate", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_evaluate.py", line 57, in main
  File "<decorator-gen-2>", line 2, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/evaluate.py", line 129, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/evaluation/build_evaluator.py", line 102, in build_evaluator_for_trained_gridbox
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataloader/drivenet_dataloader.py", line 575, in get_dataset_tensors
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/trainers/multi_task_trainer/data_loader_interface.py", line 77, in __call__
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/data_loaders/multi_source_loader/data_loader.py", line 396, in call
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1990, in apply
    return DatasetV1Adapter(super(DatasetV1, self).apply(transformation_func))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1378, in apply
    dataset = transformation_func(self)
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataloader/drivenet_dataloader.py", line 311, in <lambda>
  File "/opt/nvidia/third_party/keras/tensorflow_backend.py", line 345, in new_map
    self, _map_func_set_random_wrapper, num_parallel_calls=num_parallel_calls
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1909, in map
    MapDataset(self, map_func, preserve_cardinality=False))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3434, in __init__
    use_legacy_function=use_legacy_function)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2713, in __init__
    self._function = wrapper_fn._get_concrete_function_internal()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1853, in _get_concrete_function_internal
    *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1847, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2147, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2038, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2707, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2652, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
StopIteration: in converted code:

    /opt/nvidia/third_party/keras/tensorflow_backend.py:342 _map_func_set_random_wrapper  *
        return map_func(*args, **kwargs)
    /home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataloader/drivenet_dataloader.py:126 __call__
        
    /home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataloader/drivenet_dataloader.py:100 _get_parse_example
        
    /home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataloader/utilities.py:217 extract_tfrecords_features
        

    StopIteration:

Please delete the empty tfrecord files. A tfrecord file of 0 size is not expected and is the likely cause of this StopIteration.
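A quick way to find and remove them (the directory below is a placeholder):

$ find /path/to/test_tfrecords -type f -size 0          # list empty tfrecord files
$ find /path/to/test_tfrecords -type f -size 0 -delete  # delete them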


Fixed. Thanks.
