TAO 3.22 Kubernetes API train

GPU: RTX 3090
TAO version: 3.22

{'version': '1',
 'initial_epoch': 0,
 'random_seed': 42,
 'dataset_config': {'target_class_mapping': [{'key': 'person',
                                              'value': 'person'}]},
 'training_config': {'batch_size_per_gpu': 8,
                     'num_epochs': 1,
                     'enable_qat': False,
                     'learning_rate': {'soft_start_annealing_schedule': {'min_learning_rate': 4e-05,
                                                                         'max_learning_rate': 0.015,
                                                                         'soft_start': 0.1,
                                                                         'annealing': 0.3}},
                     'regularizer': {'type': 'L1', 'weight': 2e-05},
                     'checkpoint_interval': 10,
                     'n_workers': 2,
                     'optimizer': {'sgd': {'momentum': 0.9, 'nesterov': False}}},
 'eval_config': {'average_precision_mode': 'SAMPLE',
                 'validation_period_during_training': 10,
                 'batch_size': 8,
                 'matching_iou_threshold': 0.5},
 'nms_config': {'confidence_threshold': 0.01,
                'clustering_iou_threshold': 0.6,
                'top_k': 200},
 'augmentation_config': {'output_width': 1248,
                         'output_height': 384,
                         'output_channel': 3},
 'retinanet_config': {'aspect_ratios_global': '[1.0, 2.0, 0.5]',
                      'two_boxes_for_ar1': False,
                      'clip_boxes': False,
                      'variances': '[0.1, 0.1, 0.2, 0.2]',
                      'scales': '[0.045, 0.09, 0.2, 0.4, 0.55, 0.7]',
                      'arch': 'resnet',
                      'nlayers': 18,
                      'freeze_bn': False,
                      'loss_loc_weight': 0.8,
                      'focal_loss_alpha': 0.25,
                      'focal_loss_gamma': 2.0,
                      'feature_size': 256,
                      'n_anchor_levels': 1,
                      'n_kernels': 1}}

{
  "id": "75a4b05d-56f0-4d69-84c3-e7c60c88089c",
  "parent_id": null,
  "action": "train",
  "created_on": "2022-08-30T23:11:05.709947",
  "last_modified": "2022-08-30T23:11:29.206582",
  "status": "Done",
  "result": {
    "detailed_status": {
      "date": "8/30/2022",
      "time": "23:11:19",
      "status": "FAILURE",
      "message": "The index file /shared/users/645b9950-2947-4240-a438-2d58c9be1989/datasets/d289a44a-39b9-4ebc-b548-43bf8a684f57/tfrecords/idx-tfrecords-fold-000-of-002-shard-00000-of-00010 for /shared/users/645b9950-2947-4240-a438-2d58c9be1989/datasets/d289a44a-39b9-4ebc-b548-43bf8a684f57/tfrecords/tfrecords-fold-000-of-002-shard-00000-of-00010 does not exist."
    },
    "categorical": [],
    "kpi": [],
    "graphical": [],
    "epoch": null,
    "max_epoch": null,
    "eta": null
  }
}
When training RetinaNet, the job fails with the message above.
I have the file tfrecords-fold-000-of-002-shard-00000-of-00010, but not idx-tfrecords-fold-000-of-002-shard-00000-of-00010.
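To see which shards lack an index before launching a job, a small local check can help. This is a minimal sketch assuming the naming shown in the error message: each data shard named tfrecords-... is expected to have a companion idx-tfrecords-... file in the same tfrecords directory. The function name missing_index_files is hypothetical and is not part of TAO or tao-client.

```python
from pathlib import Path


def missing_index_files(tfrecord_dir):
    """Return the data shards that have no matching idx-<shard> file.

    Assumption: data shards start with 'tfrecords-' and their index files
    are named 'idx-' + <shard name>, as in the TAO error message.
    """
    directory = Path(tfrecord_dir)
    names = {p.name for p in directory.iterdir() if p.is_file()}
    shards = sorted(n for n in names if n.startswith("tfrecords-"))
    return [n for n in shards if "idx-" + n not in names]


if __name__ == "__main__":
    import sys

    # Usage: python check_idx.py /path/to/tfrecords
    for shard in missing_index_files(sys.argv[1]):
        print("missing index for", shard)
```

An empty result only means every shard has a file with the expected name; it does not prove the index contents are valid.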

When I renamed the tfrecord files to idx-*, the error message became "ZeroDivisionError: float division by zero".

What command should I run so that the files are recognized, or is there another command I should use?
Thank you

Please shed more light on the steps you have taken.
Alternatively, you can follow Remote Client (TAO Toolkit 3.22.05 documentation) and https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/cv_samples/version/v1.4.1/files/api_tutorials/client/detectnet_v2.ipynb to check whether the workflow works.

TFRecord generation differs between the RetinaNet and detectnet_v2 networks.
RetinaNet requires TFRecord files that come with matching idx index files.
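Renaming a data shard to idx-* does not produce an index, because an index file describes where each record sits inside the data shard rather than holding the data itself. As an illustration only, and assuming the index uses the text format written by NVIDIA DALI's tfrecord2idx tool (one "offset size" line per record; this is an assumption, not something confirmed in this thread), the sketch below walks a TFRecord file using the standard TFRecord framing and returns such lines. The function name build_index_lines is hypothetical. Because RetinaNet's TFRecord generation also differs from detectnet_v2's, a hand-built index would not replace running dataset-convert.

```python
import struct


def build_index_lines(tfrecord_path):
    """Return 'offset size' lines, one per record in a TFRecord file.

    Standard TFRecord framing per record: 8-byte little-endian length,
    4-byte CRC of the length, `length` payload bytes, 4-byte CRC of the
    payload. CRCs are skipped here, not verified.
    Assumption: the 'offset size' line format mirrors DALI's tfrecord2idx.
    """
    lines = []
    with open(tfrecord_path, "rb") as f:
        f.seek(0, 2)  # find the file size so truncated records are detected
        file_size = f.tell()
        f.seek(0)
        while True:
            start = f.tell()
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated length field at offset {start}")
            (length,) = struct.unpack("<Q", header)
            f.seek(4 + length + 4, 1)  # skip length CRC, payload, payload CRC
            if f.tell() > file_size:
                raise ValueError(f"truncated record at offset {start}")
            lines.append(f"{start} {f.tell() - start}")
    return lines
```

A record of n payload bytes occupies n + 16 bytes on disk, so the sizes in the index are always 16 larger than the payloads.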

So please run retinanet dataset-convert to generate new tfrecord files.


Thank you for your answer.
But how do I generate new tfrecords with the Kubernetes API?

tao-client retinanet dataset-convert

Thank you.
But my tao-client version is: tao-client, version 3.22.5b1

tao-client --help


There seem to be no other network_arch options.
I have updated tao-client to the latest version.

Running tao-client retinanet dataset-convert or tao-client retinanet, as you suggested, gives:
Error: No such command 'retinanet'.

Thank you for your reply.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.

OK, ignore my previous command.
Can you follow https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/cv_samples/version/v1.4.1/files/api_tutorials/client/detectnet_v2.ipynb#head-7
to generate tfrecords with the idx prefix?
