google.protobuf.text_format.ParseError: 60:3 : ' }': Couldn't parse float

Please provide the following information when requesting support.

• Hardware (RTX 3080)
• Network Type (Classification)
• TLT Version
  format_version: 2.0
  toolkit_version: 3.22.05
  published_date: 05/25/2022
• Training spec file
• How to reproduce the issue?

When I run “!tao classification train -e $LOCAL_SPECS_DIR -r $LOCAL_EXPERIMENT_DIR/classification -k $KEY”,

the container stops with “google.protobuf.text_format.ParseError: 60:3 : ' }': Couldn't parse float: }”.
Looking at my experiment spec file, I changed the Adam optimizer epsilon from ‘1e-7’ to ‘0.00000001’, but that did not change the ParseError.


Can you offer any suggestions?
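The error message says that at line 60, column 3 the parser reached `}` while it was expecting a float. One plausible cause, offered only as an assumption because the full spec file is not shown, is a numeric field whose value is empty, so the next token the parser sees is the closing brace. The standard-library Python sketch below shows that both epsilon spellings tried above are ordinary float literals while a bare `}` is not, and flags `field:` lines that have no value. `find_empty_values`, `is_float_literal` and the spec fragment are hypothetical, not TAO or protobuf APIs; the check is a heuristic and can misfire on a message field written as `name:` with its `{` on the next line.

```python
import re

# Heuristic: "field:" with nothing (or only a '#' comment) after the colon.
_EMPTY_VALUE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*:\s*(#.*)?$")


def is_float_literal(token: str) -> bool:
    """Return True if Python's float() accepts the token."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def find_empty_values(spec_text: str) -> list[tuple[int, str]]:
    """Return (line_number, field_name) for every 'field:' line that has no value."""
    hits = []
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        match = _EMPTY_VALUE.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical fragment with a missing epsilon value.
SPEC = "optimizer {\n  adam {\n    epsilon:\n  }\n}\n"
print(find_empty_values(SPEC))
print(is_float_literal("1e-7"), is_float_literal("0.00000001"), is_float_literal("}"))
```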

Can you change it back to 1e-7 to check whether it works?

It doesn’t work. (I originally changed it from 1e-7 to see if that was the problem.)

Please add a closing “}” at the end.
It looks like the last eval_config block is missing its closing “}”.
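A missing closing brace like the one in eval_config can be confirmed with a quick brace count. This is a rough Python sketch, assuming the spec is protobuf text format with `#` comments and quoted strings; `check_braces` is a hypothetical helper, not a TAO or protobuf API, and the field names in the example are illustrative only.

```python
def check_braces(spec_text: str) -> list[str]:
    """Report unmatched '}' and unclosed '{' in a brace-delimited spec file.

    Braces inside single- or double-quoted strings and after '#' are ignored.
    """
    problems = []
    open_stack = []  # line numbers of '{' that are still open
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        quote = None  # the quote character of the string we are inside, if any
        i = 0
        while i < len(line):
            ch = line[i]
            if quote:
                if ch == "\\":
                    i += 1  # skip the escaped character
                elif ch == quote:
                    quote = None
            elif ch in ("'", '"'):
                quote = ch
            elif ch == "#":
                break  # the rest of the line is a comment
            elif ch == "{":
                open_stack.append(lineno)
            elif ch == "}":
                if open_stack:
                    open_stack.pop()
                else:
                    problems.append(f"line {lineno}: '}}' has no matching '{{'")
            i += 1
    for lineno in open_stack:
        problems.append(f"line {lineno}: '{{' is never closed")
    return problems


# Illustrative fragment whose eval_config block is never closed.
print(check_braces("train_config {\n  batch_size: 64\n}\neval_config {\n  top_k: 3\n"))
```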

You’re right.
That is strange, because the “}” is there in my text editor.


I added an empty line at the end, and now the “}” shows.

But when I run the notebook, I still get:
“google.protobuf.text_format.ParseError: 60:3 : ' }': Couldn't parse float: }”

I am afraid there are some unexpected hidden characters in your spec file.
I suggest checking it further.

Alternatively, you can copy the sample spec file that comes with the notebook and then modify each parameter to the value you expect.
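One way to look for the hidden characters mentioned above is to scan the raw bytes of the spec file. The sketch below is an illustrative Python helper (the name `find_hidden_chars` is hypothetical) that flags a UTF-8 byte-order mark, Windows carriage returns, curly quotes, non-breaking or zero-width spaces and other non-ASCII or control characters, plus a missing final newline. Which of these the TAO parser actually rejects is not established in this thread; the script only points at candidates to inspect.

```python
import unicodedata


def find_hidden_chars(data: bytes) -> list[str]:
    """List suspicious bytes/characters found in a spec file's raw contents."""
    findings = []
    if data.startswith(b"\xef\xbb\xbf"):
        findings.append("file starts with a UTF-8 byte-order mark (BOM)")
        data = data[3:]
    # Undecodable bytes become U+FFFD (REPLACEMENT CHARACTER) and get reported too.
    text = data.decode("utf-8", errors="replace")
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                continue  # tabs are treated as plain whitespace here
            if ch == "\r":
                findings.append(f"line {lineno}, col {col}: carriage return (Windows line ending)")
            elif ord(ch) < 32 or ord(ch) > 126:
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                findings.append(f"line {lineno}, col {col}: {name}")
    if data and not data.endswith(b"\n"):
        findings.append("file does not end with a newline")
    return findings
```

For example, `find_hidden_chars(Path("classification_spec.cfg").read_bytes())` (with `from pathlib import Path`; the file name is a placeholder) prints nothing suspicious for a clean ASCII file, while a character such as “ shows up as `LEFT DOUBLE QUOTATION MARK` with its line and column.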

Hi Morganh,

Some of your language is slightly ambiguous:

There doesn’t appear to be a spec file in the notebook, so do you mean I should copy the spec file into the notebook? I am not very comfortable with that idea.

If I did do that, would I copy it from:

and exactly where in the notebook would I copy it to in order for it to work the way it is meant to?

Please refer to the TAO Toolkit Quick Start Guide (TAO Toolkit 3.22.05 documentation).
You can download the notebooks via:

wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/cv_samples/versions/v1.4.1/zip -O cv_samples_v1.4.1.zip
unzip -u cv_samples_v1.4.1.zip  -d ./cv_samples_v1.4.1 && rm -rf cv_samples_v1.4.1.zip && cd ./cv_samples_v1.4.1

There should be a sample spec file for classification network.

Thank you for this.

I have completely rebuilt the training_spec document. The problem seems to have started when I saved it as a .json file, at which point sections of the text turned red in MS Visual Studio. I have therefore saved it as a .cfg file, which makes the text revert to white. I have also gone back through the Jupyter notebook and changed every mention of training_spec.json to training_spec.cfg.

I currently have a permissions issue which is preventing the cell from completing and so I cannot confirm that this issue is solved.

You can open a terminal and try to save the spec file there.
Then open the Jupyter notebook again.

Thank you Morganh.

I am unable to access my machine until September.

Hello @pddarrell Do you have any updates on this topic?

Thank you for checking in, Yingliu.

I am resuming the project today and I hope to get back to you before the end of the day (currently ~09:00 am here).

Many thanks

Hi again.

The permissions error persists even after I change the ownership from “root:root” to “peter:peter”.


This occurs when I pass “-r /home/peter/TAO_toolkit/results” to the “tao classification train” command. Since the ownership change above, I have also run “sudo chmod 771 results” to give the peter group read, write and execute permissions, but this does not change the permissions error.
I have also tried restarting my machine (as a sanity check).
As a result, I am currently unable to run the notebook. Please advise.
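When chown/chmod on the host does not help, it is worth checking which uid/gid the container actually writes as. As an assumption not confirmed in this thread: the `"user": "1000:1000"` entry under DockerOptions makes the container process run as uid 1000, gid 1000, and peter may or may not be uid 1000 on this machine. The Python sketch below applies the simplified POSIX owner/group/other rule for creating files in a directory; `can_create_files_in` and `describe` are hypothetical helpers, and root privileges, ACLs and SELinux/AppArmor are deliberately ignored.

```python
import os
import stat


def can_create_files_in(mode: int, owner_uid: int, owner_gid: int,
                        uid: int, gids: set[int]) -> bool:
    """Simplified POSIX check: may (uid, gids) create files in a directory with this mode?

    Creating a file needs both write and execute (search) permission on the directory.
    Only one permission class applies: owner if uid matches, else group, else other.
    """
    if uid == owner_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (mode & needed) == needed


def describe(path: str) -> str:
    """Return e.g. 'drwxrwx--x uid=1000 gid=1000' for a path on this machine."""
    st = os.stat(path)
    return f"{stat.filemode(st.st_mode)} uid={st.st_uid} gid={st.st_gid}"
```

`stat.filemode(stat.S_IFDIR | 0o771)` is `'drwxrwx--x'`: chmod 771 gives the owner and group full access but gives everyone else only execute. Comparing `describe("/home/peter/TAO_toolkit/results")` with the output of `id -u` and `id -g` for peter shows whether a uid 1000 process would fall into the owner, group or "other" class.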

Can you share your ~/.tao_mounts.json ?

Also, please check whether the following helps: try removing this entry from ~/.tao_mounts.json and see if it works.

    "DockerOptions": {
        "user": "1000:1000"
    }

Reference: Permission Denied Error When training MASK RCNN - #12 by subhankar.halder
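If you prefer to make the suggested change programmatically, here is a small Python sketch (hypothetical helper names) that removes the "user" entry from DockerOptions in ~/.tao_mounts.json and keeps a .bak copy of the original file. It assumes the file is valid JSON and that dropping an empty DockerOptions block is acceptable to the TAO launcher; both are assumptions rather than documented behavior.

```python
import copy
import json
from pathlib import Path


def strip_docker_user(config: dict) -> dict:
    """Return a copy of the parsed config without DockerOptions["user"].

    If DockerOptions ends up empty, the whole block is dropped.
    """
    cleaned = copy.deepcopy(config)  # leave the caller's dict untouched
    options = cleaned.get("DockerOptions")
    if isinstance(options, dict):
        options.pop("user", None)
        if not options:
            del cleaned["DockerOptions"]
    return cleaned


def rewrite_mounts_file(path: Path = Path.home() / ".tao_mounts.json") -> None:
    """Back up the file to <name>.bak, then write the cleaned JSON back in place."""
    original = path.read_text()
    config = json.loads(original)  # raises json.JSONDecodeError if the file is not valid JSON
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(json.dumps(strip_docker_user(config), indent=4) + "\n")
```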

Here it is in bash:


Here is the double quote error it throws:

Here is the same file in MS Visual Studio, which I used to check what “line 17” is:

Line 17 is the final curly brace.

I currently have the following restored in ~/.tao_mounts.json:

    "DockerOptions": {
        "user": "1000:1000"
    }

Change line 15
],
to
]

and retry.
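The change above removes a trailing comma: in JSON, a comma after the last element of an object or array (here `],` directly before `}`) is not allowed. The sketch below uses Python's standard `json` module to locate such an error before running the launcher; `json_error_location` is a hypothetical helper and the mounts content in the test is made up. Older Python versions report this case as "Expecting property name enclosed in double quotes" at the closing brace, which would match the error pointing at the final curly brace; newer versions may word it differently or point at the comma itself.

```python
import json


def json_error_location(text: str):
    """Return None if text is valid JSON, else (line, column, message) of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None


# Usage: check the mounts file before launching, e.g.
# print(json_error_location(open(os.path.expanduser("~/.tao_mounts.json")).read()))
```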

Apologies for the “,”


It throws a different errno (2 instead of 13), but it still fails to find ‘/home/peter/TAO_toolkit/data/train’.

All the paths in the command line (for example, $ tao classification train xxx) should be paths inside the Docker container. Here, xxx must be the path as seen inside the container, not the path on the host.
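In other words, every path passed to `tao classification train` is resolved inside the container, so a host path such as /home/peter/TAO_toolkit/data/train only works after it is rewritten to the matching destination from ~/.tao_mounts.json; otherwise errno 2 (ENOENT, "No such file or directory") is what one would expect. A Python sketch of that translation, assuming the "Mounts" entries use "source" and "destination" keys; `to_container_path` is hypothetical and the destination /workspace/tao-experiments is an example, not necessarily what this user configured.

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, mounts: list[dict]) -> str:
    """Translate a host path into the path seen inside the container.

    `mounts` mirrors the "Mounts" list: [{"source": ..., "destination": ...}, ...].
    The longest matching source wins. Raises ValueError if no mount covers the path.
    """
    host = PurePosixPath(host_path)
    best = None
    for mount in mounts:
        src = PurePosixPath(mount["source"])
        if host == src or src in host.parents:
            if best is None or len(src.parts) > len(PurePosixPath(best["source"]).parts):
                best = mount
    if best is None:
        raise ValueError(f"{host_path} is not under any mounted source directory")
    relative = host.relative_to(best["source"])
    return str(PurePosixPath(best["destination"]) / relative)


MOUNTS = [{"source": "/home/peter/TAO_toolkit", "destination": "/workspace/tao-experiments"}]
print(to_container_path("/home/peter/TAO_toolkit/data/train", MOUNTS))
```

Matching on path components via `PurePosixPath.parents`, rather than plain string prefixes, avoids treating /home/peter/TAO_toolkit2 as if it were inside /home/peter/TAO_toolkit.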

I have been using the format as described here:


and, as a result, my command line shows this:

Are you saying that I am following the wrong instructions?
(I already completed the tao_voc exercise)