Text Classification infer fails

Please provide the following information when requesting support.

• Hardware (T4)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
tao info --verbose
Configuration of the TAO Toolkit Instance

dockers:
  nvidia/tao/tao-toolkit-tf:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      1. augment
      2. bpnet
      3. classification
      4. detectnet_v2
      5. dssd
      6. emotionnet
      7. faster_rcnn
      8. fpenet
      9. gazenet
      10. gesturenet
      11. heartratenet
      12. lprnet
      13. mask_rcnn
      14. multitask_classification
      15. retinanet
      16. ssd
      17. unet
      18. yolo_v3
      19. yolo_v4
      20. converter
  nvidia/tao/tao-toolkit-pyt:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      21. speech_to_text
      22. speech_to_text_citrinet
      23. text_classification
      24. question_answering
      25. token_classification
      26. intent_slot_classification
      27. punctuation_and_capitalization
  nvidia/tao/tao-toolkit-lm:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      28. n_gram
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-9a3d7360-595d-cb85-a728-26f7058bc5c7)

nvidia-smi
Fri Aug 27 08:58:00 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   32C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

docker version
Client: Docker Engine - Community
Version: 20.10.7
API version: 1.41
Go version: go1.13.15
Git commit: f0df350
Built: Wed Jun 2 11:56:40 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.7
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: b0f5bc3
Built: Wed Jun 2 11:54:48 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.6
GitCommit: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc:
Version: 1.0.0-rc95
GitCommit: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
docker-init:
Version: 0.19.0
GitCommit: de40ad0
• Training spec file (If you have one, please share it here)
infer.yaml
////////////////////////////////////

Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

TLT Spec file for inference using a previously pretrained BERT model for a text classification task.

“Simulate” user input: batch with four samples.

input_batch:
- "by the end of no such thing the audience , like beatrice , has a watchful affection for the monster ."
- "director rob marshall went out gunning to make a great one ."
- "uneasy mishmash of styles and genres ."
- "I love exotic science fiction / fantasy movies but this one was very unpleasant to watch . Suggestions and images of child abuse , mutilated bodies (live or dead) , other gruesome scenes , plot holes , boring acting made this a regretable experience , The basic idea of entering another person's mind is not even new to the movies or TV (An Outer Limits episode was better at exploring this idea) . i gave it 4 / 10 since some special effects were nice ."

• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
ain,download_specs}
text_classification: error: the following arguments are required: -r/--results_dir
2021-09-01 10:25:43,076 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
(taoenv) ubuntu@ip-172-31-14-240:~/tao$ tao text_classification infer -e /specs/nlp/text_classification/infer.yaml -r /results/nlp/text_classification/infer -m /results/nlp/text_classification/train/checkpoints/trained-model.tlt -g 1 -k $KEY
2021-09-01 10:27:38,950 [INFO] root: Registry: ['nvcr.io']
2021-09-01 10:27:39,055 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2021-09-01 10:27:42 experimental:27] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2021-09-01 10:27:45 experimental:27] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo I 2021-09-01 10:27:46 tlt_logging:20] Experiment configuration:
restore_from: /results/nlp/text_classification/train/checkpoints/trained-model.tlt
exp_manager:
  task_name: infer
  explicit_log_dir: /results/nlp/text_classification/infer
input_batch:
- by the end of no such thing the audience , like beatrice , has a watchful affection for the monster .
- director rob marshall went out gunning to make a great one .
- uneasy mishmash of styles and genres .
- I love exotic science fiction / fantasy movies but this one was very unpleasant to watch . Suggestions and images of child abuse , mutilated bodies (live or dead) , other gruesome scenes , plot holes , boring acting made this a regretable experience , The basic idea of entering another person's mind is not even new to the movies or TV (An Outer Limits episode was better at exploring this idea) . i gave it 4 / 10 since some special effects were nice .
encryption_key: '*****'

[NeMo W 2021-09-01 10:27:46 exp_manager:26] Exp_manager is logging to `/results/nlp/text_classification/infer`, but it already exists.
[NeMo W 2021-09-01 10:27:48 modelPT:193] Using /tmp/tmptv7d8fn6/tokenizer.vocab_file instead of tokenizer.vocab_file.
Using bos_token, but it is not set yet.
Using eos_token, but it is not set yet.
[NeMo W 2021-09-01 10:27:48 modelPT:1202] World size can only be set by PyTorch Lightning Trainer.
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
return func()
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in
lambda: hydra.run(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
return run_job(
File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 127, in run_job
ret.return_value = task_function(task_cfg)
File "/tlt-nemo/nlp/text_classification/scripts/infer.py", line 83, in main
File "/opt/conda/lib/python3.8/posixpath.py", line 142, in basename
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/tlt-nemo/nlp/text_classification/scripts/infer.py", line 113, in
File "/opt/conda/lib/python3.8/site-packages/nemo/core/config/hydra_runner.py", line 98, in wrapper
_run_hydra(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
run_and_report(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 237, in run_and_report
assert mdl is not None
AssertionError
2021-09-01 10:27:58,400 [INFO] tlt.components.docker_handler.docker_handler: Stopping container
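For reference, the first traceback boils down to os.path.basename() being handed None, i.e. some path-valued field in the merged config was never set. A minimal sketch reproducing that failure mode (the helper name is hypothetical, not TAO code):

```python
import os

def resolve_labels_path(cfg_value):
    # Hypothetical stand-in for what infer.py does around line 83:
    # os.path.basename() is called on a config value that may be None.
    return os.path.basename(cfg_value)

# A properly set path works fine:
print(resolve_labels_path("/results/labels.txt"))  # -> labels.txt

# An unset (None) value reproduces the crash from the log:
try:
    resolve_labels_path(None)
except TypeError as err:
    print(err)  # expected str, bytes or os.PathLike object, not NoneType
```

This is why the fix discussed below focuses on supplying the missing config key rather than on Hydra itself.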

Please check the link you want to run inference.

Not clear… which link are we talking about? I am just running the command 'tao text_classification infer…'

Please check the logs. I am not using any link.

tao text_classification infer -e /specs/nlp/text_classification/infer.yaml -r /results/nlp/text_classification/infer -m /results/nlp/text_classification/train/checkpoints/trained-model.tlt -g 1 -k $KEY

Sorry, I meant the batch you want to run inference on.
Could you attach the .yaml file here?

infer.yaml
////////////////////////////////////

Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

TLT Spec file for inference using a previously pretrained BERT model for a text classification task.

“Simulate” user input: batch with four samples.

input_batch:
- "by the end of no such thing the audience , like beatrice , has a watchful affection for the monster ."
- "director rob marshall went out gunning to make a great one ."
- "uneasy mishmash of styles and genres ."
- "I love exotic science fiction / fantasy movies but this one was very unpleasant to watch . Suggestions and images of child abuse , mutilated bodies (live or dead) , other gruesome scenes , plot holes , boring acting made this a regretable experience , The basic idea of entering another person's mind is not even new to the movies or TV (An Outer Limits episode was better at exploring this idea) . i gave it 4 / 10 since some special effects were nice ."

Can you run the command below successfully? Please share the log.

tao text_classification run ls /results/nlp/text_classification/train/checkpoints/trained-model.tlt

Will try and revert

What is that command supposed to do?

Here is the log:
tao text_classification run ls /results/nlp/text_classification/train/checkpoints/trained-model.tlt
2021-09-02 13:22:33,831 [INFO] root: Registry: ['nvcr.io']
2021-09-02 13:22:34,200 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
/results/nlp/text_classification/train/checkpoints/trained-model.tlt
2021-09-02 13:22:36,441 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
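As an aside, the recurring "Docker will run the commands as root" warning can be addressed exactly as it suggests. A sketch of what ~/.tao_mounts.json would look like with the DockerOptions user mapping added (the mount paths and "1000:1000" are placeholders; use your own paths and the output of `id -u` / `id -g`):

```json
{
    "Mounts": [
        {
            "source": "/home/ubuntu/tao",
            "destination": "/workspace/tao"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}
```

With this in place, files written under the mounted results directory are owned by your local user instead of root.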

To check if the tlt model is available inside the docker.

Please add the below under "model:" in your yaml file.

class_labels:
  class_labels_file: null # optional, to specify a file containing the list of the labels
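To be explicit about the placement being suggested, the relevant part of infer.yaml would look roughly like this (a sketch only; the exact schema is whatever the released notebook's spec defines, and as the follow-up error shows, this particular infer spec did not accept a top-level model key):

```yaml
model:
  class_labels:
    class_labels_file: null  # optional: path to a file listing the class labels
```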

Made the changes to infer.yaml; now getting this error:
2021-09-03 04:23:11,236 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2021-09-03 04:23:20 experimental:27] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2021-09-03 04:23:24 experimental:27] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
Error merging 'infer.yaml' with schema
Key 'model' not in 'DefaultConfig'
full_key: model
reference_type=Optional[Dict[Union[str, Enum], Any]]
object_type=DefaultConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
2021-09-03 04:23:25,919 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Could you please download the officially released Jupyter notebook and refer to its infer.yaml?