Text Classification infer fails

Please provide the following information when requesting support.

• Hardware (T4)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
tao info --verbose
Configuration of the TAO Toolkit Instance

dockers:
  nvidia/tao/tao-toolkit-tf:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      1. augment
      2. bpnet
      3. classification
      4. detectnet_v2
      5. dssd
      6. emotionnet
      7. faster_rcnn
      8. fpenet
      9. gazenet
      10. gesturenet
      11. heartratenet
      12. lprnet
      13. mask_rcnn
      14. multitask_classification
      15. retinanet
      16. ssd
      17. unet
      18. yolo_v3
      19. yolo_v4
      20. converter
  nvidia/tao/tao-toolkit-pyt:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      21. speech_to_text
      22. speech_to_text_citrinet
      23. text_classification
      24. question_answering
      25. token_classification
      26. intent_slot_classification
      27. punctuation_and_capitalization
  nvidia/tao/tao-toolkit-lm:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      28. n_gram
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-9a3d7360-595d-cb85-a728-26f7058bc5c7)

nvidia-smi
Fri Aug 27 08:58:00 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   32C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

docker version
Client: Docker Engine - Community
Version: 20.10.7
API version: 1.41
Go version: go1.13.15
Git commit: f0df350
Built: Wed Jun 2 11:56:40 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.7
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: b0f5bc3
Built: Wed Jun 2 11:54:48 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.6
GitCommit: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc:
Version: 1.0.0-rc95
GitCommit: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
docker-init:
Version: 0.19.0
GitCommit: de40ad0
• Training spec file (If you have one, please share it here)
infer.yaml
////////////////////////////////////

Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

TLT Spec file for inference using a previously pretrained BERT model for a text classification task.

“Simulate” user input: batch with four samples.

input_batch:
- "by the end of no such thing the audience , like beatrice , has a watchful affection for the monster ."
- "director rob marshall went out gunning to make a great one ."
- "uneasy mishmash of styles and genres ."
- "I love exotic science fiction / fantasy movies but this one was very unpleasant to watch . Suggestions and images of child abuse , mutilated bodies (live or dead) , other gruesome scenes , plot holes , boring acting made this a regretable experience , The basic idea of entering another person's mind is not even new to the movies or TV (An Outer Limits episode was better at exploring this idea) . i gave it 4 / 10 since some special effects were nice ."

• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
ain,download_specs}
text_classification: error: the following arguments are required: -r/--results_dir
2021-09-01 10:25:43,076 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
(taoenv) ubuntu@ip-172-31-14-240:~/tao$ tao text_classification infer -e /specs/nlp/text_classification/infer.yaml -r /results/nlp/text_classification/infer -m /results/nlp/text_classification/train/checkpoints/trained-model.tlt -g 1 -k $KEY
2021-09-01 10:27:38,950 [INFO] root: Registry: ['nvcr.io']
2021-09-01 10:27:39,055 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2021-09-01 10:27:42 experimental:27] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2021-09-01 10:27:45 experimental:27] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo I 2021-09-01 10:27:46 tlt_logging:20] Experiment configuration:
restore_from: /results/nlp/text_classification/train/checkpoints/trained-model.tlt
exp_manager:
  task_name: infer
  explicit_log_dir: /results/nlp/text_classification/infer
input_batch:
- by the end of no such thing the audience , like beatrice , has a watchful affection for the monster .
- director rob marshall went out gunning to make a great one .
- uneasy mishmash of styles and genres .
- I love exotic science fiction / fantasy movies but this one was very unpleasant to watch . Suggestions and images of child abuse , mutilated bodies (live or dead) , other gruesome scenes , plot holes , boring acting made this a regretable experience , The basic idea of entering another person's mind is not even new to the movies or TV (An Outer Limits episode was better at exploring this idea) . i gave it 4 / 10 since some special effects were nice .
encryption_key: '*****'

[NeMo W 2021-09-01 10:27:46 exp_manager:26] Exp_manager is logging to `/results/nlp/text_classification/infer`, but it already exists.
[NeMo W 2021-09-01 10:27:48 modelPT:193] Using /tmp/tmptv7d8fn6/tokenizer.vocab_file instead of tokenizer.vocab_file.
Using bos_token, but it is not set yet.
Using eos_token, but it is not set yet.
[NeMo W 2021-09-01 10:27:48 modelPT:1202] World size can only be set by PyTorch Lightning Trainer.
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
return func()
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in
lambda: hydra.run(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
return run_job(
File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 127, in run_job
ret.return_value = task_function(task_cfg)
File "/tlt-nemo/nlp/text_classification/scripts/infer.py", line 83, in main
File "/opt/conda/lib/python3.8/posixpath.py", line 142, in basename
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/tlt-nemo/nlp/text_classification/scripts/infer.py", line 113, in
File "/opt/conda/lib/python3.8/site-packages/nemo/core/config/hydra_runner.py", line 98, in wrapper
_run_hydra(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
run_and_report(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 237, in run_and_report
assert mdl is not None
AssertionError
2021-09-01 10:27:58,400 [INFO] tlt.components.docker_handler.docker_handler: Stopping container
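For reference, the first traceback boils down to os.path.basename() being handed None, i.e. some path-valued field in the merged config was never set. A minimal sketch reproducing that failure mode (the helper name is hypothetical, not TAO code):

```python
import os

def resolve_labels_path(cfg_value):
    # Hypothetical stand-in for what infer.py does around line 83:
    # os.path.basename() is called on a config value that may be None.
    return os.path.basename(cfg_value)

# A properly set path works fine:
print(resolve_labels_path("/results/labels.txt"))  # -> labels.txt

# An unset (None) value reproduces the crash from the log:
try:
    resolve_labels_path(None)
except TypeError as err:
    print(err)  # expected str, bytes or os.PathLike object, not NoneType
```

This is why the fix discussed below focuses on supplying the missing config key rather than on Hydra itself.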

Please check the link you want to run inference.

Not clear… which link are we talking about? I am just running the command 'tao text_classification infer…'

Please check the logs. I am not using any link.

tao text_classification infer -e /specs/nlp/text_classification/infer.yaml -r /results/nlp/text_classification/infer -m /results/nlp/text_classification/train/checkpoints/trained-model.tlt -g 1 -k $KEY

Sorry, I meant the batch you want to run inference on.
Could you attach the .yaml file here?

infer.yaml
////////////////////////////////////

Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

TLT Spec file for inference using a previously pretrained BERT model for a text classification task.

“Simulate” user input: batch with four samples.

input_batch:
- "by the end of no such thing the audience , like beatrice , has a watchful affection for the monster ."
- "director rob marshall went out gunning to make a great one ."
- "uneasy mishmash of styles and genres ."
- "I love exotic science fiction / fantasy movies but this one was very unpleasant to watch . Suggestions and images of child abuse , mutilated bodies (live or dead) , other gruesome scenes , plot holes , boring acting made this a regretable experience , The basic idea of entering another person's mind is not even new to the movies or TV (An Outer Limits episode was better at exploring this idea) . i gave it 4 / 10 since some special effects were nice ."

Can you run the command below successfully? Please share the log.

tao text_classification run ls /results/nlp/text_classification/train/checkpoints/trained-model.tlt

Will try and revert

What is that command supposed to do?

Here is the log:
tao text_classification run ls /results/nlp/text_classification/train/checkpoints/trained-model.tlt
2021-09-02 13:22:33,831 [INFO] root: Registry: ['nvcr.io']
2021-09-02 13:22:34,200 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
/results/nlp/text_classification/train/checkpoints/trained-model.tlt
2021-09-02 13:22:36,441 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
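As an aside, the recurring "Docker will run the commands as root" warning can be addressed exactly as it suggests. A sketch of what ~/.tao_mounts.json would look like with the DockerOptions user mapping added (the mount paths and "1000:1000" are placeholders; use your own paths and the output of `id -u` / `id -g`):

```json
{
    "Mounts": [
        {
            "source": "/home/ubuntu/tao",
            "destination": "/workspace/tao"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}
```

With this in place, files written under the mounted results directory are owned by your local user instead of root.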

To check if the tlt model is available inside the docker.

Please add the below under "model:" in your yaml file.

class_labels:
  class_labels_file: null # optional, to specify a file containing the list of the labels
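To be explicit about the placement being suggested, the relevant part of infer.yaml would look roughly like this (a sketch only; the exact schema is whatever the released notebook's spec defines, and as the follow-up error shows, this particular infer spec did not accept a top-level model key):

```yaml
model:
  class_labels:
    class_labels_file: null  # optional: path to a file listing the class labels
```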

Made the changes to infer.yaml; now getting this error:
2021-09-03 04:23:11,236 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2021-09-03 04:23:20 experimental:27] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2021-09-03 04:23:24 experimental:27] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
Error merging 'infer.yaml' with schema
Key 'model' not in 'DefaultConfig'
full_key: model
reference_type=Optional[Dict[Union[str, Enum], Any]]
object_type=DefaultConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
2021-09-03 04:23:25,919 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Could you please download the officially released Jupyter notebook and refer to its infer.yaml?