Please provide the following information when requesting support.
Hardware - GPU (A100/A30/T4/V100)
Hardware - CPU
Operating System
Riva Version
TLT Version (if relevant)
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here.)
Hi,
I am using Riva Bot Maker and training an intent and slot classification model with the following command:
bash run_jarvis_domain_builder.sh train domain_model \
    --dataset_path datasets/business_enquiry/business_enquiry.yaml \
    --result_path models \
    --version 1 \
    --epochs 10 \
    --domain business_enquiry \
    --batch_size 1
I am getting the following error:
2022-07-27 09:38:23,536 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
Exception from training module: Missing TLT model checkpoint, Training Failed
Failed to train model with error Missing TLT model checkpoint, Training Failed
Domain Builder CLI failed with error Missing TLT model checkpoint, Training Failed
Exiting DomainBuilder Container
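To check whether the run produced any checkpoint at all, the result directory can be searched for checkpoint files. This is a minimal sketch, assuming checkpoints are written as `.tlt` files somewhere under the `--result_path` directory (`models` in the command above). Both the extension and the location are assumptions rather than documented Domain Builder behaviour, and `count_tlt_checkpoints` is a hypothetical helper name.

```shell
# Hypothetical helper: count regular files ending in .tlt under a directory.
# Assumption: TLT checkpoints are saved as *.tlt somewhere under --result_path.
count_tlt_checkpoints() {
  local dir="${1:-models}"
  if [ ! -d "$dir" ]; then
    # Report the missing directory on stderr, but still print a count.
    echo "no such directory: $dir" >&2
    echo "0"
    return 0
  fi
  # Recursive search; tr strips the padding some wc implementations add.
  find "$dir" -type f -name '*.tlt' | wc -l | tr -d ' '
}

count_tlt_checkpoints models
```

A count of 0 would suggest that training stopped before writing any checkpoint, rather than that a checkpoint was written to an unexpected place.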
I have followed all the documented setup steps, such as pulling the appropriate containers.
GPU: Tesla T4
Tlt_pytorch_image="nvcr.io/nvidia/tlt-pytorch:v3.0-dp-py3"
Operating System:
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
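Most of the environment details requested by the support template (GPU, CPU, operating system, and the container images in use) can be collected in one pass. This is a minimal sketch, assuming a Linux host with Docker. `collect_env_info` is a hypothetical helper, and matching image names against `riva|jarvis|tlt` is only a heuristic, so the exact Riva and TLT versions should still be confirmed by hand.

```shell
# Hypothetical helper: print the details the support template asks for.
# Each probe is skipped gracefully if its tool or file is missing.
collect_env_info() {
  echo "GPU:"
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
      || echo "  nvidia-smi failed"
  else
    echo "  nvidia-smi not found"
  fi

  echo "CPU:"
  if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep 'Model name' || echo "  model name not reported"
  else
    echo "  lscpu not found"
  fi

  echo "Operating System:"
  if [ -r /etc/os-release ]; then
    grep -E '^(NAME|VERSION)=' /etc/os-release || true
  else
    echo "  /etc/os-release not readable"
  fi

  echo "Local Riva/Jarvis/TLT images (name heuristic):"
  if command -v docker >/dev/null 2>&1; then
    docker images --format '{{.Repository}}:{{.Tag}}' 2>/dev/null \
      | grep -E 'riva|jarvis|tlt' || echo "  none found"
  else
    echo "  docker not found"
  fi
}

collect_env_info
```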
Thanks