Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc): GeForce RTX 3090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): Detectnet_v2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
tao info --verbose
:
Configuration of the TAO Toolkit Instance
dockers:
nvidia/tao/tao-toolkit-tf:
v3.22.05-tf1.15.5-py3:
docker_registry: nvcr.io
tasks:
1. augment
2. bpnet
3. classification
4. dssd
5. faster_rcnn
6. emotionnet
7. efficientdet
8. fpenet
9. gazenet
10. gesturenet
11. heartratenet
12. lprnet
13. mask_rcnn
14. multitask_classification
15. retinanet
16. ssd
17. unet
18. yolo_v3
19. yolo_v4
20. yolo_v4_tiny
21. converter
v3.22.05-tf1.15.4-py3:
docker_registry: nvcr.io
tasks:
1. detectnet_v2
nvidia/tao/tao-toolkit-pyt:
v3.22.05-py3:
docker_registry: nvcr.io
tasks:
1. speech_to_text
2. speech_to_text_citrinet
3. speech_to_text_conformer
4. action_recognition
5. pointpillars
6. pose_classification
7. spectro_gen
8. vocoder
v3.21.11-py3:
docker_registry: nvcr.io
tasks:
1. text_classification
2. question_answering
3. token_classification
4. intent_slot_classification
5. punctuation_and_capitalization
nvidia/tao/tao-toolkit-lm:
v3.22.05-py3:
docker_registry: nvcr.io
tasks:
1. n_gram
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I’m following the instructions in TAO Quick Start Guide, and I was able to train detectnet_v2 in cv_samples_v1.4.0 when using my computer with GeForce GTX 1660 SUPER
. However, when I uses another computer with GeForce RTX 3090
, I was stuck in executing the command !tao detectnet_v2 dataset_convert
and here is the log:
Converting Tfrecords for kitti trainval dataset
2022-08-15 03:36:57,503 [INFO] root: Registry: ['nvcr.io']
2022-08-15 03:36:57,581 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-rueeond1 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
I don’t know what happened, since there isn’t other useful informations for me to deal with it.
Here is the result of nvidia-smi
when executing the !tao detectnet_v2 dataset_convert
:
(I’m pretty sure that the docker is able to use GPU)
Mon Aug 15 04:05:18 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:00:04.0 Off | N/A |
| 0% 58C P8 18W / 350W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
PS. I had also tried to use Triton Server and again it can run on 1660 but fail to run on 3090
So is there anything else I can notice?