We have two processes training the model (3261216 and 3261217), which is fine, but I don't know why there is a third one (3260670) that is using all of the GPUs, including 307 MiB on the first one.
A related question: in theory, if I only pass devices 13 and 14, I don't know why tlt train doesn't run with --gpu_index 0,1; instead, it only works with indices 13 and 14.
Maybe all of these problems come from the -v path/to/docker:path/to/docker mount that I pass at the beginning, I'm not sure. I'd like to know if you can suggest an alternative, because if I don't pass the docker path, another problem like this one appears.
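If you are on TLT 3.0 with the tlt launcher, one alternative to passing -v flags by hand is to declare the directory mappings in the launcher's ~/.tlt_mounts.json file; the launcher then mounts them into the container for every tlt command. A minimal sketch, assuming a TLT 3.0 launcher setup (the source and destination paths below are placeholders, not your actual paths):

```json
{
    "Mounts": [
        {
            "source": "/home/user/tlt-experiments",
            "destination": "/workspace/tlt-experiments"
        },
        {
            "source": "/home/user/specs",
            "destination": "/workspace/tlt-experiments/specs"
        }
    ]
}
```

With this in place, paths in your spec files and commands should use the destination side of each mapping.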
When you log in to the docker container via “docker exec -it tlt-leo_detectnet_v2 bash”, please run commands like “detectnet_v2 train” instead of “tlt detectnet_v2 train”.
In the notebook, we assume the end user runs via the tlt launcher from the host PC, not from inside the docker container.
So if the end user installs the tlt launcher, they can run “tlt detectnet_v2 train” directly to trigger any task.
But currently, you have already logged in to the docker container via “docker run xxx” and “docker exec xxx”, so please run the commands as “detectnet_v2 train xxx”, etc.
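The rule above (drop the “tlt” prefix once you are inside the container) can be sketched as a small shell helper. Note that IN_TLT_DOCKER is a hypothetical flag you would set yourself, not something the toolkit defines:

```shell
# Hypothetical wrapper: emit the right command form for where you are running.
# Inside the container (flagged here by a user-defined IN_TLT_DOCKER=1), the
# task binaries are on PATH directly; on the host you go through the launcher.
tlt_cmd() {
  if [ "${IN_TLT_DOCKER:-0}" = "1" ]; then
    printf '%s\n' "$*"            # inside docker: detectnet_v2 train ...
  else
    printf 'tlt %s\n' "$*"        # on the host:   tlt detectnet_v2 train ...
  fi
}

IN_TLT_DOCKER=1 tlt_cmd detectnet_v2 train   # prints: detectnet_v2 train
tlt_cmd detectnet_v2 train                   # prints: tlt detectnet_v2 train
```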
It looks like it is working, but I have trouble finding the real paths of the spec files and the other files.
When I ran with tlt detectnet_v2, I used “tlt detectnet_v2 run bash” to find the real path, but “detectnet_v2 run bash” is not working. What do you suggest?
There is an error that detectnet_v2_train_resnet18_kitti.txt is not found. I tried many paths and none of them work. With “tlt detectnet_v2 run bash” I found it easily in the past.
Is there a way to find the correct path?
Can you run the command below to check whether the txt file is available?
$ tlt detectnet_v2 run ls /workspace/tlt-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt
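Since the container sees your files under the destination side of the -v mount rather than under the host path, translating between the two is just a prefix swap. A minimal shell sketch (map_path and the example paths are illustrative, not part of the toolkit; it assumes the given path actually lives under the host-side mount directory):

```shell
# map_path HOST_DIR CONTAINER_DIR PATH
# Replaces the host-side mount prefix of PATH with the container-side one,
# so you can paste a host path and get the path the container expects.
map_path() {
  host_dir="$1"; container_dir="$2"; path="$3"
  printf '%s%s\n' "$container_dir" "${path#"$host_dir"}"
}

map_path /home/user/tlt-experiments /workspace/tlt-experiments \
  /home/user/tlt-experiments/specs/detectnet_v2_train_resnet18_kitti.txt
# prints: /workspace/tlt-experiments/specs/detectnet_v2_train_resnet18_kitti.txt
```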
Actually, I used local paths, for example /home.nfs/detectnet_v2/specs_data/detectnet_v2_train_resnet18_kitti.txt, and now it works perfectly. I think that was it. I trained the model successfully.
Let me ask you something: from now on, it's not necessary to create mount points and define other variables if you work inside a docker container directly, right? I suppose the same applies to TLT 2.0.