So, do you mean tlt-infer also worked fine at that time?
Yes. The first time I did this procedure I followed the Jupyter notebook exactly. I even deployed the model to a Jetson Nano. Then I tried again with Resnet10. That also worked. It was only when I moved on to modifying the files and using my own dataset that things went wrong.
My default images are 1392x512px. Perhaps I have not specified the image_width and image_height correctly? I specified 1408 x 544. I wasn’t exactly sure how to set the image_width and image_height for training - but it did train.
Thanks for the info.
So, please set image_width and image_height to 1392x512 and start training.
Hi. I started there but training never started. I’ve done it again to be sure and included the log below.
training_log(1392x512).txt (16.8 KB)
Please paste your training spec.
Training spec with modified image_width/height attached.
When you change back to 1408 x 544, can it run?
Also, when you run a new training, please delete the previous result folder or specify a new result folder.
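For anyone following along, clearing the result folder before each run can be sketched in Python as below. This is only a minimal sketch using the standard library: the helper name reset_results_dir and the example path are hypothetical and are not part of TLT.

```python
import shutil
from pathlib import Path


def reset_results_dir(results_dir):
    """Delete results_dir (if present) and recreate it empty.

    Hypothetical helper, not a TLT command: it only ensures a new
    training run does not pick up files left by a previous run.
    """
    path = Path(results_dir)
    if path.exists():
        # Remove the whole tree, including old checkpoints and logs.
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path


# Example call (hypothetical path, adjust to your own experiment folder):
# reset_results_dir("/workspace/my_experiment/results")
```

Pointing each run at a new, uniquely named folder instead achieves the same thing without deleting anything.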
Hi. I ran the training again and ensured the experiment_dir_unpruned folder was removed first. I have attached screenshots of before and after training at 1408x544 (above the actual image size).
I am happy to forward you the model and/or dataset if you feel that may help. There is no sensitive information in the dataset.
Thanks again for your interest and support.
Yes, Nezakka. I need to check if I can reproduce on my side.
Please share your model and dataset. If you do not want to share them publicly, you can just send me a private forum message.
Hi Morganh. I’m not sure how to send a private forum message, but I don’t think that many people will be interested anyway :)
Here’s a dropbox link. Two zip files. One is the generated model and one is the dataset. If you need more files please let me know.
Again, many thanks for the support :)
Could you please share it via Google Drive? Sorry for the inconvenience.
Oh. I will try to get Google Drive. That is a public link though. You should just be able to click and do a direct download through the web.
I tried Dropbox, but the download still gets stuck after waiting for a while.
Understood. Let me try to set up Google Drive now.
I can reproduce the tlt-infer error with your dataset, but with the same trained tlt model, I cannot reproduce it with another dataset.
Thank you so much. I really appreciate the effort.
I converted one test png into jpg via https://convertio.co/, and tlt-infer works on it. Then I converted it back to png, and tlt-infer also works now.
So, it is related to the images. Please try it on your side.
Further, you can use “$ identify -verbose xxx.png” to check the difference between the files.
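As a lightweight complement to identify, the basic properties of a PNG can also be compared straight from its IHDR header. The sketch below uses only the Python standard library and the fixed PNG header layout; the function names are hypothetical. It reports just width, height, bit depth, color type and interlacing, so it is far less complete than identify -verbose, and this thread does not say which properties actually differed.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {
    0: "grayscale",
    2: "RGB",
    3: "palette",
    4: "grayscale+alpha",
    6: "RGBA",
}


def read_png_header(path):
    """Return the IHDR fields of a PNG file as a dict."""
    with open(path, "rb") as f:
        # 8-byte signature + 4-byte length + 4-byte type + 13-byte IHDR data
        data = f.read(29)
    if len(data) < 29 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError(f"{path} does not start with a valid PNG header")
    width, height, bit_depth, color_type, _comp, _filt, interlace = struct.unpack(
        ">IIBBBBB", data[16:29]
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": COLOR_TYPES.get(color_type, f"unknown({color_type})"),
        "interlaced": bool(interlace),
    }


def diff_headers(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


# Example call (hypothetical file names):
# print(diff_headers(read_png_header("original.png"),
#                    read_png_header("reconverted.png")))
```

An empty result only means the header fields match. Differences in metadata chunks or color profiles would still need identify -verbose to find.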
Well, obviously my model needs quite a bit of work, but you have provided a solution to my problem. Thank you very much.
There were many differences once the files were converted to .jpg and back to .png. The boxed image file attached is an original .jpg.
The dataset was created using an action camera intervalometer, splitting the .mpg into individual .jpg files in Adobe Photoshop, then doing some automatic adjustments in Lightroom and creating the .png files. Following that, the images were tagged with fuji/alps. I will do more investigation, but it seems the problem was with the .png export from Lightroom. If I can determine the exact problem, I’ll let you know.
I’ve also attached the verbose output from identify of the original .png file and the one that was converted to .jpg then back to .png (twice converted) in case you would like to compare.