My inference stops without any error

I am using jetson.inference and jetson.utils. I trained my own model and run inference the same way as imagenet.py.

The script runs well in the beginning and outputs classifications correctly.

But after a few hours (maybe around two), the script still looks like it is running, yet no messages or errors are output anymore.

Could you please help to fix the issue?

Or is there any way to get a clue about what is happening?

I am running the scripts like this: nohup python3 detect.py > detect.log 2>&1 &

The script can run for around 6 hours when using 1 model and no Python class.

It can run for around 2 hours when using 2 models wrapped in a Python class.
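
One way I can try to get a clue when it hangs is Python's built-in faulthandler, so the log still gets a traceback on demand. A minimal sketch (the signal choice and the one-hour interval are just examples):

import faulthandler
import signal

# Dump every thread's traceback to stderr (which nohup redirects
# into detect.log) whenever the process receives SIGUSR1:
#   kill -USR1 <pid>
faulthandler.register(signal.SIGUSR1, all_threads=True)

# Optionally also dump tracebacks automatically every hour,
# which shows where the script is stuck once it goes silent.
faulthandler.dump_traceback_later(3600, repeat=True)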

Thank you.

Hi,

Do you get any output log on the console?
If yes, would you mind sharing it with us first?

Thanks.

Just now I got the message below, and then the script stopped:

class 0000 - 0.994136 (0Ignor
)
class 0000 - *** stack smashing detected ***: terminated

Hi,

It seems that you are using an RTSP source.
Would you mind testing this issue with a CSI or USB camera as well?
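
For example, the stock sample can be pointed at a local camera directly (assuming a default jetson-inference install; adjust the camera index/device to your setup):

$ ./imagenet.py csi://0        # MIPI CSI camera
$ ./imagenet.py /dev/video0    # V4L2 USB camera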

Thanks.

It is getting worse.
Now, when running 2 models, it can only run for 5 minutes before all output and messages stop.
1 model can run for around 20 minutes (before, it was 6 hours, then 3 hours; I killed and restarted it every 3 hours to keep it working, and for now it works, but I am not sure how much longer it can run).

I am running the 1-model and 2-model setups at the same time, connected to the same RTSP cameras:
The 1-model script runs on a Nano, connected to 2 cameras (A, B).
The 2-model script runs on an NX, connected to the same 2 cameras (A, B).
The same 2-model script runs on a TX2, connected to 3 other cameras (C, D, E).

Now even 5 minutes is hard to reach…

Any suggestions?
Thank you.

Hi,

We want to reproduce this issue internally and investigate it further.
Would you mind sharing the source and detailed steps to reproduce it?

Thanks.

I think the reason could be:

The inference runs all the time, so maybe there are resource leaks?
That would explain why the script cannot run for long.

I kill the script from a shell script; could that increase the resource leaks?
Is that why the script runs for shorter and shorter periods?

After I kill the script, is there any way to clean up these leaked resources?
Are there any tools to monitor resource usage?

Thank you.

Hi,

If you terminate the script, the occupied memory should be released immediately.
You can monitor the system status with tegrastats directly.

$ sudo tegrastats
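
If you also want per-process numbers from inside the script, a minimal Linux-only sketch (the helper name is our own, not a jetson.utils API) can log the resident set size next to each detection:

import time

def rss_kb():
    # Resident set size of this process in kB, read from /proc (Linux only)
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return -1

# For example, inside the detection loop:
# print(time.strftime("%H:%M:%S"), "VmRSS:", rss_kb(), "kB", flush=True)

If the number grows without bound, something is leaking host memory; if it stays flat while the output stops, the process is more likely deadlocked than leaking.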

Thanks.

Thank you. I monitored the resources, and they look good.
At the beginning, while detection works well, the GPU is in use.

After a few minutes, detection stops working and the GPU usage drops.

Based on the monitoring, there do not appear to be any resource leaks.
I do not know what is causing the inference to stop.
Any suggestions? Thank you.

Could you please let me know your email address so I can send the code to you?

Hi,

Would you mind sharing it through a private message?
Also, does this issue occur with the jetson-inference default sample and default model?
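
As a stopgap while you debug, a watchdog thread can force the process to exit when frames stop arriving, so an outer supervisor (a shell loop or systemd unit) restarts it instead of leaving it hung. A rough sketch, assuming you call heartbeat() after every successfully processed frame (both function names are hypothetical):

import os
import threading
import time

last_ok = time.monotonic()

def heartbeat():
    # Call after every successfully processed frame
    global last_ok
    last_ok = time.monotonic()

def watchdog(timeout=60):
    # Exit hard if no frame was processed for `timeout` seconds,
    # so the supervisor can restart the script cleanly
    while True:
        time.sleep(5)
        if time.monotonic() - last_ok > timeout:
            print("watchdog: no frame for %d s, exiting" % timeout, flush=True)
            os._exit(1)

threading.Thread(target=watchdog, daemon=True).start()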

Thanks.