I wrote a small test program that copies the same array from the host to the device with cudaMemcpy inside a pthread. The addresses and the size are constant; nothing changes between copies.
main() just starts the thread and waits for it to finish.
Sadly, this program slowly consumes more and more RAM.
On a normal workstation everything is fine, but on the Jetson Xavier, Nano, and TX2 the memory usage keeps growing.
Currently I am on the Xavier with JetPack 4.2.1.
I found this behavior in the darknet demo from AlexeyAB and traced it back to this simple case.
It is more pronounced in the larger scenario and eventually freezes the whole system.
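For anyone who wants to put numbers on the growth, here is a minimal sketch of how a test like this could be structured and measured. It is not the original test program. It assumes Linux (it reads VmRSS from /proc/self/status), and std::memcpy stands in for cudaMemcpy so that it builds without the CUDA toolkit. The names read_rss_kb, CopyJob, copy_worker and measure_growth_kb are hypothetical.

```cpp
#include <pthread.h>
#include <climits>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Resident set size of this process in kB, read from /proc/self/status.
// Returns -1 if the value cannot be read (e.g. not on Linux).
long read_rss_kb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            return std::stol(line.substr(6));  // skips whitespace, stops at " kB"
        }
    }
    return -1;
}

// Hypothetical job description: fixed addresses and fixed size, as in the post.
struct CopyJob {
    const char* src;
    char* dst;
    std::size_t bytes;
    int iterations;
};

void* copy_worker(void* arg) {
    CopyJob* job = static_cast<CopyJob*>(arg);
    for (int i = 0; i < job->iterations; ++i) {
        // Stand-in for the real call (assumption). In a CUDA build this would be:
        // cudaMemcpy(dev_ptr, job->src, job->bytes, cudaMemcpyHostToDevice);
        std::memcpy(job->dst, job->src, job->bytes);
    }
    return nullptr;
}

// Runs the copy loop in one pthread and returns the RSS change in kB.
// Returns LONG_MIN if the thread could not be created.
long measure_growth_kb(int iterations) {
    const std::size_t bytes = 1 << 20;  // 1 MiB, arbitrary size for the sketch
    std::vector<char> src(bytes, 1);
    std::vector<char> dst(bytes, 0);
    CopyJob job{src.data(), dst.data(), bytes, iterations};

    const long before = read_rss_kb();  // baseline after the buffers exist
    pthread_t thread;
    if (pthread_create(&thread, nullptr, copy_worker, &job) != 0) {
        return LONG_MIN;
    }
    pthread_join(thread, nullptr);      // main only starts the thread and waits
    const long after = read_rss_kb();

    std::printf("RSS before: %ld kB, after: %ld kB\n", before, after);
    return after - before;
}
```

Calling measure_growth_kb from main() with increasing iteration counts (build with -pthread) would show whether RSS scales with the number of copies. With the memcpy stand-in it should stay roughly flat; the interesting comparison is the same loop with the real cudaMemcpy call on the Jetson versus the workstation.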
Has anyone else already seen something like this?
Kind regards, Nils