Script killed

I am training a neural network on a Jetson Nano using Python 3.6 and CUDA, but the process keeps getting killed. The same code runs fine on OS X.

device = torch.device("cuda" if (torch.cuda.is_available() and in_args.gpu == "gpu") else "cpu")

I get the output below when monitoring performance with tegrastats (1000 ms interval). As far as I can tell, the 64 GB swap file is working, as are CUDA and the GPU.

Any suggestions as to what I am missing here? Do I have to assign memory to the GPU (similar to the link below)?
https://stackoverflow.com/questions/48285308/killed-error-in-tensorflow-when-i-try-load-convolutional-pretrained-model-in-jet

tegrastats output in the middle of the job:
RAM 3478/3957MB (lfb 95x4MB) SWAP 2130/65536MB (cached 4MB) CPU [24%@1428,25%@1428,18%@1428,18%@1428] EMC_FREQ 0% GR3D_FREQ 33% PLL@46.5C CPU@49.5C PMIC@100C GPU@48C AO@55C thermal@49.5C POM_5V_IN 2670/3214 POM_5V_GPU 82/58

tegrastats output just before the job gets killed:
RAM 1391/3957MB (lfb 128x4MB) SWAP 653/65536MB (cached 28MB) CPU [7%@102,7%@102,7%@102,2%@102] EMC_FREQ 0% GR3D_FREQ 0% PLL@46C CPU@49C PMIC@100C GPU@48.5C AO@54.5C thermal@48.75C POM_5V_IN 1614/3197 POM_5V_GPU 41/59 POM_5V_CPU 165/878
RAM 1391/3957MB (lfb 128x4MB) SWAP 653/65536MB (cached 28MB) CPU [10%@1428,7%@1428,6%@1428,5%@1428] EMC_FREQ 0% GR3D_FREQ 90% PLL@47C CPU@50.5C PMIC@100C GPU@48C AO@54.5C thermal@48.5C POM_5V_IN 3317/3197 POM_5V_GPU 286/59 POM_5V_CPU 899/878
RAM 1392/3957MB (lfb 128x4MB) SWAP 653/65536MB (cached 28MB) CPU [7%@102,8%@102,7%@102,4%@102] EMC_FREQ 0% GR3D_FREQ 0% PLL@46C CPU@49C PMIC@100C GPU@48.5C AO@54.5C thermal@49C POM_5V_IN 1573/3194 POM_5V_GPU 41/59 POM_5V_CPU 165/877

Hi

Would you mind retrying it without the swap file?

The swap memory is only accessible by the CPU. However, Jetson’s memory is shared, so there is some possibility that the swap memory ends up being used as GPU memory.

It’s recommended to first check whether this issue still occurs without swap memory. But this may require you to decrease the batch size so the model fits into physical memory.
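For example, with a PyTorch DataLoader (a hypothetical sketch; the dataset here is just a placeholder standing in for the real training data) the batch size is a single argument:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the real training data
dataset = TensorDataset(torch.randn(100, 3, 32, 32),
                        torch.randint(0, 10, (100,)))

# Dropping batch_size (e.g. from 64 to 8) shrinks the peak memory per step
loader = DataLoader(dataset, batch_size=8, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)  # → torch.Size([8, 3, 32, 32])
```

Smaller batches mean more iterations per epoch, but the per-step activation memory drops roughly in proportion.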

Thanks.

Yikes. Is there a way to prevent swap being used for the GPU other than disabling swap entirely (e.g., at the point where the memory is allocated)?

@mdegans:

The process could mlock the memory. See: http://man7.org/linux/man-pages/man2/mlock.2.html

Thank you for your feedback.

I removed the swap file and its references in fstab and sysctl.conf, and reduced the batch size from 64 to 8. However, the job still gets killed.

  • How can I be sure that the way I implemented CUDA in my code really transfers the data to the GPU?
  • When looking at the tegrastats output, I would have expected the GPU load (GR3D_FREQ) to be constantly high, but at times it drops to 0%. Is that expected?
  • Any other suggestions?
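On the first question, one quick check (a sketch assuming PyTorch, as in the device line earlier in the thread) is to inspect a tensor’s .device and the CUDA allocator counter after moving data over:

```python
import torch

# Sketch: verify that data actually lands on the GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(8, 3, 32, 32).to(device)
print(x.device)  # prints "cuda:0" when the transfer worked, "cpu" otherwise

if device.type == "cuda":
    # Bytes currently held by PyTorch's CUDA allocator; a nonzero value
    # confirms that tensors really live in GPU memory.
    print(torch.cuda.memory_allocated(device))
```

If `x.device` still reports `cpu`, the `.to(device)` calls (for both model and data) are the first place to look.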

Thank you. That will probably work for my purposes if I run into this.

So I guess I am asking too much from the Nano. I turned swap back on so that some non-GPU data can be swapped, reduced the batch size, and will see how far this gets me.

Is it possible to call mlock from a Python script? The answer in the post below states that this is not possible in Python.

@mme_ch

If you have the memory pointer, you could use ctypes to call mlock and prevent swapping for that memory. The post you linked only states that it is not possible to lock a Python object.
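A minimal sketch of that approach (assuming glibc on Linux; the buffer and its size are illustrative, and mlock can fail for unprivileged users when RLIMIT_MEMLOCK is low):

```python
import ctypes
import ctypes.util

# Load the C library so we can call mlock/munlock directly.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# A buffer whose address we control; a real use case would pass the
# address of the memory backing the data to be pinned.
buf = ctypes.create_string_buffer(4096)
addr = ctypes.c_void_p(ctypes.addressof(buf))
size = ctypes.c_size_t(ctypes.sizeof(buf))

if libc.mlock(addr, size) != 0:
    # Typical failures: EPERM / ENOMEM when RLIMIT_MEMLOCK is exceeded.
    print("mlock failed, errno:", ctypes.get_errno())
else:
    # Pages are now excluded from swapping until munlock (or exit).
    libc.munlock(addr, size)
```

Note this only pins memory you hold a raw pointer to; it does not make arbitrary Python objects unswappable.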

But first: did you make sure it really is an out-of-memory kill and not a segfault or something else? IIRC, OOM kills are logged in the dmesg output. If it is an OOM kill, deactivating swap will only make it worse, not better. I would expect the CUDA library to make sure that memory pages shared between the GPU and CPU are prevented from swapping; if it did not take care of this, the result would be incorrect computations or segmentation faults, since the GPU would be working on the wrong memory.

What is the actual error message that comes up when the script gets killed?

@arne.caspari

Thanks for the input.

I started a job with the GPU and a small batch size and got the dmesg output below (the last few lines). How do I interpret this?

[ 3730.330187] [ 7422] 1000 7422 3392233 86095 1677 10 482599 0 python3
[ 3730.330191] [ 7709] 1000 7709 215319 33 105 5 30402 0 gnome-software
[ 3730.330195] [ 7712] 1000 7712 101973 0 33 4 1232 0 update-notifier
[ 3730.330199] [ 7850] 0 7850 92409 2 32 5 1914 0 fwupd
[ 3730.330202] [ 8199] 1000 8199 127390 0 53 5 1475 0 deja-dup-monito
[ 3730.330207] Out of memory: Kill process 7422 (python3) score 220 or sacrifice child
[ 3730.337926] Killed process 7422 (python3) total-vm:13568932kB, anon-rss:0kB, file-rss:344380kB, shmem-rss:0kB
[ 3730.571116] oom_reaper: reaped process 7422 (python3), now anon-rss:0kB, file-rss:344676kB, shmem-rss:0kB

I will next start a job with large batch size to check what the dmesg output will be.

@arne.caspari

With a large batch size I get the same dmesg output as above.

The actual error message in the terminal is always a plain “killed”.

Yes, you’re running out of memory. The kernel OOM killer chooses your training process to kill.

In general, swap is never a good solution – modern Linux systems often choose to run without it entirely. (In fact, Kubernetes will even fail to start a node if swap is enabled!)

If you try to run a workload that’s bigger than the available RAM, it will fail. Get a bigger computer, or make the workload smaller.

I guess I will switch to a bigger computer.

Thank you to all of you for your help.

@snarky

In general, swap is a good solution and you should not disable it. Kubernetes refuses to start a node with swap enabled only because it is too difficult for the devs to handle swap correctly (e.g., enforcing memory limits). For a normal machine, disabling swap does not make any sense.

Here is why having swap makes sense: https://chrisdown.name/2018/01/02/in-defence-of-swap.html

What you do not want is swap thrashing, but that is not what the OP complained about. Instead, he might even get away with increasing the swap space and seeing if that allows his script to run to completion.

I have done systems programming for 25 years, and I can tell you: Swap is bad.
The sales people from the early computer era were right: “Virtual memory is a way to sell real memory.”

For any system where you need to guarantee performance and behavior, swap adds unacceptable uncontrollable factors.

Note that virtual memory mapping is great! Similarly, demand paging of position-independent shared libraries may be acceptable, depending on your particular performance needs. But that’s not the same thing as swap, i.e. actually paging dirty memory out to disk.

The Linux kernel will already overcommit on memory, and if it turns out you ACTUALLY need all the memory it “promised” to you, it will kill you … or some other process. Whichever process the OOM killer decides to kill. If you have a system that needs to actually provide defined services to defined customers, like almost every server and embedded system on the planet, any uncertainty about this process is just bad, period.
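For what it’s worth, a process can at least bias the OOM killer’s choice through /proc; a small Linux-only sketch (the value is illustrative, and negative values require CAP_SYS_RESOURCE):

```python
import os

# Sketch (Linux only): bias the OOM killer's choice for the current process.
# Positive values make this process MORE likely to be chosen as the victim;
# negative values (privileged) protect it.
path = f"/proc/{os.getpid()}/oom_score_adj"

with open(path, "w") as f:
    f.write("500")           # illustrative value, not a recommendation

with open(path) as f:
    print(f.read().strip())  # → 500
```

This only shifts which process dies first; it does not remove the underlying uncertainty the paragraph above describes.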

I too prefer to run completely without swap. However, I cheat and keep a swap file or partition (the capability to swap) around most of the time; I only enable it when absolutely necessary. Someone may have a situation where they know they will need it, but otherwise it is best to avoid having swap actually enabled.

I have swap on but turn vm.swappiness way down on my devices with flash memory. @arne.caspari’s link says you should use a high value, but that will wear the device faster, and I’m not sure you even gain any performance from it, given that even a very fast SSD is still much slower than RAM. A swappiness of 100 seems like bad advice for most workloads; 10 is the value Red Hat recommends for database workloads, and 60 is the default most distros use.
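For reference, the setting can be made persistent in /etc/sysctl.conf (the value below is the database-tuned one mentioned above, not a universal recommendation):

```
# /etc/sysctl.conf — lower swappiness to reduce flash wear
vm.swappiness=10
```

It takes effect after a reboot, or immediately via `sysctl -p`.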