It’s part of a K3s cluster, using containerd 1.7.27 as the container runtime and the NVIDIA Container Toolkit.
Most of my pods are working correctly. However, I’d like to use Azure Monitor, and some of its pods report the following error:
kubectl logs ama-logs-rrjdh -n kube-system
Defaulted container "addon-token-adapter" out of: addon-token-adapter, ama-logs, ama-logs-prometheus
MICROSOFT SOFTWARE LICENSE TERMS
MICROSOFT Azure Arc for Kubernetes
__________________________________
This software is licensed to you as part of your or your company's subscription license for Microsoft Azure Services. You may only use the software with Microsoft Azure Services and subject to the terms and conditions of the agreement under which you obtained Microsoft Azure Services. If you do not have an active subscription license for Microsoft Azure Services, you may not use the software. Microsoft Azure Legal Information: https://azure.microsoft.com/en-us/support/legal/
F1030 15:26:27.503782 1 main.go:61] error modifying iptable rules: error adding rules to custom chain: running [/usr/sbin/iptables -t nat -A aad-metadata -p tcp -d 169.254.169.254 --dport 80 -j DNAT --to-destination 127.0.0.1:8421 --wait]: exit status 4: Warning: Extension tcp revision 0 not supported, missing kernel module?
Warning: Extension DNAT revision 0 not supported, missing kernel module?
iptables v1.8.10 (nf_tables): RULE_APPEND failed (No such file or directory): rule in chain aad-metadata
Would you have any fix for this? I’ve seen that similar issues with Docker were solved, but I couldn’t find anything for containerd.
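In case it helps, a quick way to check what the kernel actually provides might be the following (the module and config-option names below are guesses based on the warnings above, not something confirmed for JetPack 6.2):
# which iptables backend the host uses (legacy vs nf_tables)
iptables -V
# which netfilter modules are currently loaded
lsmod | grep -E 'nf_tables|nft_|xt_'
# what the running kernel was built with (one of these two usually exists)
zcat /proc/config.gz | grep -E 'CONFIG_NF_TABLES|CONFIG_NFT'
grep -E 'CONFIG_NF_TABLES|CONFIG_NFT' /boot/config-$(uname -r)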
*** Please note that this reply is generated by LLM automatically ***
Based on the provided context, I’ll attempt to provide a comprehensive answer to the question.
The question is about an issue with Azure Monitor on a Kubernetes (K3s) cluster using containerd as the container runtime on a Jetson AGX Orin Devkit running JetPack 6.2.1. The user is experiencing issues with some pods reporting errors related to Azure Monitor, and they are looking for a solution.
After reviewing the provided context, I found some relevant information that might be helpful in resolving the issue:
Similar issues with Docker: The user mentioned that similar issues with Docker were solved, but they couldn’t find anything related to containerd. This suggests that the issue might be related to the container runtime, and a solution might be found by comparing the Docker and containerd configurations.
NVIDIA Container Toolkit: The user is using the NVIDIA Container Toolkit, which is a set of tools for building and running GPU-accelerated containers. This toolkit might be relevant to the issue, as it provides a way to manage GPU resources and optimize container performance.
Containerd version: The user is using containerd 1.7.27, which is a relatively recent version. However, it’s possible that there are known issues or bugs in this version that could be contributing to the problem.
To troubleshoot the issue, I would recommend the following steps:
Check the containerd logs: Review the containerd logs to see if there are any error messages related to Azure Monitor or the NVIDIA Container Toolkit (a sketch of where to look on a K3s install is given after this list).
Compare Docker and containerd configurations: Compare the Docker and containerd configurations to see if there are any differences that could be contributing to the issue.
Check the NVIDIA Container Toolkit documentation: Review the NVIDIA Container Toolkit documentation to see if there are any known issues or limitations related to Azure Monitor or containerd.
Try a different container runtime: If possible, try using a different container runtime, such as Docker, to see if the issue persists.
Check the K3s cluster configuration: Review the K3s cluster configuration to ensure that it is properly configured and that there are no issues with the cluster that could be contributing to the problem.
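A hedged sketch for the first two steps, assuming a default K3s install where containerd is embedded in the K3s agent (paths may differ on other setups):
# K3s service logs (use k3s-agent on agent-only nodes)
sudo journalctl -u k3s
# the embedded containerd writes its own log file rather than using journald
sudo tail -n 200 /var/lib/rancher/k3s/agent/containerd/containerd.log
# the containerd configuration generated by K3s
sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml
# logs of the other containers in the pod (kubectl defaulted to addon-token-adapter above)
kubectl logs ama-logs-rrjdh -n kube-system -c ama-logs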
In terms of a specific fix, I couldn’t find any information in the provided context that would suggest a straightforward solution. However, by following the troubleshooting steps outlined above, the user may be able to identify the root cause of the issue and find a solution.
It’s also worth noting that the user may want to consider reaching out to NVIDIA support or the K3s community for further assistance, as they may have more expertise and knowledge about the specific issue.
*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***
I’ve already opened a ticket with Azure to get their point of view on this issue.
I’ve already tried those steps.
Note: I’ve spent some time researching this issue, reading plenty of docs and the forum; I’m not sure that replacing one LLM with another is the best shot at solving it…
Yes, this is a known issue, as nf_tables doesn’t work on JetPack 6.2.
We tried falling back to legacy iptables, but that fails as well, so in the end servicelb needs to be disabled (a hedged sketch of both steps is below).
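A hedged sketch, assuming K3s was installed with the standard install script and reads its options from /etc/rancher/k3s/config.yaml (adjust if your setup differs):
# falling back to legacy iptables on the host (the part that reportedly fails on JetPack 6.2)
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
# disabling servicelb instead: add the following to /etc/rancher/k3s/config.yaml
#   disable:
#     - servicelb
# (or pass --disable servicelb to the k3s server at install time), then restart:
sudo systemctl restart k3s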
I tried that, and I’m still getting the same error. Do you know if it works with containerd as the container runtime? The only difference is that svclb-traefik is no longer there.