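After finishing a round of service deployments and rebooting the servers, I found that the coredns component of the K8s cluster was in an abnormal state and would not run.
Checking the pod status showed OOMKilled and CrashLoopBackOff, and the system log reported out-of-memory errors.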
[root@lolicp ~]# kubectl get pod -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-68978987c9-ptsqn 0/1 OOMKilled 10 20h 10.244.0.29 node1 <none> <none>
kube-system coredns-68978987c9-zddr4 0/1 CrashLoopBackOff 6 12m 10.244.1.43 node2 <none> <none>
[root@lolicp ~]# cat /var/log/messages |grep coredns
Sep 1 14:17:15 hxb-node2 kernel: coredns invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-997
Sep 1 14:17:15 hxb-node2 kernel: coredns cpuset=3db4acabd1dc41206be05ebc1b77efe9fe5e68d1acc97eda0f15dabbaa16043d mems_allowed=0
Sep 1 14:17:15 hxb-node2 kernel: CPU: 11 PID: 64322 Comm: coredns Tainted: G ------------ T 3.10.0-693.el7.x86_64 #1
Sep 1 14:17:15 hxb-node2 kernel: [64295] 0 64295 221708 43746 120 0 -997 coredns
Sep 1 14:17:15 hxb-node2 kernel: Memory cgroup out of memory: Kill process 64538 (coredns) score 22 or sacrifice child
Sep 1 14:17:15 hxb-node2 kernel: Killed process 64295 (coredns) total-vm:886832kB, anon-rss:172348kB, file-rss:2636kB, shmem-rss:0kB
Sep 1 14:17:16 hxb-node2 kubelet: I0901 14:17:16.342851 840 kubelet.go:1970] "SyncLoop (PLEG): event for pod" pod="kube-system/coredns-68978987c9-zddr4" event=&{ID:4d43ef12-ab28-49d8-be50-a33b85baf689 Type:ContainerDied Data:3db4acabd1dc41206be05ebc1b77efe9fe5e68d1acc97eda0f15dabbaa16043d}
Sep 1 14:17:16 hxb-node2 kubelet: E0901 14:17:16.343489 840 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"coredns\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=coredns pod=coredns-68978987c9-zddr4_kube-system(4d43ef12-ab28-49d8-be50-a33b85baf689)\"" pod="kube-system/coredns-68978987c9-zddr4" podUID=4d43ef12-ab28-49d8-be50-a33b85baf689
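As a rough cross-check of the log above, the resident memory reported on the "Killed process" line can be summed and converted to MiB, then compared with the container's memory limit. This is only a sketch: the function name killed_rss_mib is made up for illustration, and cgroup accounting covers more than RSS, so the result is an approximation rather than the exact charged amount.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum anon-rss + file-rss + shmem-rss (all in kB) from a
# kernel "Killed process" log line and print the total in whole MiB.
killed_rss_mib() {
  local line="$1" total=0 kb
  # Pull out every "<name>-rss:<n>kB" field, then keep only the number.
  for kb in $(printf '%s\n' "$line" | grep -oE '[a-z]+-rss:[0-9]+kB' | grep -oE '[0-9]+'); do
    total=$((total + kb))
  done
  # Integer division: kB -> MiB, rounded down.
  echo $((total / 1024))
}

# The line is copied from the /var/log/messages output above.
line='Sep 1 14:17:15 hxb-node2 kernel: Killed process 64295 (coredns) total-vm:886832kB, anon-rss:172348kB, file-rss:2636kB, shmem-rss:0kB'
killed_rss_mib "$line"   # prints 170
```

The process was holding about 170 MiB when it was killed, which lines up with a 170Mi memory limit on the container.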
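Solution
The coredns container used more memory than its configured limit allows (or the node itself ran short of memory), so the kernel killed the process and the pod failed to start. The "Memory cgroup out of memory" line in the log above suggests that the container's own cgroup limit was the one that was hit.
Change the memory limit
Increase the memory limit of the coredns component from 170Mi to 270Mi (adjust the value to fit your own environment).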
[root@lolicp ~]# kubectl edit deploy -n kube-system coredns
resources:
  limits:
    memory: 270Mi
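The same change can also be made without opening an editor, using kubectl set resources. The wrapper below is a minimal sketch under a few assumptions: the function name set_coredns_limit and the KUBECTL override variable are invented for illustration, and the container name coredns is taken from the kubelet log above. Setting KUBECTL=echo prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: set the memory limit on the coredns Deployment.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command.
set_coredns_limit() {
  local limit="${1:-270Mi}"
  # Accept only simple quantities such as 270Mi or 1Gi in this sketch.
  if ! printf '%s' "$limit" | grep -qE '^[0-9]+(Mi|Gi)$'; then
    echo "invalid memory quantity: $limit" >&2
    return 1
  fi
  "${KUBECTL:-kubectl}" set resources deployment coredns \
    -n kube-system -c coredns --limits="memory=${limit}"
}

# Usage (dry run):  KUBECTL=echo set_coredns_limit 270Mi
# Usage (apply):    set_coredns_limit 270Mi
```

Because this changes the Deployment's pod template, Kubernetes should roll out new coredns pods on its own once the command succeeds.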
Restart the coredns service
[root@lolicp ~]# kubectl get pod -o wide -n kube-system -l k8s-app=kube-dns|awk 'NR>1 {print $1}'|xargs -i kubectl delete pod -n kube-system {}
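After the pods are recreated, it helps to confirm that none of them is still failing. A pod in CrashLoopBackOff still reports the phase Running, so filtering on the STATUS column of kubectl get pod is more telling than filtering on phase. The filter below is a sketch: the name not_running is hypothetical, and it assumes input in the form produced by kubectl get pod -n kube-system --no-headers (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "kubectl get pod --no-headers" style lines on stdin
# and print NAME and STATUS for every pod that is not Running or Completed.
not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Sample rows adapted from the pod listing earlier in this post
# (namespace column dropped, as with "-n kube-system --no-headers").
sample='coredns-68978987c9-ptsqn 0/1 OOMKilled 10 20h
coredns-68978987c9-zddr4 0/1 CrashLoopBackOff 6 12m'
printf '%s\n' "$sample" | not_running

# Against a live cluster (not run here):
#   kubectl get pod -n kube-system -l k8s-app=kube-dns --no-headers | not_running
```

An empty result means every coredns pod is reporting Running.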